CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0

Feature/generalise commit and extensions #85

Closed: alexjbush closed this 5 years ago

alexjbush commented 5 years ago

Description

This PR introduces a simple 'Extension' system that lets users bring their own extensions to use with DataFlows. The system allows bespoke metadata to be added to the flow and provides a mechanism for manipulating a flow before it is executed.
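To illustrate the idea, here is a minimal, self-contained sketch of how such an extension mechanism could look. It is not Waimak's actual API: `SimpleFlow`, `FlowExtension`, `CacheLabelsExtension`, `preExecutionManipulation` and `prepareForExecution` are all hypothetical names chosen for this example. Each extension carries its own metadata (here, a set of labels) and gets a chance to rewrite the flow just before execution.

```scala
// Hypothetical model for illustration only; none of these names are Waimak's real API.

// An extension holds its own metadata and may rewrite the flow before execution.
trait FlowExtension {
  def identifier: String
  def preExecutionManipulation(flow: SimpleFlow): SimpleFlow
}

// A toy flow: an ordered list of action labels plus the registered extensions.
final case class SimpleFlow(
    actions: Vector[String] = Vector.empty,
    extensions: Map[String, FlowExtension] = Map.empty
) {
  def addAction(label: String): SimpleFlow =
    copy(actions = actions :+ label)

  // Register an extension, replacing any previous one with the same identifier.
  def withExtension(ext: FlowExtension): SimpleFlow =
    copy(extensions = extensions + (ext.identifier -> ext))

  // Give every registered extension a chance to rewrite the flow.
  def prepareForExecution: SimpleFlow =
    extensions.values.foldLeft(this)((flow, ext) => ext.preExecutionManipulation(flow))
}

// Example extension: its metadata is the set of labels that should be cached.
final case class CacheLabelsExtension(labels: Set[String]) extends FlowExtension {
  val identifier: String = "cacheLabels"

  def preExecutionManipulation(flow: SimpleFlow): SimpleFlow = {
    // Insert a cache step directly after each action whose label was marked.
    val rewritten = flow.actions.flatMap { a =>
      if (labels.contains(a)) Vector(a, s"cache($a)") else Vector(a)
    }
    // Drop the extension once applied so preparing twice does not duplicate steps.
    flow.copy(actions = rewritten, extensions = flow.extensions - identifier)
  }
}

object ExtensionSketch {
  // Build a flow, mark "read" for caching, then apply the extensions.
  def demo: SimpleFlow =
    SimpleFlow()
      .addAction("read")
      .addAction("transform")
      .withExtension(CacheLabelsExtension(Set("read")))
      .prepareForExecution

  def main(args: Array[String]): Unit =
    println(demo.actions.mkString(" -> "))
}
```

In this sketch the user-facing call only records metadata on the flow; the actual rewriting is deferred until the flow is about to run, which is the separation the description above refers to.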

The following existing functionality has been modified to use this extension system:

I have also changed the functionality of the Cache action to:

I have also moved the commit and Spark actions into the package object.

None of the above changes should be breaking.

Fixes #54

Type of change

Please delete options that are not relevant.

How Has This Been Tested?

Unit tests

codecov-io commented 5 years ago

Codecov Report

Merging #85 into develop will increase coverage by 0.1%. The diff coverage is 83.58%.


@@            Coverage Diff             @@
##           develop      #85     +/-   ##
==========================================
+ Coverage    82.11%   82.21%   +0.1%     
==========================================
  Files           56       59      +3     
  Lines         1599     1631     +32     
  Branches        63       79     +16     
==========================================
+ Hits          1313     1341     +28     
- Misses         286      290      +4

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...data/waimak/dataflow/spark/SparkInterceptors.scala | 100% <ø> (ø) | :arrow_up: |
| ...a/waimak/dataflow/spark/ParquetDataCommitter.scala | 98.24% <100%> (-0.09%) | :arrow_down: |
| ...cala/com/coxautodata/waimak/dataflow/package.scala | 100% <100%> (ø) | :arrow_up: |
| ...autodata/waimak/dataflow/spark/SparkDataFlow.scala | 87.32% <100%> (ø) | :arrow_up: |
| ...ala/com/coxautodata/waimak/dataflow/DataFlow.scala | 96.22% <100%> (+0.63%) | :arrow_up: |
| ...om/coxautodata/waimak/dataflow/spark/package.scala | 66.66% <66.66%> (ø) | |
| ...taflow/spark/CacheAsParquetMetadataExtension.scala | 89.47% <89.47%> (ø) | |
| ...ata/waimak/dataflow/spark/SparkActionHelpers.scala | 92.59% <92.59%> (ø) | |
| ...data/waimak/dataflow/CommitMetadataExtension.scala | 94.87% <94.87%> (ø) | |
| ...todata/waimak/dataflow/PostActionInterceptor.scala | 85.71% <0%> (-14.29%) | :arrow_down: |
| ... and 5 more | | |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 2227fa0...956e9db.