CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0

Feature/config driven extension #88

Closed. alexjbush closed this 5 years ago

alexjbush commented 5 years ago

Description

This PR introduces configuration-driven extension functionality.

An extension of this type adds a pre-execution hook. It is enabled by including its key in the comma-separated spark.waimak.dataflow.extensions property, e.g. spark.waimak.dataflow.extensions=${extensionKey},otherextension.
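
As a rough illustration of how a comma-separated extension list might be resolved into pre-execution hooks, here is a minimal Scala sketch. It does not use Waimak's actual API: the Extension trait, its key and preExecute members, and the modelling of a flow as a list of action names are all assumptions made for this example. Only the property name comes from the description above.

```scala
object ExtensionConfigSketch {
  // Hypothetical stand-in for the extension trait; the real trait may differ.
  trait Extension {
    def key: String
    // Hook applied before execution. A "flow" is modelled here as a list of action names.
    def preExecute(flow: List[String]): List[String]
  }

  val ExtensionsKey = "spark.waimak.dataflow.extensions"

  // Split the comma-separated property into trimmed, non-empty extension keys.
  def enabledKeys(conf: Map[String, String]): List[String] =
    conf.get(ExtensionsKey).toList
      .flatMap(_.split(",").toList)
      .map(_.trim)
      .filter(_.nonEmpty)

  // Apply the hook of each enabled extension in the order the keys are listed.
  // An enabled key with no matching extension is treated as an error in this sketch.
  def applyHooks(conf: Map[String, String],
                 available: Seq[Extension],
                 flow: List[String]): List[String] =
    enabledKeys(conf).foldLeft(flow) { (current, key) =>
      val ext = available.find(_.key == key).getOrElse(
        throw new IllegalArgumentException(s"Unknown extension: $key"))
      ext.preExecute(current)
    }
}
```

If the property is unset, enabledKeys returns an empty list and the flow is left unchanged. Failing on an unknown key is a design choice of this sketch, not something stated in the PR.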

Implementations of the extension trait are loaded using ServiceLoader, so each one must be registered as a service in the corresponding META-INF/services file.
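
The ServiceLoader discovery described above can be sketched as follows. The trait and class names here (ExampleExtension, CacheExample, extensionKey) are hypothetical and are not Waimak's real types. With java.util.ServiceLoader, implementations are found through a provider-configuration file under META-INF/services named after the fully qualified name of the service type and listing one implementation class name per line.

```scala
import java.util.ServiceLoader

object ServiceLoaderSketch {
  // Hypothetical service trait standing in for the real extension trait.
  trait ExampleExtension {
    def extensionKey: String
  }

  // Hypothetical implementation. ServiceLoader requires a public no-argument constructor,
  // and the class must be listed in the META-INF/services file to be discovered.
  class CacheExample extends ExampleExtension {
    val extensionKey: String = "cacheasparquet"
  }

  // Collect every implementation registered on the classpath.
  // Returns an empty list when no provider-configuration file is present.
  def loadRegistered(): List[ExampleExtension] = {
    val it = ServiceLoader.load(classOf[ExampleExtension]).iterator()
    val builder = List.newBuilder[ExampleExtension]
    while (it.hasNext) builder += it.next()
    builder.result()
  }

  // Look up an extension by its key among the given candidates.
  def find(key: String, registered: Seq[ExampleExtension]): Option[ExampleExtension] =
    registered.find(_.extensionKey == key)
}
```

Passing the candidates to find explicitly, rather than calling loadRegistered() inside it, keeps the lookup testable without a services file on the classpath.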

It also provides an implementation of a Cache As Parquet extension:

//Enable extension
spark.conf.set("spark.waimak.dataflow.extensions", "cacheasparquet")
//Set labels to cache
spark.conf.set("spark.waimak.dataflow.extensions.cacheasparquet.cacheLabels", "purchases")
//Or cache all
spark.conf.set("spark.waimak.dataflow.extensions.cacheasparquet.cacheAll", true)

To make this work, the configuration utilities (including CaseClassConfigParser) have been moved into core.

How Has This Been Tested?

Unit tests

codecov-io commented 5 years ago

Codecov Report

No coverage uploaded for pull request base (develop@7d148f3). The diff coverage is 100%.

@@            Coverage Diff             @@
##             develop      #88   +/-   ##
==========================================
  Coverage           ?   82.62%           
==========================================
  Files              ?       60           
  Lines              ?     1658           
  Branches           ?       71           
==========================================
  Hits               ?     1370           
  Misses             ?      288           
  Partials           ?        0

Impacted Files                                         Coverage Δ
...a/waimak/configuration/CaseClassConfigParser.scala  97.26% <ø> (ø)
...ration/PropertiesFilePropertyProviderBuilder.scala  100% <ø> (ø)
...w/spark/CacheAsParquetConfigurationExtension.scala  100% <100%> (ø)
...ala/com/coxautodata/waimak/dataflow/DataFlow.scala  96.61% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 7d148f3...5bce087.