CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0

Make the spark prefix configurable for Case Class config parser #75

Closed alexjbush closed 5 years ago

alexjbush commented 5 years ago

It would be nice for the Spark prefix to be configurable in the case class config parser.

For example: Given a case class: case class Test(a: String)

And config property: spark.configgroup.project1.a=b

It would be nice to have a Waimak parameter that tells the SparkConf part of config parsing which prefix to use when retrieving config from the SparkSession conf.

By default, it would be:

  val SPARK_CONF_PREFIX: String = s"$configParamPrefix.sparkConfPropertyPrefix"
  val SPARK_CONF_PREFIX_DEFAULT: String = "spark."

So you would parse the config like:

CaseClassConfigParser[Test](sparkFlowContext, "configgroup.project1")

When looking in the Spark conf, the parser will prepend the spark. prefix to the parsing prefix. This has the benefit that when using other property providers (like Databricks Secrets), the config keys don't need to be prefixed with spark.; they would be found under configgroup.project1.a.

You could also set the prefix to be more specific:

spark.conf.set(SPARK_CONF_PREFIX, "spark.configgroup.")

So you would parse the config like:

CaseClassConfigParser[Test](sparkFlowContext, "project1")

and it would look in external property providers for project1.a.
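The lookup behaviour described above can be sketched without Spark, using plain Maps to stand in for the SparkConf and an external property provider. The `resolve` helper and object name are illustrative assumptions, not the actual Waimak API:

```scala
// Minimal sketch of the proposed lookup, assuming a hypothetical resolve
// helper; plain Maps stand in for the SparkConf and an external provider
// (e.g. Databricks Secrets).
object PrefixLookupSketch {
  // Keys as they would appear in the SparkSession conf (prefixed)
  val sparkConf: Map[String, String] = Map(
    "spark.configgroup.project1.a" -> "b"
  )
  // Keys as they would appear in an external property provider (unprefixed)
  val externalProvider: Map[String, String] = Map(
    "configgroup.project1.a" -> "b"
  )

  // The Spark prefix is prepended only when reading from the SparkConf;
  // external providers are queried with the bare parsing prefix.
  def resolve(sparkPrefix: String, parsePrefix: String, field: String): Option[String] = {
    val key = s"$parsePrefix.$field"
    sparkConf.get(s"$sparkPrefix$key").orElse(externalProvider.get(key))
  }
}
```

With the default prefix, `resolve("spark.", "configgroup.project1", "a")` finds the value, and with the more specific prefix, `resolve("spark.configgroup.", "project1", "a")` finds the same value under the shorter parsing prefix.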

This would be a breaking change for any existing parsing, as the "spark" prefix should no longer be part of the parsing prefix. It would not be breaking if the default prefix were an empty string, but that wouldn't be ideal.

What do you think @vavison ?

alexjbush commented 5 years ago

Created a branch at: https://github.com/CoxAutomotiveDataSolutions/waimak/compare/feature/issue-75-spark-conf-prefix

This test shows the functionality:

    it("strip spark prefix off parameter but not off map and properties") {
      val context = SparkFlowContext(sparkSession)
      val conf: RuntimeConfig = sparkSession.conf
      // arg1 comes from the Spark conf, so it carries the spark. prefix
      conf.set("spark.arg1", "1")
      conf.set(CONFIG_PROPERTY_PROVIDER_BUILDER_MODULES, "com.coxautodata.waimak.configuration.TestPropertyProvider")
      // arg3 comes from an external property provider and is not prefixed
      TestPropertyProvider.props.setProperty("arg3", "3")
      TestPropertyProvider.getPropertyProvider(context).get("arg3") should be(Some("3"))
      // arg2 comes from the additional-properties map, also unprefixed
      CaseClassConfigParser[PrefixTest](context, "", Map("arg2" -> "2")) should be(PrefixTest("1", "2", "3"))
    }