FINRAOS / MegaSparkDiff

A Spark-based tool for comparing data at scale. It lets software engineers compare many pairings of possible data sources. Multiple execution modes across multiple environments let the user generate a diff report as a Java/Scala-friendly DataFrame or as a file for later use. Ships with SparkFactory and SparkCompare tools out of the box.
https://finraos.github.io/MegaSparkDiff/
Apache License 2.0

Improve Spark Option Configuration #32

Open kalverra opened 6 years ago

kalverra commented 6 years ago

APIs that take Spark configuration options are a tad restrictive. We need to move the solution away from overloaded methods and toward an approach that gives the user cleaner, more configurable API calls.

This could be done by keeping a few basic APIs as we do now, while also letting the user interact with the SparkConf directly and append options there.
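A rough sketch of that shape, under some loud assumptions: `SparkConfigSketch`, `basicConf`, and `customConf` are hypothetical names, not existing MegaSparkDiff APIs, and a plain `Map` stands in for `SparkConf` so the snippet runs without Spark on the classpath. With Spark available, the map would presumably be a real `SparkConf` and each `put` a call to `SparkConf.set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a SparkFactory-style entry point (not the project's actual API).
final class SparkConfigSketch {
    private SparkConfigSketch() {}

    // Basic factory for the common case: only the app name and master are needed.
    static Map<String, String> basicConf(String appName, String master) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.app.name", appName);
        conf.put("spark.master", master);
        return conf;
    }

    // Escape hatch: start from the same defaults, then let the caller
    // edit the configuration directly instead of adding another overload.
    static Map<String, String> customConf(String appName, String master,
                                          Consumer<Map<String, String>> customizer) {
        Map<String, String> conf = basicConf(appName, master);
        customizer.accept(conf);
        return conf;
    }
}
```

A caller who needs one extra option could then write `SparkConfigSketch.customConf("diff", "local[2]", c -> c.put("spark.sql.shuffle.partitions", "8"))` rather than waiting for a new overload.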

mmlinford commented 6 years ago

I agree with all of that. Whatever we end up doing, it'll be good to keep a couple of factory methods like we have now for simple / common use cases.

I had an idea that maybe we could do something like what the Chrome WebDriver does with ChromeOptions, which is to extend the basic config class and add named helper methods for particular options. I've found it kind of annoying to look up specific property names, plus this would help guarantee that you're using the right one (and haven't made a typo).
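To make the ChromeOptions-style idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `SparkOptionsSketch` and its helper methods are hypothetical names, and the class wraps a plain `Map` rather than extending `SparkConf` so it runs without Spark installed. The property keys themselves (`spark.master`, `spark.app.name`, `spark.executor.memory`, `spark.sql.shuffle.partitions`) are standard Spark configuration names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical options class; a real version might extend or wrap SparkConf instead.
final class SparkOptionsSketch {
    private final Map<String, String> settings = new LinkedHashMap<>();

    // Generic setter stays available for any property without a named helper.
    SparkOptionsSketch set(String key, String value) {
        settings.put(key, value);
        return this;
    }

    // Named helpers keep the property strings in one place, so callers cannot mistype them.
    SparkOptionsSketch master(String master) {
        return set("spark.master", master);
    }

    SparkOptionsSketch appName(String name) {
        return set("spark.app.name", name);
    }

    SparkOptionsSketch executorMemory(String size) {
        return set("spark.executor.memory", size);
    }

    // A typed parameter also allows basic validation before the value becomes a string.
    SparkOptionsSketch shufflePartitions(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        return set("spark.sql.shuffle.partitions", Integer.toString(partitions));
    }

    // Read-only view of the collected properties, in insertion order.
    Map<String, String> asMap() {
        return Collections.unmodifiableMap(settings);
    }
}
```

Usage would look like `new SparkOptionsSketch().master("local[2]").appName("diff").shufflePartitions(8)`, with `set(...)` left as the fallback for less common properties.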