A Spark-based tool for comparing data at scale. It lets software development engineers compare many pairings of supported data sources. Multiple execution modes across multiple environments let the user generate a diff report either as a Java/Scala-friendly DataFrame or as a file for later use. Ships with the SparkFactory and SparkCompare tools out of the box.
This commit contains the following sample tests:
1) Database-to-database transformation test.
2) File-to-database transformation test.
The code includes updates from @aosama. Thank you, @aosama, for reviewing the code.