A Spark-based tool for comparing data at scale. It lets software development engineers compare many pair combinations of data sources. Multiple execution modes across multiple environments let the user generate a diff report either as a Java/Scala-friendly DataFrame or as a file for later use. It ships with the SparkFactory and SparkCompare tools out of the box.
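To make the idea of a two-sided diff report concrete, here is a minimal plain-Java sketch that does not use Spark or this project's API. The names `DiffSketch`, `DiffResult`, `subtract`, and `compare` are hypothetical and chosen only for illustration. It also assumes that rows are compared as whole strings and that duplicate rows are counted; the tool itself may treat duplicates differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of the tool's API.
class DiffSketch {

    /** Holds the rows that appear on only one side of the comparison. */
    static final class DiffResult {
        public final List<String> inLeftNotInRight;
        public final List<String> inRightNotInLeft;

        DiffResult(List<String> inLeftNotInRight, List<String> inRightNotInLeft) {
            this.inLeftNotInRight = inLeftNotInRight;
            this.inRightNotInLeft = inRightNotInLeft;
        }

        /** True when both sides contain exactly the same rows. */
        public boolean isEmpty() {
            return inLeftNotInRight.isEmpty() && inRightNotInLeft.isEmpty();
        }
    }

    /**
     * Multiset difference: returns every row of {@code a} that has no
     * matching row left in {@code b}. Assumption: duplicates count, so two
     * copies in {@code a} against one copy in {@code b} leaves one copy.
     */
    static List<String> subtract(List<String> a, List<String> b) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : b) {
            counts.merge(row, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (String row : a) {
            Integer remaining = counts.get(row);
            if (remaining != null && remaining > 0) {
                counts.put(row, remaining - 1); // consume one match from b
            } else {
                out.add(row); // no match left in b
            }
        }
        return out;
    }

    /** Builds the two-sided diff of two row collections. */
    public static DiffResult compare(List<String> left, List<String> right) {
        return new DiffResult(subtract(left, right), subtract(right, left));
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("1,apple", "2,banana", "3,cherry");
        List<String> right = Arrays.asList("1,apple", "2,blueberry", "3,cherry");

        DiffResult result = compare(left, right);
        System.out.println("Only in left:  " + result.inLeftNotInRight);
        System.out.println("Only in right: " + result.inRightNotInLeft);
    }
}
```

The sketch keeps the two directions of the diff separate, so a caller can tell which source holds the unmatched rows. A distributed version would express the same subtraction over DataFrames instead of in-memory lists.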
Added parallelizeCSVSource to load CSV data into DataFrames for comparison. Fixed typo in CountResult. Added enum for CSV source type.