When two Spark dataframes have all matching columns but zero matching rows, the report method of SparkCompare throws an exception. The report output and the resulting traceback are shown below.
****** Column Summary ******
Number of columns in common with matching schemas: 3
Number of columns in common with schema differences: 0
Number of columns in base but not compare: 0
Number of columns in compare but not base: 0
****** Row Summary ******
Number of rows in common: 0
Number of rows in base but not compare: 2
Number of rows in compare but not base: 2
Number of duplicate rows found in base: 0
Number of duplicate rows found in compare: 0
****** Row Comparison ******
Number of rows with some columns unequal: 0
Number of rows with all columns equal: 0
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
----> 1 comparison_report.report()
/python3.8/site-packages/datacompy/spark.py in report(self, file)
889 self._merge_dataframes()
890 self._print_num_of_rows_with_column_equality(file)
--> 891 self._print_row_matches_by_column(file)
/lib/python3.8/site-packages/datacompy/spark.py in _print_row_matches_by_column(self, myfile)
723 if self.columns_match_dict[key][MatchType.MISMATCH.value]
724 }
--> 725 columns_fully_matching = {
726 key: self.columns_match_dict[key]
727 for key in self.columns_match_dict
/lib/python3.8/site-packages/datacompy/spark.py in <dictcomp>(.0)
726 key: self.columns_match_dict[key]
727 for key in self.columns_match_dict
--> 728 if sum(self.columns_match_dict[key])
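The traceback points at a `sum(...)` over the per-column match counts, which suggests those counts are `None` when no rows join. One plausible cause, not confirmed here, is that a Spark sum aggregate over zero rows returns null. The sketch below reproduces the same `TypeError` in plain Python and shows one possible guard. The dictionary contents, the column names, and the `safe_sum` helper are hypothetical stand-ins, not datacompy's actual internals.

```python
# Hypothetical stand-in for the per-column match counts; the real
# structure of columns_match_dict inside datacompy may differ.
columns_match_dict = {
    "acct_id": [None, None],  # assumed: counts are None when zero rows match
    "name": [None, None],
}

# Summing a list that contains None fails the same way as in the traceback.
try:
    sum(columns_match_dict["name"])
    error_message = ""
except TypeError as exc:
    error_message = str(exc)


def safe_sum(values):
    """Sum counts, treating None as 0 (hypothetical helper)."""
    return sum(v or 0 for v in values)


# With the guard, every column totals 0 instead of raising.
totals = {key: safe_sum(counts) for key, counts in columns_match_dict.items()}

print(error_message)
print(totals)
```

Coalescing `None` to 0 at the point of summation is only one option; the counts could also be normalized where they are first computed.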