Nike-Inc / spark-expectations

A Python Library to support running data quality rules while the spark job is running⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0
161 stars 37 forks

[BUG] The input_count is less than the output_count in the stats table #63

Open DevathiNNikhil opened 9 months ago

DevathiNNikhil commented 9 months ago

Describe the bug The values observed in output_count are sometimes greater than the values of input_count. This happened only a few times across many executions.

To Reproduce The checks performed on the table are row_dq checks; specifically, the rules are null_validation and a uniqueness check, defined as described in the documentation. All the rules passed successfully with a zero error rate, but in my experiments on many tables the mismatch appeared on only 2 tables.

Expected behavior input_count should be equal to output_count.
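To make the expectation concrete, here is a minimal sketch of how the affected runs could be picked out once the stats rows have been collected into plain Python dictionaries. Only the `input_count` and `output_count` names come from the stats table described above; the `find_count_anomalies` helper, the `table_name` key, and the sample rows are hypothetical and made up for illustration.

```python
from typing import Any, Iterable, List, Mapping


def find_count_anomalies(stats_rows: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Return the stats rows whose output_count exceeds input_count.

    Hypothetical helper: it assumes each row exposes integer-like
    'input_count' and 'output_count' values.
    """
    anomalies = []
    for row in stats_rows:
        input_count = int(row["input_count"])
        output_count = int(row["output_count"])
        # A row-level check can only keep or drop rows, so the output
        # should never be larger than the input.
        if output_count > input_count:
            anomalies.append(row)
    return anomalies


# Made-up sample rows for illustration only; not taken from the report.
sample_rows = [
    {"table_name": "orders", "input_count": 1000, "output_count": 1000},
    {"table_name": "customers", "input_count": 500, "output_count": 512},
]

for row in find_count_anomalies(sample_rows):
    print(row["table_name"], row["input_count"], row["output_count"])
```

Running a check like this over the collected stats rows would narrow the report down to the specific runs and tables where the counts diverge.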

Screenshots image001

The spark-expectations version used is 0.8.1.

asingamaneni commented 4 months ago

@DevathiNNikhil Can you please try with the latest version of the codebase and let us know if you still have the issue!