Nike-Inc / spark-expectations

A Python Library to support running data quality rules while the spark job is running⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0

rewrote generate_summarised_row_dq_res #66

Closed IMC07 closed 8 months ago

IMC07 commented 8 months ago

Description

This PR solves this issue: https://github.com/Nike-Inc/spark-expectations/issues/62

Motivation and Context

See issue.

How Has This Been Tested?

The existing tests still pass. In addition, I ran the data quality checks I already had in place on my own data. The resulting error_counts equalled the sum of the failed_row_counts in the row_dq_res_summary, which was not the case for the same scenarios with the current implementation of spark-expectations. This implementation therefore fixes the bug.
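For readers who want to repeat this consistency check, here is a minimal sketch in plain Python. It assumes the row_dq_res_summary has already been collected into a list of dictionaries, each with a `failed_row_count` entry. The helper names, the `rule` key, and the example data are hypothetical and not part of the spark-expectations API.

```python
from typing import Dict, List


def total_failed_rows(row_dq_res_summary: List[Dict[str, object]]) -> int:
    """Sum failed_row_count over every rule entry in the summary."""
    return sum(int(entry["failed_row_count"]) for entry in row_dq_res_summary)


def summary_matches_error_count(
    row_dq_res_summary: List[Dict[str, object]], error_count: int
) -> bool:
    """Return True when the summed failed_row_count equals error_count."""
    return total_failed_rows(row_dq_res_summary) == error_count


# Hypothetical data: two rules and an error count taken from the same run.
example_summary = [
    {"rule": "col_a_not_null", "failed_row_count": 3},
    {"rule": "col_b_positive", "failed_row_count": 2},
]
example_error_count = 5

assert summary_matches_error_count(example_summary, example_error_count)
```

A mismatch between the two numbers would reproduce the symptom described in the linked issue.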

asingamaneni commented 8 months ago

@IMC07 it looks like the unit tests failed. Can you please review this?

codecov[bot] commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (1c87e21) 100.00% compared to head (330f0b6) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main       #66   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           22        22
  Lines         1447      1453    +6
=========================================
+ Hits          1447      1453    +6
```
