awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

New API added to referential integrity to allow for row level annotation #466

Closed: rdsharma26 closed this 1 year ago

rdsharma26 commented 1 year ago

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

eycho-am commented 1 year ago

Would it be possible to also add tests in the VerificationSuiteTests to see how the row level results for ref integrity are added with other row level results?

rdsharma26 commented 1 year ago

> Would it be possible to also add tests in the VerificationSuiteTests to see how the row level results for ref integrity are added with other row level results?

This is a standalone utility, and we are not using it in the VerificationSuite. Once the VerificationSuite supports multiple DataFrames, we will add this check and integrate the row-level results.
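
For readers unfamiliar with the idea, here is a minimal sketch of what row-level annotation for a referential integrity check means. It is written in plain Python over lists of dictionaries and is not the API added in this pull request. The function name `annotate_referential_integrity`, the outcome column `ref_integrity_ok`, and the sample data are hypothetical and chosen only for illustration. The sketch also assumes that a missing (`None`) key is compared like any other value, which may differ from how the actual utility treats nulls.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def annotate_referential_integrity(
    primary: List[Row],
    primary_col: str,
    reference: List[Row],
    reference_col: str,
    outcome_col: str = "ref_integrity_ok",  # hypothetical column name
) -> List[Row]:
    """Return a copy of `primary` with one extra boolean column.

    The new column is True when the row's value in `primary_col`
    also appears in `reference_col` of the reference dataset.
    """
    # Collect the reference keys once so each row lookup is O(1).
    reference_values = {row[reference_col] for row in reference}
    return [
        {**row, outcome_col: row[primary_col] in reference_values}
        for row in primary
    ]


# Hypothetical sample data: orders refer to customers by id.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "z"},  # no such customer
    {"order_id": 3, "customer_id": "b"},
]
customers = [{"id": "a"}, {"id": "b"}, {"id": "c"}]

annotated = annotate_referential_integrity(
    orders, "customer_id", customers, "id"
)

# The dataset-level metric (fraction of matching rows) can be
# derived from the row-level annotation.
match_fraction = sum(r["ref_integrity_ok"] for r in annotated) / len(annotated)
```

Because the check needs two datasets (the primary one and the reference one), it does not fit a verification flow built around a single dataset. The sketch above takes both as explicit arguments for the same reason.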