awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0
3.32k stars, 539 forks

Missing Column Precondition for Compliance Check - issue fix 467 #478

Closed: samarth-c1 closed this 1 year ago

samarth-c1 commented 1 year ago

Fixes issue #467

Description of changes:

Added additionalPreconditions to the Compliance constraint so that a missing column makes the compliance check itself fail, rather than the Spark operation. The precondition uses the column-existence check, so the check now fails on a missing column instead of skipping it. Extracting the column names from columnCondition was difficult, so they are passed through a new parameter on the function calls.
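To illustrate the idea, here is a minimal sketch in plain Python rather than Deequ's Scala code. All names in it (`ComplianceCheck`, `has_column`, `additional_preconditions`, the dict-based schema) are hypothetical stand-ins and not the library's API. It assumes the column names are supplied explicitly by the caller, as the description says, instead of being parsed out of the predicate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stand-in for a Spark schema: column name -> type name (assumption for this sketch).
Schema = Dict[str, str]
Precondition = Callable[[Schema], None]


class MissingColumnError(Exception):
    """Raised when a precondition finds that a required column is absent."""


def has_column(column: str) -> Precondition:
    """Build a precondition that raises if `column` is not in the schema."""
    def check(schema: Schema) -> None:
        if column not in schema:
            raise MissingColumnError(f"Input data does not include column {column}")
    return check


@dataclass
class ComplianceCheck:
    """Hypothetical model of a compliance constraint with explicit column names."""
    name: str
    predicate: str  # SQL-like expression, treated as opaque here
    columns: List[str] = field(default_factory=list)  # passed in, not parsed

    def additional_preconditions(self) -> List[Precondition]:
        # One column-existence precondition per declared column.
        return [has_column(c) for c in self.columns]

    def run(self, schema: Schema, evaluate: Callable[[str], float]) -> dict:
        # Validate the schema first, so a missing column is reported as a
        # failed check and the (expensive) evaluation is never started.
        for precondition in self.additional_preconditions():
            try:
                precondition(schema)
            except MissingColumnError as err:
                return {"status": "failure", "message": str(err)}
        return {"status": "success", "value": evaluate(self.predicate)}
```

With a schema that has `price` but no `quantity`, a check declared with `columns=["quantity"]` returns a failure result naming the missing column, and the `evaluate` callback (standing in for the Spark job) is never invoked.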

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mentekid commented 1 year ago

Thanks for opening this PR, we'll take a look as soon as we can!

samarth-c1 commented 1 year ago

> Thanks for opening this PR, we'll take a look as soon as we can!

Thank you, @mentekid! I'm aware of the failing tests in ConstraintSuggestionResultTest. I'm getting my change through internal review and will then update this PR.

mentekid commented 1 year ago

It looks like the build is failing - can you take a look and address any issues?

samarth-c1 commented 1 year ago

> It looks like the build is failing - can you take a look and address any issues?

Yes, some lines exceed the column limit. Updating them now.