awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

Feature: Add Row Level Result Treatment Options for Uniqueness and Completeness #532

Closed. eycho-am closed this 4 months ago

eycho-am commented 4 months ago

Issue #, if available: #530

Description of changes: This PR adds the option FilteredRow to AnalyzerOptions, which defines how rows excluded by a filter are labeled when retrieving row-level results. The two options are True and Null; the default is True.

This PR defines the behavior for the Completeness and Uniqueness analyzers only; other analyzers will be updated in future PRs.
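
To illustrate the intended semantics, here is a minimal sketch in plain Python. It is not Deequ's Scala API: the enum `FilteredRow`, the functions `row_level_completeness` and `row_level_uniqueness`, and the `where` predicate are hypothetical names chosen for this example. It also assumes that uniqueness is counted only among rows that pass the filter, which this description does not confirm.

```python
from collections import Counter
from enum import Enum
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class FilteredRow(Enum):
    """Hypothetical stand-in for the two options named in this PR."""
    TRUE = "true"   # filtered rows are labeled True (the default)
    NULL = "null"   # filtered rows are labeled None


def _label_filtered(option: FilteredRow) -> Optional[bool]:
    # Label given to a row that the filter excluded.
    return True if option is FilteredRow.TRUE else None


def row_level_completeness(
    rows: List[Row],
    column: str,
    where: Optional[Callable[[Row], bool]] = None,
    filtered_row: FilteredRow = FilteredRow.TRUE,
) -> List[Optional[bool]]:
    # One result per input row: True if the column is non-null, else False.
    results: List[Optional[bool]] = []
    for row in rows:
        if where is not None and not where(row):
            results.append(_label_filtered(filtered_row))
        else:
            results.append(row.get(column) is not None)
    return results


def row_level_uniqueness(
    rows: List[Row],
    column: str,
    where: Optional[Callable[[Row], bool]] = None,
    filtered_row: FilteredRow = FilteredRow.TRUE,
) -> List[Optional[bool]]:
    # Assumption: occurrences are counted only among rows passing the filter.
    kept = [r for r in rows if where is None or where(r)]
    counts = Counter(r.get(column) for r in kept)
    results: List[Optional[bool]] = []
    for row in rows:
        if where is not None and not where(row):
            results.append(_label_filtered(filtered_row))
        else:
            results.append(counts[row.get(column)] == 1)
    return results


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
    {"id": 3, "name": "a"},
]
keep = lambda r: r["id"] != 3  # the row with id 3 is filtered out

print(row_level_completeness(rows, "name", keep))
# [True, False, True]
print(row_level_completeness(rows, "name", keep, FilteredRow.NULL))
# [True, False, None]
```

With the True option, a filtered row cannot be told apart from a row that passed the check; the Null option keeps the two cases distinct in the row-level output.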

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.