awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0
3.18k stars, 519 forks

Column Count Analyzer and Check #555

Closed (mentekid closed 2 months ago)

mentekid commented 3 months ago

Issue #, if available:

Description of changes:

This adds a dataset-scope analyzer that counts the number of columns in a dataset, and uses that analyzer in a Deequ Check.
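
To illustrate the idea (not the code in this PR), here is a minimal, dependency-free Scala sketch of a dataset-scope column count and a check built on it. `Table`, `columnCount`, and `columnCountCheck` are hypothetical names invented for this example; they are not Deequ's API. On a real Spark DataFrame the same quantity would be `df.columns.length`.

```scala
// Hypothetical stand-ins for illustration only; none of these names come from Deequ.
object ColumnCountSketch {

  // A toy dataset: named columns plus rows of values.
  final case class Table(columns: Seq[String], rows: Seq[Seq[Any]])

  // Dataset-scope metric: derived from the schema alone, so no row is scanned.
  def columnCount(table: Table): Long = table.columns.length.toLong

  // Check: apply a caller-supplied assertion to the metric.
  def columnCountCheck(table: Table)(assertion: Long => Boolean): Boolean =
    assertion(columnCount(table))
}
```

With a three-column `Table`, `columnCountCheck(table)(_ == 3)` passes and `columnCountCheck(table)(_ > 3)` fails. Because the metric depends only on the schema, it describes the dataset as a whole and not any single column.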

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.