awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

[Experimental] Addition of dataset comparison utilities #449

Closed: rdsharma26 closed this 1 year ago

rdsharma26 commented 1 year ago

Description of changes:

These changes add two experimental utility classes for comparing datasets.

Original author: Fernan Gonzalez (@fergonp), who developed these utilities as part of their internship project at Amazon.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.