Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
[Experimental] Addition of dataset comparison utilities #449
Closed
rdsharma26 closed 1 year ago
Description of changes:
These changes add two utility classes for comparing datasets.
Original author: Fernan Gonzalez (@fergonp), who wrote these as part of an internship project at Amazon.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.