FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0
222 stars 69 forks

Approximate DataFrame Equality #115

Open FRosner opened 7 years ago

FRosner commented 7 years ago

Is it possible to compare two DataFrames more efficiently using something like MinHash?
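
To make the question concrete, here is a rough sketch of how a MinHash-based comparison might look. It is plain Python rather than Spark, and none of it is part of the drunken-data-quality API: the names `minhash_signature` and `estimate_jaccard`, the default `num_hashes=128`, and the choice to treat each DataFrame as a set of row tuples (so duplicate rows are ignored) are all assumptions made for illustration. Each DataFrame is reduced to a fixed-size signature, and the fraction of matching signature positions estimates the Jaccard similarity of the two row sets.

```python
import hashlib
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[object, ...]


def _row_hash(row: Row, seed: int) -> int:
    """Deterministic 64-bit hash of a row; the seed selects the hash function."""
    payload = repr((seed, row)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "big")


def minhash_signature(rows: Iterable[Row], num_hashes: int = 128) -> List[int]:
    """Keep, for every seeded hash function, the minimum hash over all rows."""
    signature = [2 ** 64] * num_hashes  # sentinel larger than any 64-bit hash
    for row in rows:
        for seed in range(num_hashes):
            h = _row_hash(row, seed)
            if h < signature[seed]:
                signature[seed] = h
    return signature


def estimate_jaccard(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Fraction of positions where both signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical data: two "DataFrames" as lists of row tuples, differing in 50 rows.
rows_a = [(i, f"name{i}") for i in range(1000)]
rows_b = rows_a[:950] + [(i, "changed") for i in range(950, 1000)]

sig_a = minhash_signature(rows_a)
sig_b = minhash_signature(rows_b)
similarity = estimate_jaccard(sig_a, sig_b)
print(f"estimated Jaccard similarity: {similarity:.3f}")
```

The true Jaccard similarity of the two row sets here is 950 / 1050, roughly 0.905, and the estimate should land near that, with an error that shrinks as `num_hashes` grows. Because the signature is an element-wise minimum, it could in principle be computed per partition and merged, which is what would make this attractive on Spark. An estimate of 1.0 only suggests the DataFrames are equal; it does not prove it.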