awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0
3.32k stars, 539 forks

How to use deequ with Spark SQL #424

Open: eeasterly opened this issue 2 years ago

eeasterly commented 2 years ago

Is there a simple quick-start example for using deequ with Spark SQL? Any suggestions or examples would be very helpful. Alternatively, this could be treated as a feature request, since there already appears to be a PySpark flavor.
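A minimal sketch of one way to combine the two, assuming `spark-sql` and `com.amazon.deequ:deequ` (in compatible versions) are on the classpath. Deequ's `VerificationSuite` verifies a DataFrame, and `spark.sql(...)` returns one, so any SQL query result can be checked directly. The object name `DeequSparkSqlQuickStart`, the `orders` view, and its columns and rows are hypothetical, chosen only for illustration.

```scala
// Sketch: run a Spark SQL query, then verify the resulting DataFrame with Deequ.
// Assumption: spark-sql and deequ are on the classpath; the `orders` view and
// its data are hypothetical.
import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

object DeequSparkSqlQuickStart {

  def runChecks(spark: SparkSession): VerificationResult = {
    import spark.implicits._

    // Hypothetical sample data, registered as a temp view so plain SQL can query it.
    Seq(
      (1, "shipped", 10.0),
      (2, "pending", 25.5),
      (3, "shipped", 7.25)
    ).toDF("order_id", "status", "amount")
      .createOrReplaceTempView("orders")

    // Any Spark SQL query returns a DataFrame, which is what Deequ verifies.
    val queried = spark.sql(
      "SELECT order_id, status, amount FROM orders WHERE amount IS NOT NULL")

    VerificationSuite()
      .onData(queried)
      .addCheck(
        Check(CheckLevel.Error, "orders from Spark SQL")
          .hasSize(_ == 3)
          .isComplete("order_id")
          .isUnique("order_id")
          .isContainedIn("status", Array("pending", "shipped"))
          // satisfies() takes a SQL predicate string, so SQL-style rules fit here too.
          .satisfies("amount >= 0", "non-negative amount"))
      .run()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("deequ-spark-sql")
      .getOrCreate()
    try {
      val result = runChecks(spark)
      // One row per constraint, with its status and message.
      VerificationResult.checkResultsAsDataFrame(spark, result).show(false)
      println(s"All checks passed: ${result.status == CheckStatus.Success}")
    } finally {
      spark.stop()
    }
  }
}
```

The SQL step and the Deequ step stay separate here: whatever filtering or shaping the query does happens first, and Deequ only sees the final DataFrame.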

cryptopalxyz commented 2 years ago

I think this is a good point. Sometimes it is not enough to apply constraints to a single table; you also need to check business logic across table joins using Spark SQL. Supporting this would go a long way toward expanding the usage of deequ.
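One possible workaround, sketched under the same classpath assumptions: express the cross-table rule as a Spark SQL join, then check the joined DataFrame with Deequ. Here a referential-integrity rule ("every order belongs to an existing customer") becomes a completeness check on the joined key. The object name `DeequJoinCheck` and the `customers`/`orders` views and data are hypothetical.

```scala
// Sketch: check a cross-table business rule by joining in Spark SQL first,
// then verifying the joined DataFrame with Deequ.
// Assumption: spark-sql and deequ on the classpath; all names and rows are hypothetical.
import com.amazon.deequ.{VerificationResult, VerificationSuite}
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}
import org.apache.spark.sql.SparkSession

object DeequJoinCheck {

  def runJoinChecks(spark: SparkSession): VerificationResult = {
    import spark.implicits._

    Seq((1, "alice"), (2, "bob"))
      .toDF("customer_id", "name")
      .createOrReplaceTempView("customers")

    // Order 12 points at customer 3, which does not exist.
    Seq((10, 1, 40.0), (11, 2, 15.0), (12, 3, 99.0))
      .toDF("order_id", "customer_id", "amount")
      .createOrReplaceTempView("orders")

    // LEFT JOIN keeps every order; matched_customer is NULL for orphaned orders.
    val joined = spark.sql(
      """SELECT o.order_id, o.amount, c.customer_id AS matched_customer
        |FROM orders o
        |LEFT JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

    VerificationSuite()
      .onData(joined)
      .addCheck(
        Check(CheckLevel.Error, "orders reference existing customers")
          // A fully complete join key means no order is orphaned.
          .isComplete("matched_customer")
          // Guards against the join fanning out and duplicating orders.
          .isUnique("order_id"))
      .run()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("deequ-join").getOrCreate()
    try {
      val result = runJoinChecks(spark)
      // With the sample data, the orphaned order makes this check fail.
      println(s"Join checks passed: ${result.status == CheckStatus.Success}")
    } finally {
      spark.stop()
    }
  }
}
```

A LEFT JOIN is used rather than an INNER JOIN because an inner join would silently drop orphaned orders, leaving nothing for Deequ to flag.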