awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

Fix chi-square test conditions #482

Closed bevhanno closed 1 year ago

bevhanno commented 1 year ago

The chi-square test is now conducted only when the key size of the expected dataset is greater than or equal to the minimum.

Additional tests have been added.
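
For illustration, the condition described above can be sketched as follows. This is a minimal standalone sketch, not the Deequ implementation: the function name `chi_square_statistic`, the parameter `min_key_size`, and the default value of 2 are all hypothetical, and the inputs are assumed to be dictionaries of strictly positive category counts.

```python
from typing import Dict, Optional

# Hypothetical minimum number of keys; not taken from the Deequ source.
MIN_KEY_SIZE = 2


def chi_square_statistic(
    sample: Dict[str, int],
    expected: Dict[str, int],
    min_key_size: int = MIN_KEY_SIZE,
) -> Optional[float]:
    """Return Pearson's chi-square statistic, or None if the guard fails.

    Assumes every count in `expected` is strictly positive.
    """
    # Guard: skip the test when the expected dataset has too few keys.
    # The comparison is inclusive: exactly min_key_size keys is enough.
    if len(expected) < min_key_size:
        return None

    sample_total = sum(sample.values())
    expected_total = sum(expected.values())
    statistic = 0.0
    for key, expected_count in expected.items():
        # Rescale expected counts to the sample size before comparing.
        scaled = expected_count * sample_total / expected_total
        observed = sample.get(key, 0)
        statistic += (observed - scaled) ** 2 / scaled
    return statistic
```

The boundary case is the one worth testing: an expected dataset with exactly `min_key_size` keys should run the test, while one key fewer should skip it.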

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

bevhanno commented 1 year ago

@rdsharma26 could you please review this fix? One edge case was not handled correctly. Thank you.