awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0
3.18k stars, 519 forks

Added RatioOfSums analyzer and tests #550

Closed: scott-gunn closed this pull request 3 months ago

scott-gunn commented 3 months ago

Issue #, if available:

Description of changes: This PR adds a new analyzer, RatioOfSums, which sums two different columns and then divides one sum by the other to produce the final metric. This is useful for checks such as a percent of total, or for verifying that one value isn't changing disproportionately relative to another.
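As a rough illustration of the computation described above, the following Scala sketch sums two columns over in-memory rows and divides the totals. It uses neither Deequ nor Spark. The object `RatioOfSumsSketch`, the `Row` case class, and the choice to return `None` when the denominator sums to zero are hypothetical and are not taken from the PR.

```scala
// Hypothetical sketch: mirrors the idea of a ratio-of-sums metric on plain
// Scala collections; it is NOT Deequ's RatioOfSums implementation or API.
object RatioOfSumsSketch {

  // One row holding the two numeric columns being compared (illustrative names).
  final case class Row(numerator: Double, denominator: Double)

  /** Sums each column across all rows, then divides the two totals.
    * Returns None when the denominator sum is zero (an assumed policy;
    * the PR description does not say how this case is handled).
    */
  def ratioOfSums(rows: Seq[Row]): Option[Double] = {
    val numSum = rows.map(_.numerator).sum   // total of the first column
    val denSum = rows.map(_.denominator).sum // total of the second column
    if (denSum == 0.0) None else Some(numSum / denSum)
  }

  def main(args: Array[String]): Unit = {
    // e.g. one channel's revenue vs. total revenue, across two partitions
    val rows = Seq(Row(10.0, 40.0), Row(20.0, 60.0))
    println(ratioOfSums(rows)) // Some(0.3)
  }
}
```

Note that a ratio of sums differs from averaging per-row ratios: for the rows above, the mean of ratios would be (0.25 + 0.333...) / 2 ≈ 0.292, because a ratio of sums effectively weights each row by its denominator, which is what a percent-of-total check needs.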

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.