Is your feature request related to a problem? Please describe.
PR 552 introduced a Ratio Of Sums analyzer that checks whether two columns' values add up to the same number. This analyzer could be generalized into a Ratio Of Aggregation that accepts any kind of Spark aggregation, e.g. average.
Describe the solution you'd like
There should be a generic RatioOfAggregation check that accepts two columns and an aggregation function. An implementation of that would be RatioOfSums, which sets aggregation to sum.
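To make the proposal concrete, here is a minimal sketch of the idea in plain Python. It deliberately does not use Spark or the library's real API: the names `ratio_of_aggregation` and `ratio_of_sums`, the row-as-dict representation, and the zero-denominator behavior are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Iterable, Mapping, Sequence

# Hypothetical type: an aggregation reduces a column's values to one number.
Aggregation = Callable[[Sequence[float]], float]


def ratio_of_aggregation(
    rows: Iterable[Mapping[str, float]],
    numerator: str,
    denominator: str,
    aggregation: Aggregation,
) -> float:
    """Apply the same aggregation to two columns and return their ratio."""
    materialized = list(rows)
    num = aggregation([row[numerator] for row in materialized])
    den = aggregation([row[denominator] for row in materialized])
    if den == 0:
        # Assumption: a zero denominator is reported as an error rather than NaN.
        raise ZeroDivisionError(f"aggregation of column {denominator!r} is zero")
    return num / den


def ratio_of_sums(
    rows: Iterable[Mapping[str, float]], numerator: str, denominator: str
) -> float:
    """The existing behavior, expressed as the generic version with sum."""
    return ratio_of_aggregation(rows, numerator, denominator, sum)


# Made-up sample data: both columns add up to 4.0.
rows = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 2.0}]

sums_ratio = ratio_of_sums(rows, "a", "b")               # 4.0 / 4.0
mean_ratio = ratio_of_aggregation(rows, "a", "b", mean)  # 2.0 / 2.0
max_ratio = ratio_of_aggregation(rows, "a", "b", max)    # 3.0 / 2.0
```

In this sketch `ratio_of_sums` is just the generic function with the aggregation fixed to `sum`, which is the relationship the request describes between RatioOfSums and RatioOfAggregation.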
Describe alternatives you've considered
The alternative would be to let users define Check assertions as a function of another aggregator's value. Rather than saying this:
they could define their checks as
(this is pseudocode, but basically pass an analyzer as part of the assertion)
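As a rough illustration of this alternative (not the pseudocode referred to above, and not the library's API), the sketch below lets an assertion evaluate a second analyzer on the same data. The names `sum_of`, `mean_of`, `run_check`, and the `compute` callback are hypothetical, and rows are again plain dicts rather than a Spark DataFrame.

```python
import math
from typing import Callable, Iterable, List, Mapping

Row = Mapping[str, float]
# Hypothetical type: an analyzer turns a dataset into a single metric value.
Analyzer = Callable[[List[Row]], float]


def sum_of(column: str) -> Analyzer:
    """Build an analyzer that sums one column."""
    return lambda rows: sum(row[column] for row in rows)


def mean_of(column: str) -> Analyzer:
    """Build an analyzer that averages one column."""
    return lambda rows: sum(row[column] for row in rows) / len(rows)


def run_check(
    rows: Iterable[Row],
    analyzer: Analyzer,
    assertion: Callable[[float, Callable[[Analyzer], float]], bool],
) -> bool:
    """Evaluate `analyzer`, then hand its value to `assertion`.

    The assertion also receives `compute`, which evaluates any other
    analyzer on the same rows, so a check can compare two metrics.
    """
    materialized = list(rows)
    value = analyzer(materialized)
    return assertion(value, lambda other: other(materialized))


# Made-up sample data: both columns add up to 4.0.
rows = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 2.0}]

# "The sum of a equals the sum of b", written as an assertion on another analyzer.
sums_match = run_check(
    rows,
    sum_of("a"),
    lambda value, compute: math.isclose(value, compute(sum_of("b"))),
)

# The same mechanism works for any other pair of analyzers, e.g. averages.
means_match = run_check(
    rows,
    mean_of("a"),
    lambda value, compute: math.isclose(value, compute(mean_of("b"))),
)
```

Compared with a dedicated RatioOfAggregation analyzer, this design needs no new analyzer per aggregation, but every assertion has to spell out which second metric it depends on.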