databrickslabs / dataframe-rules-engine

Extensible Rules Engine for custom Dataframe / Dataset validation

Feature - % failure threshold #15

Open GeekSheikh opened 3 years ago

GeekSheikh commented 3 years ago

Depending on the rule, some anomalous data should be allowed.

Assume a DF with 100 million records: should 1 record that is out of bounds by 0.0001% fail the entire test? The user should be empowered to decide acceptable failure rates as a function of the total number of records failed AND, for some rule types (e.g. bounds), a percentage miss. Given a bounds rule with the boundary Bounds(0.0, 1.0) and a data point result of 1.001, a rounding error could easily cause this. Today that can fail the rule, and in some cases it should not.
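To make the two knobs concrete, here is a minimal sketch in plain Python of how they might combine. It is not the rules engine's API: `TolerantBounds`, `pct_miss`, and `max_failure_rate` are hypothetical names, and treating the percentage miss as a fraction of the bounds width is an assumption, since the issue does not say what the percentage is relative to.

```python
from dataclasses import dataclass
from typing import Iterable, NamedTuple


class RuleResult(NamedTuple):
    passed: bool
    failed: int
    total: int


@dataclass(frozen=True)
class TolerantBounds:
    """Hypothetical bounds rule with two tolerances (names are illustrative)."""
    lower: float
    upper: float
    pct_miss: float = 0.0          # allowed overshoot, as a fraction of (upper - lower)
    max_failure_rate: float = 0.0  # allowed fraction of records that may fail

    def in_bounds(self, value: float) -> bool:
        # Widen both ends of the interval by the allowed miss.
        slack = (self.upper - self.lower) * self.pct_miss
        return self.lower - slack <= value <= self.upper + slack

    def evaluate(self, values: Iterable[float]) -> RuleResult:
        total = 0
        failed = 0
        for v in values:
            total += 1
            if not self.in_bounds(v):
                failed += 1
        # An empty input has no failures, so it passes.
        rate = failed / total if total else 0.0
        return RuleResult(rate <= self.max_failure_rate, failed, total)


# 98 clean records, one near miss (1.001), one clear outlier (1.5).
values = [0.5] * 98 + [1.001, 1.5]

strict = TolerantBounds(0.0, 1.0).evaluate(values)
tolerant = TolerantBounds(0.0, 1.0, pct_miss=0.01, max_failure_rate=0.05).evaluate(values)

print(strict)    # RuleResult(passed=False, failed=2, total=100)
print(tolerant)  # RuleResult(passed=True, failed=1, total=100)
```

With both tolerances left at 0.0 the rule behaves as it does today, so the thresholds would be opt-in. On a real DataFrame the failed and total counts would come from an aggregation rather than a Python loop.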