Describe the bug
I have batch inference pipelines. Each pipeline uses pydeequ to validate metrics of the predictions and IDs. One of the checks is an anomaly check that makes sure that, say, the number of distinct values and the size of the dataframe do not change much from month to month.
My problem: suppose one month there are mistakes in the dataframe and its size changes. I get notified by deequ, go and fix it. But when I get to the next month, since the last run failed, deequ compares the metrics of the new run against the metrics of the failed run.
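For context, a simplified sketch of the kind of check each pipeline runs (the repository path, tags, and thresholds below are placeholders, not my real values):

```python
from pydeequ.analyzers import Size
from pydeequ.anomaly_detection import RelativeRateOfChangeStrategy
from pydeequ.repository import FileSystemMetricsRepository, ResultKey
from pydeequ.verification import VerificationSuite

# Metrics repository that accumulates one entry per monthly run.
repository = FileSystemMetricsRepository(spark, "/dbfs/metrics/predictions.json")
key = ResultKey(spark, ResultKey.current_milli_time(), {"pipeline": "predictions"})

result = (VerificationSuite(spark)
          .onData(df)
          .useRepository(repository)
          .saveOrAppendResult(key)  # persists this run's metrics, pass or fail
          # flag the run if the row count moves more than +/-20% vs. the previous run
          .addAnomalyCheck(RelativeRateOfChangeStrategy(maxRateDecrease=0.8,
                                                        maxRateIncrease=1.2),
                           Size())
          .run())
```

Since saveOrAppendResult stores the metrics regardless of the check outcome, the failed month's values end up in the repository and become the baseline for the next comparison.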
To Reproduce
Steps to reproduce the behavior:
1. Run the monthly pipeline on a month where the dataframe is wrong; the anomaly check fails as expected.
2. Fix the data, but leave the failed run's metrics in the metrics repository.
3. Run the next month's pipeline: the anomaly check compares the new metrics against the failed run's metrics.
Expected behavior
Is it somehow possible to avoid comparing my next run to the failed run? Also, how else can RelativeRateOfChangeStrategy be used: what happens when order is set to something other than the default? I don't find the docs clear on this.
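One workaround I'm considering (just a sketch, not verified): run the verification against the stored history without persisting, and append this month's metrics with a separate AnalysisRunner pass only when the checks pass, so a failed month never becomes the next baseline:

```python
from pydeequ.analyzers import AnalysisRunner, Size
from pydeequ.anomaly_detection import RelativeRateOfChangeStrategy
from pydeequ.repository import FileSystemMetricsRepository, ResultKey
from pydeequ.verification import VerificationResult, VerificationSuite

repository = FileSystemMetricsRepository(spark, "/dbfs/metrics/predictions.json")  # placeholder
key = ResultKey(spark, ResultKey.current_milli_time(), {"pipeline": "predictions"})

# 1) Validate against history, but do NOT save this run's metrics yet.
result = (VerificationSuite(spark)
          .onData(df)
          .useRepository(repository)  # still needed so the strategy can read past metrics
          .addAnomalyCheck(RelativeRateOfChangeStrategy(maxRateDecrease=0.8,
                                                        maxRateIncrease=1.2),
                           Size())
          .run())

passed = (VerificationResult.checkResultsAsDataFrame(spark, result)
          .filter("check_status != 'Success'")
          .count() == 0)

# 2) Persist the metrics only for passing runs, so a failed month is never
#    used as the baseline for the following month.
if passed:
    (AnalysisRunner(spark)
     .onData(df)
     .addAnalyzer(Size())
     .useRepository(repository)
     .saveOrAppendResult(key)
     .run())
```

(Scala deequ also has an AnomalyCheckConfig with withTagValues for restricting which historical metrics are compared against; I'm not sure how much of that is exposed in pydeequ.)

On order: my reading of the deequ source is that the strategy takes the order-th discrete "derivative" of the metric series before applying the bounds. With the default order=1 it checks the ratio of consecutive values, v(t) / v(t-1), against [maxRateDecrease, maxRateIncrease]; with order=2 it checks the ratio of those ratios, i.e. changes in the rate of change rather than in the level. I may be misreading this, which is why clearer docs would help.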
Desktop (please complete the following information):
DBR 12.2 LTS ML, Python 3.9, Spark 3.3