awslabs / python-deequ

Python API for Deequ
Apache License 2.0
713 stars 134 forks

Anomaly Detection #188

Closed dinjazelena closed 1 month ago

dinjazelena commented 8 months ago

Describe the bug

I have batch inference pipelines. Each pipeline uses PyDeequ to validate metrics of the predictions and ids. One of the checks is an anomaly check that makes sure that, for example, the number of distinct values and the size of the DataFrame do not change much from month to month.

My problem: say there is a month where there were some mistakes in the DataFrame and its size changed. I get notified by Deequ, go and fix it. But on the next month's run, since the last run failed, the new run's metric is compared against the failed run's metric.
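One way around this is to only persist metrics for runs whose checks passed, so a later run is never compared against a failed run's numbers. A minimal plain-Python sketch of that idea (the `MetricsStore` class and its method names are illustrative, not PyDeequ API):

```python
# Hypothetical in-memory metrics store: only runs whose checks passed
# are recorded, so the "previous run" used for comparison is always
# the last *good* run.
class MetricsStore:
    def __init__(self):
        self.history = []  # metrics of passing runs only

    def record(self, run_metrics, checks_passed):
        # Skip failed runs entirely instead of storing their metrics.
        if checks_passed:
            self.history.append(run_metrics)

    def latest(self):
        return self.history[-1] if self.history else None


store = MetricsStore()
store.record({"size": 1000}, checks_passed=True)
store.record({"size": 5}, checks_passed=False)   # bad month, not stored
print(store.latest())  # -> {'size': 1000}
```

In PyDeequ terms, the equivalent lever is that you control when a run's metrics are written to the metrics repository, so you can guard the save step on the verification outcome.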

Expected behavior

Is it somehow possible to not compare my next run against a failed run? Also, how else can we use RelativeRateOfChangeStrategy? What happens when we set `order` to something other than 1? I don't find it clear from the docs.
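On the `order` question: the Deequ docs describe RelativeRateOfChangeStrategy as working on the ratio of consecutive metric values, with `order` being the order of that "relative derivative" (order 1 compares v(t)/v(t-1); order 2 compares ratios of ratios). A rough plain-Python sketch of that reading (not the actual implementation):

```python
def relative_rates(values, order=1):
    # Apply the "relative derivative" `order` times:
    # one application turns [v0, v1, v2, ...] into [v1/v0, v2/v1, ...].
    for _ in range(order):
        values = [curr / prev for prev, curr in zip(values, values[1:])]
    return values


def anomalous_indices(values, max_decrease, max_increase, order=1):
    # Flag original indices whose rate falls outside the allowed band
    # (index i in the rates corresponds to original index i + order).
    rates = relative_rates(values, order)
    return [i + order for i, r in enumerate(rates)
            if not (max_decrease <= r <= max_increase)]


sizes = [100, 110, 121, 200]
print(anomalous_indices(sizes, 0.8, 1.2))  # -> [3]: only the jump to 200
```

With order=2, a steady 10% month-over-month growth yields second-order rates of 1.0 and is not flagged, while a sudden change in the growth rate itself would be.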

Environment:

DBR 12.2 LTS ML, Python 3.9, Spark 3.3

chenliu0831 commented 8 months ago

Could you follow the bug report template (https://github.com/awslabs/python-deequ/issues/new?assignees=&labels=&projects=&template=bug_report.md)?