Open miktros opened 5 months ago
Hi @miktros
Indeed, when we added the detection_period
and training_period
params we translated them under the hood to "days_back" and "backfill_days", making "hour" unit not be supported.
The solution for this is to change the operation under the hood in the dbt package and make the queries really use hours for the training and detection.
However, in your PR you added an "hours_back" option to the monitor CLI so I'm not sure I understand how it solves that same problem?
If you could shed some light on it I would be happy to review.
Thanks!
By splitting data into 1-hour time buckets in volume.anomalies tests, we were hoping to be able to run dbt test
and then edr monitor
every hour so that alert is emitted only for a test with row count in the most recent 1-hour time bucket that is lower than past 21 day average (time bucket failure). We use the "hours-back" option, edr monitor --hours-back 1
, to achieve the alerting behavior as described above. With the "days-back" option, alert is emitted each time edr monitor
is run when there is a time bucket failure any time during the day, even though the most recent time bucket has no failure. In short, we want per hour alert notification that reflects failure condition of the most recent hour.
So I'm not sure this behavior really works, because when the failed test happened in the last hour an alert will be fired, even if the failed metric isn't from the last hour, because you have a detection period of 1 day - so every failure in the past 1 day will fail the test, and if it happens in the hour prior to running edr monitor
an alert will be fired.
A different workaround I can suggest is to use the alert_suppression_interval flag in your CLI - setting it to 24 for example will only alert on the same issue once every 24 hours, preventing you from getting duplicate alerts on the same issue. Does this help?
If not-
If I understand correctly, what you want to do it open a PR in the dbt-package repo, changing the way the detection_period
param is handled in the test queries.
In here You can see the param 'detection_period' is being translated to 'backfill_days' which is being then piped into the rest of the macros handling the test query, but actually handling it and translating it to the chosen unit and setting the query accordingly is the required change to achieve the behavior you need.
I understand this is not the easiest contribution to do, and at some point we will probably do it ourselves. But I can't commit to a timeline at the moment unfortunately.
Is your feature request related to a problem? Please describe.
Documentation for
volume_anomalies
listshour
as an option for configuringdetection_period
. However, configuringdetection_period
usinghour
results in compilation error:Missing mandatory configuration: ['backfill_days']
Describe the solution you'd like
Elementary tests like
volume_anomalies
test allow configuringtime_bucket
by the hour. I would like to be able to configure thedetection_period
using thehour
option so that I can arrange for test runs such that anomaly alerts are emitted based on test failures of comparing the row count of the most recent hourly time bucket ofdetection_period
against row count of time buckets for the lasttraining_period
days.Describe alternatives you've considered
Introduce a new CLI option
--hours-back
foredr monitor
to optionally set a number-of-hours limit to how far back shouldedr monitor
look for pending alerts. If provided, it overrides--days-back
.I have a POC implementation that seems to work. PR to add optional
--hours-back
foredr monitor
here.Additional context
None.
Would you be willing to contribute this feature?
I am open to contributing to this feature and would appreciate any guidance you can provide.