luis-simauchi closed this issue 1 month ago.
Ten retries will be our warning threshold for the new alert.
Example query to model the new one after:
sum(last_10m):sum:vets_api.statsd.api_1010ezr_failed_wont_retry{env:eks-prod}.as_count() >= 1
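Below is a minimal Python sketch of how a `>=` metric monitor would grade the 10-minute count once a warning threshold of ten is added. It assumes each failed attempt increments the counter once, so ten retries appear as a count of ten. The critical threshold of 20 is a hypothetical placeholder because no critical value is given here. `monitor_status` is an illustrative name and is not part of Datadog.

```python
# Sketch: classify a 10-minute summed count the way a ">=" monitor would.
# WARNING_THRESHOLD is the ten retries agreed on in this issue.
# CRITICAL_THRESHOLD is a hypothetical placeholder (not stated in the issue).
WARNING_THRESHOLD = 10
CRITICAL_THRESHOLD = 20  # hypothetical; must be >= the warning threshold


def monitor_status(count_last_10m: int,
                   warning: int = WARNING_THRESHOLD,
                   critical: int = CRITICAL_THRESHOLD) -> str:
    """Return "OK", "WARN", or "ALERT" using ">=" comparisons."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if count_last_10m >= critical:
        return "ALERT"
    if count_last_10m >= warning:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    for count in (0, 9, 10, 25):
        print(count, monitor_status(count))
```

With these values, nine failures in the window stay OK, ten trigger the warning, and anything at or above the placeholder critical value alerts.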
Monitor created: https://vagov.ddog-gov.com/monitors/274737
We now have separate errors for normal failures and for "total/exhausted" failures (ones that will not be retried again). We need to update our monitor configuration and notification messaging in Datadog (DD) and in the APM channel to reflect this change.
The update separates the two as follows:
- api.1010ezr.failed_wont_retry (exhausted failures)
- api.1010ezr.failed (normal failures)
Build a new monitor based on the api.1010ezr.failed StatsD call.
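As a sketch, the new monitor definition can be expressed as a Python dict shaped like a Datadog monitor API request (`name`, `type`, `query`, `message`, `options.thresholds`). The mapping from the StatsD key to the Datadog metric name is assumed from the example query earlier in this issue: dots become underscores under a `vets_api.statsd.` prefix. The helper names, monitor name, message text, and critical value are hypothetical.

```python
import json

# Assumed prefix, inferred from the example query in this issue:
# api.1010ezr.failed_wont_retry -> vets_api.statsd.api_1010ezr_failed_wont_retry
DD_PREFIX = "vets_api.statsd."


def statsd_key_to_dd_metric(statsd_key: str) -> str:
    """Map a StatsD key such as 'api.1010ezr.failed' to its assumed Datadog metric name."""
    return DD_PREFIX + statsd_key.replace(".", "_")


def build_monitor_payload(statsd_key: str, warning: int, critical: int,
                          env: str = "eks-prod") -> dict:
    """Build a hypothetical monitor definition modeled on the example query."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    metric = statsd_key_to_dd_metric(statsd_key)
    # Same shape as the example: 10-minute summed count compared with ">=".
    query = f"sum(last_10m):sum:{metric}{{env:{env}}}.as_count() >= {critical}"
    return {
        "name": f"{statsd_key} failures ({env})",  # hypothetical name
        "type": "query alert",
        "query": query,
        "message": "1010EZR submissions are failing and being retried.",  # placeholder
        "options": {"thresholds": {"critical": critical, "warning": warning}},
    }


if __name__ == "__main__":
    # warning=10 comes from this issue; critical=20 is a hypothetical value.
    payload = build_monitor_payload("api.1010ezr.failed", warning=10, critical=20)
    print(json.dumps(payload, indent=2))
```

Both the number at the end of the query and `options.thresholds.critical` come from the same `critical` argument, because Datadog expects those two values to match.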