PRX / Infrastructure

Templates and assets used to launch and manage many aspects of PRX's applications and services
MIT License

Fix/augury logged error alarms #669

svevang closed 1 year ago

svevang commented 1 year ago

Fixes a set of alarms that don't work for a couple of reasons:

This PR fixes the above, and switches the style to match the 500's alarm:

The modified alarms here are existential: if the error occurs at all, the alarm fires. Re: multiple evaluation periods, I think we would only want to ignore the error metrics for a single evaluation period in a self-healing scenario (like the actuals falling behind or the forecasts slowing down a bit). These alarms imply something has gone wrong that needs outside intervention.
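To make the difference between an existential alarm and a multi-period one concrete, here is a rough Python sketch of the evaluation logic. It is a simplified model for illustration only, not CloudWatch's actual evaluation algorithm and not the template code in this PR. The function `alarm_fires` and its parameters are hypothetical names, loosely modeled on the alarm settings for threshold, evaluation periods, and datapoints to alarm. Missing data is assumed to count as not breaching.

```python
from typing import Optional, Sequence


def alarm_fires(
    error_counts: Sequence[Optional[float]],
    threshold: float = 0,
    evaluation_periods: int = 1,
    datapoints_to_alarm: int = 1,
) -> bool:
    """Return True if the most recent periods would put the alarm into ALARM.

    error_counts holds one summed error count per period, oldest first.
    None means no data was reported for that period; this sketch assumes
    missing data is treated as not breaching.
    """
    # Only the last `evaluation_periods` periods are considered.
    recent = error_counts[-evaluation_periods:]
    # A period breaches when it reported data strictly above the threshold.
    breaching = sum(
        1 for count in recent if count is not None and count > threshold
    )
    return breaching >= datapoints_to_alarm


# Hypothetical history: quiet periods, one gap, then 3 logged errors.
history = [0, None, 0, 3]

# Existential style: a single period with any error at all fires the alarm.
existential = alarm_fires(history)

# Two-of-two style: one bad period is ignored until a second one follows.
two_of_two = alarm_fires(history, evaluation_periods=2, datapoints_to_alarm=2)
```

With this history, the existential settings fire immediately, while the two-of-two settings stay quiet until a second consecutive period reports errors. The latter only makes sense when the error can clear up on its own.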