Open blysik opened 8 years ago
Nope,
Alerts are triggered per service, not per check. So if an alert has already been triggered, Cabot will not trigger another one, even if another check fails, until the alert_notification time is reached.
@dbuxton please correct me if I'm wrong ;-)
On Fri, 23 Oct 2015 at 18:53, blysik notifications@github.com wrote:
Hi,
Just a question on what the behavior is supposed to be.
1. Create a service, and assign it two checks: graphite, and http.
2. http check goes into error, and then an alert is triggered. (Importance of 'Error'.)
3. A few moments later, the graphite check failed (importance of critical), however no alert appears to be triggered.
Shouldn't another alert be triggered for 3?
— Reply to this email directly or view it on GitHub https://github.com/arachnys/cabot/issues/281.
Wouldn't that be a problem if the first check was just a warning, and the next check was a critical?
It depends. IMHO it might be interesting to have an escalation of the failure state of a server, like DEFCON in war movies ;-)
But I don't think this is easy to implement, nor do I think it would be widely used.
I think, as currently designed, critical errors might go unnoticed.
At the moment we just track, as a timestamp, when the last alert was sent (Service.last_alert_sent, see https://github.com/arachnys/cabot/blob/fc33c9859a6c249f8821c88eb8506ebcad645a50/cabot/cabotapp/models.py#L180); we don't track what kind of alert that was.
It would be easy to also track Service.last_alert_sent_overall_status
and compare that to the current to ensure that this issue doesn't occur.
Happy to merge anything that does this; I too think this is a big potential problem. However, it won't silence alerts until morning @blysik, just for ALERT_INTERVAL.
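For illustration, the comparison suggested above (storing the status of the last alert next to its timestamp and checking it against the current status) might look roughly like the following. This is only a standalone sketch, not Cabot's actual code: the `ServiceState` class, the `should_alert` and `record_alert` helpers, the status names and their ordering, and the 10-minute `ALERT_INTERVAL` value are all assumptions made for the example. Only `last_alert_sent`, `last_alert_sent_overall_status` and `ALERT_INTERVAL` are names taken from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical severity ordering; the status names are assumptions for this sketch.
SEVERITY = {"PASSING": 0, "WARNING": 1, "ERROR": 2, "CRITICAL": 3}

# Assumed value, for illustration only.
ALERT_INTERVAL = timedelta(minutes=10)


@dataclass
class ServiceState:
    """Stand-in for the two fields discussed in this thread."""
    last_alert_sent: Optional[datetime] = None
    last_alert_sent_overall_status: Optional[str] = None


def should_alert(state: ServiceState, current_status: str, now: datetime) -> bool:
    """Decide whether a new alert should go out for the service."""
    if current_status == "PASSING":
        return False
    if state.last_alert_sent is None:
        return True
    # Existing behaviour as described above: re-alert once the interval has elapsed.
    if now - state.last_alert_sent >= ALERT_INTERVAL:
        return True
    # Proposed addition: alert early if the status got worse since the last alert.
    previous = state.last_alert_sent_overall_status or "PASSING"
    return SEVERITY[current_status] > SEVERITY[previous]


def record_alert(state: ServiceState, current_status: str, now: datetime) -> None:
    """Remember when the last alert was sent and for which status."""
    state.last_alert_sent = now
    state.last_alert_sent_overall_status = current_status
```

With this, an 'Error' alert followed a couple of minutes later by a critical failure would alert again straight away, while a repeated 'Error' inside the interval would stay quiet.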
Aha! ALERT_INTERVAL. Okay, so not as bad.