Open jgillard opened 7 years ago
I see the same behaviour, but I think it comes from the change-threshold setting:
https://github.com/AcalephStorage/consul-alerts#health-checks.
Indeed, your node was critical for only 20 seconds, so the critical state never reached the threshold. Once it went back to passing, it stayed there for 60 seconds, so that notification was sent.
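To illustrate what I mean, here is a rough Python model of how I understand a change threshold to behave. It is not the consul-alerts code: the function name `notifications`, the 10-second sampling interval, and the timeline are all made up for the example, and I am assuming a notification fires whenever a status has been held for the threshold, without comparing it to the last status that was notified.

```python
def notifications(samples, threshold=60):
    """Return the (timestamp, status) notifications for a timeline.

    samples: list of (timestamp_seconds, status) in time order.
    threshold: seconds a new status must persist before notifying
               (hypothetical stand-in for change-threshold).
    """
    sent = []
    # Treat the first sample as the already-known state: no notification.
    since, current = samples[0]
    pending = False
    for ts, status in samples[1:]:
        if status != current:
            # Status changed: restart the timer for the new status.
            current, since, pending = status, ts, True
        if pending and ts - since >= threshold:
            # The new status has been held long enough to notify.
            sent.append((ts, status))
            pending = False
    return sent


# Passing at t=0, critical from t=10 to t=30 (20 s), then passing again.
timeline = [(0, "passing"), (10, "critical"), (20, "critical")]
timeline += [(t, "passing") for t in range(30, 100, 10)]

print(notifications(timeline))                # threshold of 60 s
print(notifications(timeline, threshold=10))  # a much lower threshold
```

With the 60-second threshold this model only reports the passing state; with a 10-second threshold it reports the critical state first and then the recovery. If that matches what consul-alerts really does, lowering the change-threshold should bring the missing CRITICAL emails back.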
I've recently created a Notification Profile to direct serfHealth alerts to email and have since been getting a lot of "System is HEALTHY" without the corresponding "System is CRITICAL" emails beforehand.
I enabled debug logging a few days ago and an excerpt is below. You can see that the node was critical at one point but never triggered an alert. When it became stable, an email was sent after 90 seconds. We've never had this problem with PagerDuty, presumably because it wouldn't have an incident to resolve.