canonical / grafana-agent-operator

https://charmhub.io/grafana-agent
Apache License 2.0

feat(alerting): detect flapping alerts #148

Open simskij opened 2 months ago

simskij commented 2 months ago

Issue

We currently don't provide any alert rules that detect when an alert rapidly transitions to and from pending. This means we could lose information about recurring intermittent errors.

Solution

Add an alert rule that fires whenever an alert transitions more than 5 times within the last hour.
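The rule expression itself is not shown in this description. The Python sketch below illustrates only the counting logic. It assumes the alert state is sampled as (timestamp, state) pairs. The names `count_transitions` and `is_flapping`, the sample format, and the state strings are all hypothetical. In Prometheus, a rule of this kind would typically use the PromQL `changes()` function over a `[1h]` range, but no specific expression here is taken from the PR.

```python
from typing import Iterable, Tuple

WINDOW_SECONDS = 3600  # "the last hour"
MAX_TRANSITIONS = 5    # flag an alert once it transitions MORE than this many times

# Hypothetical sample format: (unix timestamp in seconds, alert state string).
Sample = Tuple[float, str]


def count_transitions(samples: Iterable[Sample], now: float,
                      window: float = WINDOW_SECONDS) -> int:
    """Count state changes between consecutive samples inside [now - window, now]."""
    recent = [state for ts, state in sorted(samples) if now - window <= ts <= now]
    return sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)


def is_flapping(samples: Iterable[Sample], now: float,
                window: float = WINDOW_SECONDS,
                threshold: int = MAX_TRANSITIONS) -> bool:
    """True when the alert changed state more than `threshold` times in the window."""
    return count_transitions(samples, now, window) > threshold


# Usage: one sample per minute, alternating inactive/pending for 8 minutes.
samples = [(t * 60, "pending" if t % 2 else "inactive") for t in range(8)]
print(count_transitions(samples, now=420))  # 7
print(is_flapping(samples, now=420))        # True
```

Using a strict `>` matches the "more than 5 times" wording, so exactly 5 transitions in an hour would not be flagged.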

Context

resolves #145

Testing Instructions

  1. Deploy COS.
  2. Deploy a machine (juju deploy ubuntu).
  3. Deploy Grafana Agent and relate it to ubuntu.
  4. Relate the agent to COS.
  5. Verify that the alert rule has been forwarded to COS.
  6. Turn off the host.
  7. Wait until a scrape has occurred.
  8. Turn on the host.
  9. Wait until a scrape has occurred.
  10. Repeat steps 6 to 9 five times.
  11. Wait 5 minutes.
  12. Verify that the alert is firing.
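The following back-of-the-envelope check shows why steps 6 to 10 should trip the rule. It assumes that each off/on cycle moves the host's alert from inactive to pending while the host is down, and back to inactive once a scrape succeeds. That assumption, and the names `simulate_test_plan`, `transitions`, and `CYCLES`, are hypothetical. The real state sequence depends on the rule's `for` duration and the scrape interval.

```python
CYCLES = 5     # step 10: repeat steps 6 to 9 five times
THRESHOLD = 5  # the rule fires on more than 5 transitions within an hour


def simulate_test_plan(cycles: int = CYCLES) -> list:
    """Assumed state sequence: the host starts up, so the alert starts inactive."""
    states = ["inactive"]
    for _ in range(cycles):
        states.append("pending")   # steps 6-7: host off, a scrape fails
        states.append("inactive")  # steps 8-9: host on, a scrape succeeds
    return states


def transitions(states: list) -> int:
    """Count changes between consecutive states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


states = simulate_test_plan()
print(transitions(states))              # 10
print(transitions(states) > THRESHOLD)  # True
```

Under this assumption, each cycle contributes two transitions. Three cycles (6 transitions) would already exceed the threshold, so five cycles leave some margin.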

Upgrade Notes

err404r commented 2 months ago

Hi, over the next two days I'm going to test this rule together with inhibition rules, so please don't merge yet; I might find some useful improvements.