Open przemeklal opened 10 months ago
This is an alert rule that is hard to be correctly opinionated about: see https://cloud.google.com/compute/docs/disks/performance#pd-ssd.
We'll think about how to change it, or maybe remove the check entirely.
The same comment applies to the HostHighDiskReadRate alert rule.
@lucabello I believe you're right and it's a good idea to remove these checks since it's impossible to find a reasonable, universal threshold. These alert rules can always be added using the cos-configuration charm on clusters where this may be important.
Also, this alert rule is generated for LXD containers, which makes no sense, so +1 for removing the check. Alternatively, move the threshold definition to the charm configuration.
Bug Description
This threshold seems to be too low, especially for NVMe drives used as bcache on busy Ceph clusters. Write rates around 50 MB/s are pretty normal, so the default threshold should be at least 100 MB/s. Also, for: 5m seems to be too aggressive in production; I'd suggest increasing it to at least 20m. We see a lot of flapping and false positives, and the alert itself is not actionable.
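To illustrate the flapping, here is a minimal sketch in Python of how the threshold and the for: duration interact. It is a simplified model, not the actual Prometheus evaluation logic: it assumes one sample per minute and treats the alert as firing once the rate has stayed above the threshold for a given number of consecutive samples. The function name `count_firings` and the synthetic workload (8-minute bursts at 60 MB/s separated by 4 quiet minutes) are made up for illustration and are not measurements from a real cluster.

```python
from typing import Sequence


def count_firings(rates_mb_s: Sequence[float], threshold_mb_s: float,
                  for_samples: int) -> int:
    """Count how many times an alert would start firing.

    Simplified model (an assumption, not Prometheus internals): the alert
    fires once the rate has been above the threshold for `for_samples`
    consecutive samples, and resets at the first sample at or below it.
    """
    firings = 0
    streak = 0  # consecutive samples above the threshold
    for rate in rates_mb_s:
        if rate > threshold_mb_s:
            streak += 1
            if streak == for_samples:
                firings += 1
        else:
            streak = 0
    return firings


# Hypothetical workload, one sample per minute: five write bursts of
# 8 minutes at 60 MB/s, each followed by 4 minutes at 20 MB/s.
rates = ([60.0] * 8 + [20.0] * 4) * 5

print(count_firings(rates, threshold_mb_s=50, for_samples=5))    # 5
print(count_firings(rates, threshold_mb_s=50, for_samples=20))   # 0
print(count_firings(rates, threshold_mb_s=100, for_samples=5))   # 0
```

Under these assumptions, the 50 MB/s threshold with a 5-minute hold fires on every burst, while either a 20-minute hold or a 100 MB/s threshold stays silent for the same workload.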
To Reproduce
Deploy grafana-agent and start writing data >50MB/s :)
Environment
grafana-agent rev 29 on focal
Relevant log output
Additional context
No response