Open Abuelodelanada opened 6 months ago
To avoid oscillations of node_filesystem_avail_bytes, we can use a combination of node_filesystem_avail_bytes and loki_distributor_bytes_received_total.
For example, we'd have a 72h and a 20m prediction:
# 72h prediction, using an 18h window
sum (rate(loki_distributor_bytes_received_total[18h])*60*60*72) > bool sum(node_filesystem_avail_bytes)
# 20m prediction, using a 5m window
sum (rate(loki_distributor_bytes_received_total[5m])*60*20) > bool sum(node_filesystem_avail_bytes)
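As a rough sketch, the 72h prediction above could be wrapped in a Prometheus alerting rule like the one below. The group name, alert name, `for` duration, severity label, and annotation text are all assumptions chosen for illustration, not values taken from the PR. Note that the comparison drops `bool`: with `bool` the expression returns 0 or 1 on every evaluation, so an alert built on it would fire constantly, whereas a plain `>` only returns a sample when the projected ingest exceeds the available space.

```yaml
groups:
  - name: loki_disk_prediction          # hypothetical group name
    rules:
      - alert: LokiDiskProjectedFullIn72h   # hypothetical alert name
        # Bytes expected over the next 72h at the average ingest rate of the
        # last 18h, compared against the space currently available.
        expr: |
          sum(rate(loki_distributor_bytes_received_total[18h]) * 60 * 60 * 72)
            > sum(node_filesystem_avail_bytes)
        for: 30m                        # assumed; tune to reduce flapping
        labels:
          severity: warning             # assumed severity
        annotations:
          summary: Projected Loki ingest over 72h exceeds available disk space.
```

The 20m prediction would follow the same shape, with a `[5m]` window, a `* 60 * 20` multiplier, and presumably a higher severity.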
Enhancement Proposal
This issue is a continuation of issue https://github.com/canonical/loki-k8s-operator/issues/220 and PR https://github.com/canonical/loki-k8s-operator/pull/373/
Based on what we have seen in that issue and in the PR's comments, figure out the best metrics, time ranges, etc. to create meaningful alert rules and dashboards.
A useful post about alert rules: https://www.robustperception.io/reduce-noise-from-disk-space-alerts/