canonical / loki-k8s-operator

https://charmhub.io/loki-k8s
Apache License 2.0

Alert rules and Dashboard panels about log growth rate in Grafana #383

Open Abuelodelanada opened 6 months ago

Abuelodelanada commented 6 months ago

Enhancement Proposal

This is a continuation of issue https://github.com/canonical/loki-k8s-operator/issues/220 and PR https://github.com/canonical/loki-k8s-operator/pull/373/

Based on what we have seen in that issue and in the PR's comments, we should figure out the best metrics, time ranges, etc. to create meaningful alert rules and dashboards.

A useful post about alert rules: https://www.robustperception.io/reduce-noise-from-disk-space-alerts/

sed-i commented 6 months ago

To avoid oscillations of node_filesystem_avail_bytes, we can use a combination of node_filesystem_avail_bytes and loki_distributor_bytes_received_total.

For example, we'd have a 72h prediction and a 20m prediction:

# 72h prediction, using an 18h window
sum (rate(loki_distributor_bytes_received_total[18h])*60*60*72) > bool sum(node_filesystem_avail_bytes)
# 20m prediction, using a 5m window
sum (rate(loki_distributor_bytes_received_total[5m])*60*20) > bool sum(node_filesystem_avail_bytes)
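
The arithmetic behind both expressions is the same: multiply the per-second ingestion rate by the number of seconds in the prediction horizon, then compare the result with the bytes still available. Here is a minimal Python sketch of that comparison. The function and constant names are hypothetical, and it assumes every received byte ends up on disk, as the expressions above do:

```python
# Hypothetical helper mirroring the PromQL comparisons above.
# Assumption: every byte received by the distributor is written to disk.

HORIZON_72H_S = 60 * 60 * 72  # 72 hours in seconds, as in the first expression
HORIZON_20M_S = 60 * 20       # 20 minutes in seconds, as in the second expression


def will_fill(ingest_rate_bytes_per_s: float, horizon_s: float, avail_bytes: float) -> bool:
    """Return True if ingestion at the given rate over the horizon
    would exceed the bytes currently available."""
    predicted_bytes = ingest_rate_bytes_per_s * horizon_s
    return predicted_bytes > avail_bytes


if __name__ == "__main__":
    # Illustrative numbers only: 1 MB/s ingestion against 1 GB free.
    print(will_fill(1_000_000, HORIZON_20M_S, 1_000_000_000))
```

The rate would come from a long window (18h) for the 72h horizon and a short window (5m) for the 20m horizon, so the long-range prediction is smoothed while the short-range one reacts to bursts.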