Closed Omgzilla closed 3 weeks ago
We should not write alert rules to persistent storage; instead, we should recalculate everything every time, and this problem will resolve itself. This is a refactoring job.
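To illustrate the idea, here is a minimal sketch of how a renamed rule could linger when new rules are merged into previously stored state, and why a stateless rebuild avoids it. This is not the charm's actual code: the function names and the dict-of-rules shape are hypothetical, and the thread does not confirm that this is the cause.

```python
# Hypothetical illustration only: rules are modelled as {alert_name: expr}.

def merge_into_stored(stored: dict, incoming: dict) -> dict:
    """Incremental update: everything previously stored is kept."""
    updated = dict(stored)
    updated.update(incoming)
    return updated


def recalculate(incoming: dict) -> dict:
    """Stateless rebuild: the result depends only on the current input."""
    return dict(incoming)


# State left over from before the rename (assumed to have been persisted).
stored = {"IPMIMonitoringCommandFailed": "ipmimonitoring_command_success == 0"}
# What the refreshed charm provides after the rename.
incoming = {"IPMIMonitoringCommandFailedRenamed": "ipmimonitoring_command_success == 0"}

print(sorted(merge_into_stored(stored, incoming)))  # old and new names both present
print(sorted(recalculate(incoming)))                # only the renamed rule
```

With the stateless variant there is nothing to clean up after a rename, because no earlier state is ever consulted.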
graph LR
grafana-agent ---|remote-write| prometheus
avalanche ---|metrics-endpoint| grafana-agent
We have two alerts from avalanche:
$ curl -s 10.1.207.168:9090/api/v1/rules | jq | grep '"av"' -C10 | grep alertname
"alertname": "AlwaysFiringDueToAbsentMetric",
"alertname": "AlwaysFiringDueToNumericValue",
Rename one:
diff --git a/src/prometheus_alert_rules/always_firing_absent.rule b/src/prometheus_alert_rules/always_firing_absent.rule
index 17f8b01..327197e 100644
--- a/src/prometheus_alert_rules/always_firing_absent.rule
+++ b/src/prometheus_alert_rules/always_firing_absent.rule
@@ -1,4 +1,4 @@
-alert: AlwaysFiringDueToAbsentMetric
+alert: AlwaysFiringDueToAbsentMetricRenamed
expr: absent(some_metric_name_that_shouldnt_exist{job="non_existing_job"})
for: 0m
labels:
After packing and refreshing, the renamed rule is up to date, with no duplication:
$ curl -s 10.1.207.168:9090/api/v1/rules | jq | grep '"av"' -C10 | grep alertname
"alertname": "AlwaysFiringDueToAbsentMetricRenamed",
"alertname": "AlwaysFiringDueToNumericValue",
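For anyone repeating this check: as far as I can tell, `alertname` only appears in the `/api/v1/rules` response under a rule's active alerts, so grepping for it works here because both test rules always fire. A rough sketch of a check that reads the rule names directly instead is below. The sample payload is hand-written for illustration (only the fields the function reads), and the helper names are hypothetical.

```python
def alert_rule_names(rules_response: dict) -> list[str]:
    """Return the name of every alerting rule in a /api/v1/rules response."""
    names = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":
                names.append(rule["name"])
    return names


def stale_after_rename(names: list[str], old: str, new: str) -> bool:
    """True if the old rule name is still loaded alongside the new one."""
    return old in names and new in names


# Hand-written sample payload, trimmed to the fields used above.
SAMPLE = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "av_alerts",  # hypothetical group name
                "rules": [
                    {"type": "alerting", "name": "AlwaysFiringDueToAbsentMetricRenamed"},
                    {"type": "alerting", "name": "AlwaysFiringDueToNumericValue"},
                ],
            }
        ]
    },
}

names = alert_rule_names(SAMPLE)
print(names)
print(stale_after_rename(
    names,
    old="AlwaysFiringDueToAbsentMetric",
    new="AlwaysFiringDueToAbsentMetricRenamed",
))
```

To run it against a live Prometheus, the `curl -s .../api/v1/rules` output can be parsed with `json.load` and passed in place of `SAMPLE`.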
Will try with a machine charm next.
graph LR
subgraph lxd
grafana-agent --- ubuntu
hardware-observer --- ubuntu
grafana-agent --- hardware-observer
end
subgraph k8s
prometheus
end
prometheus --- grafana-agent
We have 78 alerts from hardware observer:
$ juju ssh --container prometheus prom/0 cat /etc/prometheus/rules/juju_welcome-lxd_82889f2e_hwo.rules | grep -c "alert:"
78
Then rename:
diff --git a/src/prometheus_alert_rules/ipmi_sensors.yaml b/src/prometheus_alert_rules/ipmi_sensors.yaml
index b83af12..82423f9 100644
--- a/src/prometheus_alert_rules/ipmi_sensors.yaml
+++ b/src/prometheus_alert_rules/ipmi_sensors.yaml
@@ -2,7 +2,7 @@ groups:
- name: IpmiSensors
rules:
- - alert: IPMIMonitoringCommandFailed
+ - alert: IPMIMonitoringCommandFailedRenamed
expr: ipmimonitoring_command_success == 0
for: 5m
labels:
After packing and refreshing, there are still 78 alerts, and the "...Renamed" rule is present.
@Omgzilla I failed to reproduce this.
Would you be able to paste the output of juju export-bundle from the lxd model if you encounter this again?
Closing for now, but please do re-open if encountered again!
Bug Description
We have just started to roll out the COS-lite stack in our production environment. When we create the charms, we add rules for grafana-agent to forward to Prometheus, but when we change the name of a rule and refresh the application, a new rule gets created and the old one remains.
We were able to remove the old rule by commenting it out, pushing the change with juju refresh, then removing the file and recreating the charm.
To Reproduce
Environment
Juju controller v.2.9.43 Clouds
Cross-Model using COS-lite and Grafana-Agent (edge)
Relevant log output
Additional context
No response