Closed. sed-i closed this issue 2 years ago.
Bug Description

There are two "always firing" rules in avalanche: one based on absent() and one based on a metric value. There are a few surprising things about the alert coming from the absent() rule:

- It doesn't have an instance label on it, unlike the alert coming from the other rule file, which does have it.
- It doesn't have a juju_unit label on it, unlike the alert coming from the other rule file, which does have it.
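For reference, the absent()-based rule presumably looks something like the sketch below. The expression and alert name are taken verbatim from the generatorURL and labels in the prometheus/alertmanager outputs further down; the annotation templates and the second (value-based) rule are inferred guesses, not the actual avalanche rule files:

```yaml
groups:
  - name: avalanche_always_firing
    rules:
      # Rule 1: fires because the queried metric does not exist.
      # Expression and alert name taken from the outputs below; the
      # annotations are inferred from the rendered strings in the report.
      - alert: AlwaysFiringDueToAbsentMetric
        expr: absent(some_metric_name_that_shouldnt_exist{job="non_existing_job"})
        labels:
          severity: High
        annotations:
          summary: "Instance {{ $labels.instance }} dummy alarm (always firing)"
          description: "{{ $labels.instance }} of job {{ $labels.job }} is firing the dummy alarm."
      # Rule 2 (hypothetical): fires on a real metric value, so the alert
      # inherits instance (and, after relabeling, juju_unit) from a real series.
      - alert: AlwaysFiringDueToMetricValue
        expr: some_existing_metric >= 0
        labels:
          severity: High
```

Since {{ $labels.instance }} refers to a label the absent() result doesn't carry, it renders as an empty string.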
To Reproduce

git clone https://github.com/canonical/cos-lite-bundle
cd cos-lite-bundle
tox -e integration -- --keep-models
Environment

Model             Controller  Cloud/Region        Version  SLA          Timestamp
test-bundle-kjp4  newstuff    microk8s/localhost  2.9.25   unsupported  00:12:21Z

App           Version  Status  Scale  Charm             Store     Channel  Rev  OS          Address         Message
alertmanager           active      1  alertmanager-k8s  charmhub  edge      10  kubernetes  10.152.183.144
avalanche              active      2  avalanche-k8s     charmhub  edge      15  kubernetes  10.152.183.140
grafana                active      1  grafana-k8s       charmhub  edge      29  kubernetes  10.152.183.213
loki                   active      1  loki-k8s          charmhub  edge      15  kubernetes  10.152.183.231
prometheus             active      1  prometheus-k8s    charmhub  edge      20  kubernetes  10.152.183.20

Unit             Workload  Agent  Address       Ports  Message
alertmanager/0*  active    idle   10.1.179.223
avalanche/0*     active    idle   10.1.179.224
avalanche/1      active    idle   10.1.179.226
grafana/0*       active    idle   10.1.179.228
loki/0*          active    idle   10.1.179.230
prometheus/0*    active    idle   10.1.179.227

Offer                         Application   Charm             Rev  Connected  Endpoint           Interface          Role
alertmanager-karma-dashboard  alertmanager  alertmanager-k8s  10   0/0        karma-dashboard    karma_dashboard    provider
grafana-dashboards            grafana       grafana-k8s       29   0/0        grafana-dashboard  grafana_dashboard  requirer
loki-logging                  loki          loki-k8s          15   0/0        logging            loki_push_api      provider
prometheus-scrape             prometheus    prometheus-k8s    20   0/0        metrics-endpoint   prometheus_scrape  requirer

Relation provider            Requirer                     Interface              Type     Message
alertmanager:alerting        prometheus:alertmanager      alertmanager_dispatch  regular
alertmanager:replicas        alertmanager:replicas        alertmanager_replica   peer
avalanche:metrics-endpoint   prometheus:metrics-endpoint  prometheus_scrape      regular
avalanche:replicas           avalanche:replicas           avalanche_replica      peer
grafana:grafana              grafana:grafana              grafana_peers          peer
loki:grafana-source          grafana:grafana-source       grafana_datasource     regular
prometheus:grafana-source    grafana:grafana-source       grafana_datasource     regular
prometheus:prometheus-peers  prometheus:prometheus-peers  prometheus_peers       peer
Relevant log output

See "Additional context".
Additional context

Alert rendered from the absent() rule doesn't have an instance label

Note how the annotations have empty strings where the instance value should have gone:

"description": " of job non_existing_job is firing the dummy alarm.",
"summary": "Instance dummy alarm (always firing)"

Also note how juju_unit is missing from the alert labels.
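A toy Python analogue of what the annotation templating does (hypothetical helper, not Prometheus code): a reference to a label that isn't on the alert expands to the empty string, which produces exactly the truncated description in the report.

```python
import re

def render(template: str, labels: dict) -> str:
    """Toy analogue of Prometheus's annotation templating: a reference to a
    missing label expands to the empty string rather than raising an error."""
    return re.sub(
        r'\{\{\s*\$labels\.(\w+)\s*\}\}',
        lambda m: labels.get(m.group(1), ""),
        template,
    )

# Labels on the absent()-based alert: no "instance" label is present.
labels = {"job": "non_existing_job"}
print(render("{{ $labels.instance }} of job {{ $labels.job }} is firing the dummy alarm.", labels))
# -> " of job non_existing_job is firing the dummy alarm."
```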
Relevant prometheus output:

$ curl -s 10.1.179.227:9090/api/v1/alerts | jq
{
  "status": "success",
  "data": {
    "alerts": [
      {
        "labels": {
          "alertname": "AlwaysFiringDueToAbsentMetric",
          "job": "non_existing_job",
          "juju_application": "avalanche",
          "juju_charm": "avalanche-k8s",
          "juju_model": "test-bundle-kjp4",
          "juju_model_uuid": "6376405a-54dc-45b1-8eb8-b42f96d51f12",
          "severity": "High"
        },
        "annotations": {
          "description": " of job non_existing_job is firing the dummy alarm.",
          "summary": "Instance dummy alarm (always firing)"
        },
        "state": "firing",
        "activeAt": "2022-03-10T23:29:31.289307407Z",
        "value": "1e+00"
      }
    ]
  }
}
Relevant alertmanager output:

$ curl -s 10.1.179.223:9093/api/v2/alerts | jq
[
  {
    "annotations": {
      "description": " of job non_existing_job is firing the dummy alarm.",
      "summary": "Instance dummy alarm (always firing)"
    },
    "endsAt": "2022-03-10T23:51:31.289Z",
    "fingerprint": "bc3f0f827af3d64d",
    "receivers": [
      {
        "name": "dummy"
      }
    ],
    "startsAt": "2022-03-10T23:29:31.289Z",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2022-03-10T23:47:31.293Z",
    "generatorURL": "http://10.1.179.227:9090/graph?g0.expr=absent%28some_metric_name_that_shouldnt_exist%7Bjob%3D%22non_existing_job%22%7D%29&g0.tab=1",
    "labels": {
      "alertname": "AlwaysFiringDueToAbsentMetric",
      "job": "non_existing_job",
      "juju_application": "avalanche",
      "juju_charm": "avalanche-k8s",
      "juju_model": "test-bundle-kjp4",
      "juju_model_uuid": "6376405a-54dc-45b1-8eb8-b42f96d51f12",
      "severity": "High"
    }
  }
]
This is expected. A rule that is triggering due to the absence of metrics won't have any time series to get its labels from.
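That explanation can be made concrete with a toy sketch (a hypothetical helper, not Prometheus internals): absent() manufactures a single synthetic series, and the only labels it can carry are the plain equality matchers written in the selector itself, so instance and juju_unit have no series to come from. The juju_* topology labels that do appear are presumably injected into the rule file itself by the charm, which is why they survive.

```python
import re

def absent_labels(selector: str) -> dict:
    """Toy sketch of how absent() derives labels for its synthetic series:
    only plain equality matchers from the selector survive; regex and
    negative matchers (and the metric name) contribute nothing."""
    body = re.search(r'\{(.*)\}', selector)
    if not body or not body.group(1).strip():
        return {}
    labels = {}
    for part in body.group(1).split(','):
        m = re.match(r'\s*(\w+)\s*(=~|!=|!~|=)\s*"([^"]*)"\s*$', part)
        if m and m.group(2) == '=':
            labels[m.group(1)] = m.group(3)
    return labels

print(absent_labels('some_metric_name_that_shouldnt_exist{job="non_existing_job"}'))
# -> {'job': 'non_existing_job'}
```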