canonical / cos-proxy-operator

https://charmhub.io/cos-proxy
Apache License 2.0

Prometheus Scrape Job Cleanup #135

Open vernhart opened 7 months ago

vernhart commented 7 months ago

Bug Description

We have multiple cos-proxy applications deployed, each related to a different instance of prometheus-scrape-config-k8s, so that we can use different scrape intervals (specifically for the alerts). After noticing some redundant scrape jobs, I realized we had incorrectly related nrpe to both cos-proxy applications. However, removing the relation between nrpe and one of the cos-proxy applications did not clear the redundant scrape jobs.

When I removed the relation between cos-proxy and prometheus-scrape-config-k8s, the redundant scrape jobs went away, but when I re-added the relation, they came back.

Suspecting the cached info was stored within the cos-proxy unit, I removed the cos-proxy unit. The redundant scrape jobs went away, but when I added a new cos-proxy unit, they came back.

This led me to think the cached data must be held within one of the other cos-proxy relations. One at a time, I removed the other relations:

After each removal, I checked the prometheus config and the redundant scrape jobs were still there.
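A check like this can be scripted rather than done by eye. The following is a minimal sketch, assuming the parsed JSON response of the Prometheus HTTP API endpoint `/api/v1/targets` is available; the function name and the sample payload are hypothetical, and "redundant" is taken here to mean the same scrape URL appearing in more than one scrape pool.

```python
import json
from collections import defaultdict


def find_redundant_targets(targets_payload):
    """Map each scrape URL to its scrape pools, keeping only URLs in 2+ pools."""
    pools_by_url = defaultdict(set)
    for target in targets_payload["data"]["activeTargets"]:
        pools_by_url[target["scrapeUrl"]].add(target["scrapePool"])
    return {
        url: sorted(pools)
        for url, pools in pools_by_url.items()
        if len(pools) > 1
    }


# Hypothetical sample, trimmed to the two fields used above.
sample = json.loads("""
{"status": "success", "data": {"activeTargets": [
  {"scrapePool": "job-a", "scrapeUrl": "http://10.0.0.5:9100/metrics"},
  {"scrapePool": "job-b", "scrapeUrl": "http://10.0.0.5:9100/metrics"},
  {"scrapePool": "job-c", "scrapeUrl": "http://10.0.0.6:9100/metrics"}
]}}
""")

redundant = find_redundant_targets(sample)
print(redundant)
```

Running it after each relation change would show whether the duplicate pools are still present without reading the whole config.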

Then, once again, I removed and re-added the cos-proxy/prometheus-scrape-config-k8s relation and the redundant scrape jobs were gone.

I have to assume that removing one of the other relations successfully removed the cached data but that the config change didn't get pushed to prometheus until removing and re-adding the prometheus-scrape-config-k8s relation.
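To make that assumption concrete, here is a toy model of the suspected behaviour. It is not the cos-proxy code: the class, its methods, and the relation names are all hypothetical, and the only point is to show how a cache can be cleaned up while the downstream config stays stale until something else triggers a republish.

```python
class ToyProxy:
    """Toy stand-in for a proxy that caches scrape jobs and republishes them."""

    def __init__(self):
        self.cached_jobs = {}  # relation name -> list of job names
        self.published = []    # what the downstream consumer currently sees

    def _publish(self):
        # Flatten the cache into the list handed to the downstream consumer.
        self.published = sorted(
            job for jobs in self.cached_jobs.values() for job in jobs
        )

    def on_relation_changed(self, relation, jobs):
        self.cached_jobs[relation] = list(jobs)
        self._publish()

    def on_relation_removed(self, relation):
        self.cached_jobs.pop(relation, None)
        # Suspected gap (assumption): no republish here, so downstream is stale.

    def on_downstream_rejoined(self):
        self._publish()


proxy = ToyProxy()
proxy.on_relation_changed("nrpe", ["nrpe-0"])
proxy.on_relation_changed("other", ["other-0"])

proxy.on_relation_removed("nrpe")
stale = list(proxy.published)   # still contains the removed job

proxy.on_downstream_rejoined()
fresh = list(proxy.published)   # removed job is finally gone
```

In this model the removal handler updates the cache correctly, yet the published list only catches up when the downstream relation is re-established.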

To Reproduce

Although we discovered this because we have two cos-proxy applications, it can probably be reproduced with just one. I haven't yet narrowed it down to a minimal set of relations, but it requires a cos deployment and, in another model, cos-proxy with nrpe. Once the nrpe unit is removed, the scrape jobs associated with that unit persist in the prometheus config.

Environment

juju 3.3.3 on maas 3.3.5

cos-proxy rev 71
nrpe rev 106
alertmanager-k8s rev 77
catalogue-k8s rev 19
grafana-k8s rev 82
loki-k8s rev 91
prometheus rev 129
prometheus-scrape-config-k8s rev 43
traefik-k8s rev 129

Additional context

No response

lucabello commented 4 months ago

We're probably missing a refresh event somewhere!