Metric pg_exporter_last_scrape_error always returns 1 on replica instances

dominik0711 commented 1 year ago

I have set up a Crunchy Postgres cluster on my OpenShift cluster with the postgres-operator v5.3.0. The Postgres cluster has 1 master and 2 replicas. Following your monitoring examples, I have added alert rules like the ones in this repo, see: https://github.com/CrunchyData/postgres-operator-examples/blob/main/kustomize/monitoring/alertmanager-rules-config.yaml

The alert PGExporterScrapeError (pg_exporter_last_scrape_error > 0) is always firing on my replicas. All the replicas return 1 and only the master returns 0. Other alerts show the same problem, e.g. PGReplicationSlotsInactive (replication slots on replicas are always inactive, as far as I know).

Are all the alert rules in this repo intended to run only against the master instance and not against the replicas? If so, I do not see any filter by instance role in these alert rules.
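
For illustration, a role filter of the kind asked about here could look like the sketch below. It is not taken from the repo: it assumes that the Prometheus scrape configuration attaches a `role` label to each target and that the primary carries the value `master`. The group name, `for` duration, and severity are hypothetical placeholders.

```yaml
# Sketch only. The `role` label and its "master" value are assumptions about
# how targets are relabeled; check the labels on your own series first.
groups:
  - name: pg-exporter-primary-only   # hypothetical group name
    rules:
      - alert: PGExporterScrapeError
        # Expression from this issue, restricted to the primary instance.
        expr: pg_exporter_last_scrape_error{role="master"} > 0
        for: 60s                     # illustrative value
        labels:
          severity: critical         # illustrative value
        annotations:
          summary: 'Postgres Exporter scrape error on {{ $labels.instance }}'
```

Note that a filter like this only silences the alert on replicas. It does not explain why the exporter reports a scrape error there in the first place.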

Thank you very much for the clarification.

andrewlecuyer commented 8 months ago

Hi @dominik0711, sorry to hear you are having trouble! Are you still running into this issue?

We have updated the monitoring stack in recent versions of Crunchy Postgres for Kubernetes, and I would be curious to see if you are still seeing the behavior you described with the latest patches and updates.

As for the issue you described (assuming you are still having trouble), does the exporter container show any errors? If so, can you provide the logs?

dsessler7 commented 4 months ago

As we have not heard back, I am closing this issue. If you are still seeing issues with the latest version of CPK, feel free to re-open the issue or submit a new issue and provide the requested logs.

CrunchyData/postgres-operator-examples, issue #237