cloudnative-pg / charts

CloudNativePG Helm Charts
Apache License 2.0

Wrong template variable in some prometheus cluster rules #322

Open · Wain13 opened this issue 3 months ago

Wain13 commented 3 months ago

https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/prometheus_rules/cluster-offline.yaml#L7

https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/prometheus_rules/cluster-ha-critical.yaml#L7

https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/prometheus_rules/cluster-ha-warning.yaml#L7

All of the above reference the cluster name incorrectly with {{ $labels.job }}. The variable does not expand in the rendered file, so it shows up as a blank value when the alert fires. If it is changed to {{ .namespace }}/{{ .cluster }}, as in the other Prometheus rules, it expands correctly.
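The proposed replacement uses values that are already known when the chart is rendered, so they are substituted before Prometheus ever sees the rule. Below is a minimal Go sketch of that install-time step, using the standard text/template package that Helm builds on. The description string and the values db and pg-main are hypothetical, and the namespace/cluster keys are taken from the variable names quoted in this issue, not from the chart itself.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderRuleText executes a Go text/template against a map of values,
// roughly what happens when a chart's rule file is rendered at install time.
func renderRuleText(text string, values map[string]any) (string, error) {
	tmpl, err := template.New("rule").Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, values); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Hypothetical install-time values; the key names follow the issue text.
	values := map[string]any{"namespace": "db", "cluster": "pg-main"}
	desc, err := renderRuleText("Cluster {{ .namespace }}/{{ .cluster }} has no ready instances.", values)
	if err != nil {
		panic(err)
	}
	fmt.Println(desc) // Cluster db/pg-main has no ready instances.
}
```

Because these values are fixed at render time, they do not depend on which labels the alert's query returns.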

itay-grudev commented 2 months ago

That's odd, because .labels is provided here:

https://github.com/cloudnative-pg/charts/blob/fd5eff94b986797100155ad4638555cda3fb5823/charts/cluster/templates/prometheus-rule.yaml#L11-L29
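One plausible arrangement for passing .labels through a values dict (assumed here, not read from the linked file) is that each entry holds a literal Prometheus placeholder. Executing the rule text then copies the placeholder into the rendered rule unchanged, and Prometheus fills it in later when the alert fires. A minimal Go text/template sketch under that assumption; the description text and the pod entry are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderWithPlaceholders executes rule text against values in which .labels.*
// hold literal Prometheus template strings instead of concrete values.
func renderWithPlaceholders(text string) (string, error) {
	values := map[string]any{
		// Hypothetical mapping: the placeholder is plain string data, so the
		// install-time render writes it to the output without evaluating it.
		"labels": map[string]string{
			"job": "{{ $labels.job }}",
			"pod": "{{ $labels.pod }}",
		},
	}
	tmpl, err := template.New("rule").Parse(text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, values); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	desc, err := renderWithPlaceholders("Instance {{ .labels.pod }} of job {{ .labels.job }} is down.")
	if err != nil {
		panic(err)
	}
	fmt.Println(desc) // Instance {{ $labels.pod }} of job {{ $labels.job }} is down.
}
```

Under this arrangement the rendered rule is correct as text; whether the placeholder ends up populated depends entirely on the labels the alert's query returns.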

itay-grudev commented 2 months ago

I think the problem is just with the CNPGClusterOffline query:

The count() aggregation here doesn't keep any of the labels from the underlying cnpg_collector_up metric, which is why the labels in the alert description end up empty. The rest of the alerts are fine.
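A toy Go model of the point above, using simplified semantics rather than real PromQL. countBy groups hypothetical cnpg_collector_up series and keeps only the grouping labels, and expandDescription binds $labels and renders an annotation with the missingkey=zero option, so an absent label becomes an empty string (Go's default option would print <no value> instead; which behavior matches this chart's alerts is not established here). With no grouping labels, as with a bare count(), job comes out blank:

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Result is one output sample of the toy aggregation: a label set and a count.
type Result struct {
	Labels map[string]string
	Value  int
}

// countBy mimics count() with an optional by(...) list: only the grouping
// labels are kept, so with no grouping labels the output has no labels at all.
func countBy(series []map[string]string, by ...string) []Result {
	groups := map[string]int{} // group key -> index into out
	var out []Result
	for _, s := range series {
		labels := map[string]string{}
		parts := make([]string, 0, len(by))
		for _, name := range by {
			if v, ok := s[name]; ok {
				labels[name] = v
				parts = append(parts, name+"="+v)
			}
		}
		key := strings.Join(parts, ",")
		if i, ok := groups[key]; ok {
			out[i].Value++
			continue
		}
		groups[key] = len(out)
		out = append(out, Result{Labels: labels, Value: 1})
	}
	return out
}

// expandDescription renders an annotation with $labels bound to the result's
// labels; missingkey=zero makes an absent label render as "".
func expandDescription(text string, labels map[string]string) (string, error) {
	tmpl, err := template.New("desc").Option("missingkey=zero").Parse("{{ $labels := .Labels }}" + text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	data := struct{ Labels map[string]string }{Labels: labels}
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Hypothetical series standing in for cnpg_collector_up.
	up := []map[string]string{
		{"job": "pg-main", "namespace": "db", "pod": "pg-main-1"},
		{"job": "pg-main", "namespace": "db", "pod": "pg-main-2"},
	}
	for _, res := range [][]Result{countBy(up), countBy(up, "job")} {
		desc, err := expandDescription(`job="{{ $labels.job }}"`, res[0].Labels)
		if err != nil {
			panic(err)
		}
		fmt.Println(desc, "count:", res[0].Value)
	}
	// Prints:
	// job="" count: 2
	// job="pg-main" count: 2
}
```

In this model, grouping by job keeps the label, while the ungrouped count leaves nothing for $labels.job to expand to.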