cloudnative-pg / charts

CloudNativePG Helm Charts
Apache License 2.0

No cnpg_* metrics in Prometheus, CloudNativePG Grafana Dashboard shows "no data" #302

Closed rwarford closed 4 months ago

rwarford commented 4 months ago

Description of the issue

I have installed the Prometheus operator and CloudNativePG operator via ArgoCD but am not getting any cnpg_* metrics in Prometheus.

Prometheus chart: kube-prometheus-stack, revision 58.7.2
CloudNativePG chart: cloudnative-pg, revision 0.21.2

What Happens

The CloudNativePG Grafana dashboard shows "no data" in every panel, and no cnpg_* metrics appear in the Prometheus UI.

What I Expect

I expect the CloudNativePG Grafana dashboard to be correctly populated and I expect to be able to browse cnpg metrics in the Prometheus UI.
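
As a concrete check, browsing for any operator metric in the Prometheus expression browser should return series. A query like the following (generic PromQL label matching, not specific to this setup) currently returns nothing:

```promql
{__name__=~"cnpg_.*"}
```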

What I've Tried

Setting podMonitorSelectorNilUsesHelmValues: false (and the related *SelectorNilUsesHelmValues values) in the kube-prometheus-stack values, which did not help.

Configuration Data

Prometheus cnpg podMonitor job configuration:

- job_name: podMonitor/cnpg-system/cnpg-cloudnative-pg/0
  honor_timestamps: true
  track_timestamps_staleness: false
  scrape_interval: 30s
  scrape_timeout: 10s
  scrape_protocols:
  - OpenMetricsText1.0.0
  - OpenMetricsText0.0.1
  - PrometheusText0.0.4
  metrics_path: /metrics
  scheme: http
  enable_compression: true
  follow_redirects: true
  enable_http2: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_phase]
    separator: ;
    regex: (Failed|Succeeded)
    replacement: $1
    action: drop
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_instance, __meta_kubernetes_pod_labelpresent_app_kubernetes_io_instance]
    separator: ;
    regex: (cnpg);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name, __meta_kubernetes_pod_labelpresent_app_kubernetes_io_name]
    separator: ;
    regex: (cloudnative-pg);true
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: cnpg-system/cnpg-cloudnative-pg
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: metrics
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    modulus: 1
    target_label: __tmp_hash
    replacement: $1
    action: hashmod
  - source_labels: [__tmp_hash]
    separator: ;
    regex: "0"
    replacement: $1
    action: keep
  kubernetes_sd_configs:
  - role: pod
    kubeconfig_file: ""
    follow_redirects: true
    enable_http2: true
    namespaces:
      own_namespace: false
      names:
      - cnpg-system
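
For reference, the relabeling rules in this generated job (the keep rules on the app.kubernetes.io/instance and app.kubernetes.io/name labels, the metrics port, and the cnpg-system namespace) correspond to a PodMonitor of roughly this shape. This is a reconstruction from the scrape config above, not the actual resource from the cluster:

```yaml
# Reconstructed sketch of the PodMonitor behind the scrape job above
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cnpg-cloudnative-pg
  namespace: cnpg-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: cnpg
      app.kubernetes.io/name: cloudnative-pg
  podMetricsEndpoints:
  - port: metrics
```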

Prometheus spec:

spec:
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-alertmanager
        namespace: monitoring-system
        pathPrefix: /
        port: http-web
    enableAdminAPI: false
    evaluationInterval: 30s
    externalUrl: http://domain.com/
    hostNetwork: false
    image: quay.io/prometheus/prometheus:v2.52.0
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector: {}
    portName: http-web
    probeNamespaceSelector: {}
    probeSelector: {}
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector: {}
    scrapeConfigNamespaceSelector: {}
    scrapeConfigSelector: {}
    scrapeInterval: 30s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
      seccompProfile:
        type: RuntimeDefault
    serviceAccountName: prometheus-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector: {}
    shards: 1
    tsdb:
      outOfOrderTimeWindow: 0s
    version: v2.52.0
    walCompression: true

EDIT: Added Prometheus spec.
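
Since podMonitorSelector: {} in this spec selects all PodMonitors, one quick check is whether the scrape target is up at all. The job label below is taken from the relabel_configs shown earlier:

```promql
up{job="cnpg-system/cnpg-cloudnative-pg"}
```

An empty result means the target is not being scraped at all; a value of 0 means it is scraped but unreachable.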

itay-grudev commented 4 months ago

kube-prometheus-stack by default only scrapes monitors that are labeled with its release name. You can disable that behavior by setting:

prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    probeSelectorNilUsesHelmValues: false

Alternatively, you can apply the required labels to the CNPG resources, but the former solution is preferable when running centralized cluster monitoring.
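
The labeling alternative can usually be done through the cloudnative-pg chart's own values. The keys below are an assumption based on the chart's monitoring section (verify them against the chart's values.yaml), and the release label value is a placeholder for your actual kube-prometheus-stack release name:

```yaml
# Assumed cloudnative-pg chart values; verify key names against values.yaml
monitoring:
  podMonitorEnabled: true
  podMonitorAdditionalLabels:
    release: kube-prometheus-stack  # placeholder: use your actual release name
```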

In any case, verify that the corresponding CNPG ServiceMonitor/PodMonitor resources exist. If they do, look for problems on the Prometheus side.

rwarford commented 4 months ago

> kube-prometheus-stack by default only scrapes monitors that are labeled with its release name. You can disable that behavior by setting:
>
>     prometheus:
>       prometheusSpec:
>         podMonitorSelectorNilUsesHelmValues: false
>         ruleSelectorNilUsesHelmValues: false
>         serviceMonitorSelectorNilUsesHelmValues: false
>         probeSelectorNilUsesHelmValues: false
>
> Alternatively, you can apply the required labels to the CNPG resources, but the former solution is preferable when running centralized cluster monitoring.
>
> In any case, verify that the corresponding CNPG ServiceMonitor/PodMonitor resources exist. If they do, look for problems on the Prometheus side.

Thank you for your reply, but please note that I pointed out in the "What I've Tried" section that I had already applied those settings and am still not getting any metrics. Also note that the cnpg podMonitor is recognized by Prometheus.

rwarford commented 4 months ago

@itay-grudev Could you take another look at this issue, please? As I mentioned in the comment above, I already had the options you recommended set. I've added my Prometheus spec to the original post.

I haven't done anything significant to the values files other than enable ingresses. I do get other metrics in Prometheus (node metrics for example).

itay-grudev commented 4 months ago

From our perspective, the chart's responsibility ends at provisioning the appropriate resources. Debugging why Prometheus fails to recognize the provisioned resources is outside the scope of support we have the capacity to provide.

That being said, if you discover a genuine corner case that we haven't covered, feel free to open a new ticket.