canonical / grafana-k8s-operator

https://charmhub.io/grafana-k8s
Apache License 2.0

New warning in edge: PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket #316

Closed nobuto-m closed 3 months ago

nobuto-m commented 6 months ago

Bug Description

I started seeing the following new warning in Grafana dashboards when using the edge channels for COS. I'm not sure yet where this should be fixed, so I'm collecting information here.

PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: "workqueue_depth" (1:6)

sum(rate(workqueue_depth{cluster="", job="kube-controller-manager", instance=~"localhost:16443"}[$__rate_interval])) by (cluster, instance, name)

ref: https://github.com/prometheus/prometheus/issues/12945
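For reference, the message text suggests the check is based purely on the metric name. A minimal sketch of that kind of suffix test, assuming only the suffix list quoted in the warning; the helper name is hypothetical and this is not Prometheus's actual implementation:

```python
# Suffixes quoted in the warning text. A metric name ending in none of
# them is reported as "might not be a counter".
COUNTER_SUFFIXES = ("_total", "_sum", "_count", "_bucket")


def looks_like_counter(metric_name: str) -> bool:
    """Hypothetical helper: True if the name ends in a counter-style suffix."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return metric_name.endswith(COUNTER_SUFFIXES)


if __name__ == "__main__":
    # workqueue_depth is the metric from the panel query above;
    # workqueue_adds_total is included only as a contrasting example.
    for name in ("workqueue_depth", "workqueue_adds_total"):
        print(name, looks_like_counter(name))
```

Under this reading, `workqueue_depth` is flagged because of its name alone, regardless of what type the exporter declares for it.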

To Reproduce

  1. Deploy microk8s: juju deploy --channel 1.28/stable microk8s
  2. Deploy COS on top of it: juju deploy --channel latest/edge --trust cos-lite
  3. Connect microk8s and COS using the grafana-agent machine charm
  4. Open the kubernetes-controller-manage dashboard in Grafana and look at the "Work Queue Depth" panel

Environment

[reproducible]

App                            Version  Status  Scale  Charm                         Channel      Rev
alertmanager                   0.27.0   active      1  alertmanager-k8s              latest/edge  107
catalogue                               active      1  catalogue-k8s                 latest/edge   33
cos-configuration-ceph         3.5.0    active      1  cos-configuration-k8s         latest/edge   47
grafana                        9.5.3    active      1  grafana-k8s                   latest/edge  108
loki                           2.9.5    active      1  loki-k8s                      latest/edge  128
prometheus                     2.50.1   active      1  prometheus-k8s                latest/edge  173
prometheus-scrape-config-ceph  n/a      active      1  prometheus-scrape-config-k8s  latest/edge   47
traefik                        v2.11.0  active      1  traefik-k8s                   latest/edge  177
$ juju resources prometheus --format yaml
resources:
- resourceid: prometheus/prometheus-image
  applicationId: prometheus
  name: prometheus-image
  type: oci-image
  path: ""
  description: Container image for Prometheus
  revision: "141"
  fingerprint: 7cd4a7cbc3b2183e4744a09b42c57e90e126eb8d2bfaeaa14dc5e8136b647fcbb533e77badc0a56d0dc04d22982b0d35
  size: 511
  origin: store
  used: true
  timestamp: 2024-04-05T02:10:25.428Z
  username: prometheus
  combinedrevision: "141"
  usedyesno: "yes"
  combinedorigin: store
$ juju resources grafana --format yaml
resources:
- resourceid: grafana/grafana-image
  applicationId: grafana
  name: grafana-image
  type: oci-image
  path: ""
  description: upstream docker image for Grafana
  revision: "68"
  fingerprint: b21714113748180496e734731bddf1255f0173c617e8117419e5e508961fe4506fdeb6207cd3fcbb630d1bf0e7896a1e
  size: 504
  origin: store
  used: true
  timestamp: 2024-04-05T02:10:06.14Z
  username: grafana
  combinedrevision: "68"
  usedyesno: "yes"
  combinedorigin: store
- resourceid: grafana/litestream-image
  applicationId: grafana
  name: litestream-image
  type: oci-image
  path: ""
  description: upstream image for sqlite streaming
  revision: "43"
  fingerprint: 59669b82997bbae30d73a52625fcae2fa87f8d3bc2f6a32030588ddfe04b608f4b1e78d64c0cd4d96c3b8c89a884d949
  size: 511
  origin: store
  used: true
  timestamp: 2024-04-05T02:10:06.995Z
  username: grafana
  combinedrevision: "43"
  usedyesno: "yes"
  combinedorigin: store

[NOT reproducible]

App                            Version  Status  Scale  Charm                         Channel        Rev
alertmanager                   0.26.0   active      1  alertmanager-k8s              latest/stable  103
catalogue                               active      1  catalogue-k8s                 latest/stable   33
cos-configuration-ceph         3.5.0    active      1  cos-configuration-k8s         latest/stable   45
grafana                        9.5.3    active      1  grafana-k8s                   latest/stable  105
loki                           2.9.4    active      1  loki-k8s                      latest/stable  121
prometheus                     2.49.1   active      1  prometheus-k8s                latest/stable  170
prometheus-scrape-config-ceph  n/a      active      1  prometheus-scrape-config-k8s  latest/stable   47
traefik                        2.10.5   active      1  traefik-k8s                   latest/stable  170
$ juju resources prometheus --format yaml
resources:
- resourceid: prometheus/prometheus-image
  applicationId: prometheus
  name: prometheus-image
  type: oci-image
  path: ""
  description: Container image for Prometheus
  revision: "139"
  fingerprint: 3fdf893a02516b7d32003a42503cd6607ab832423adb4208a3f4b2a4848e1d738e49ec906cd1a3284247ac789148970f
  size: 511
  origin: store
  used: true
  timestamp: 2024-04-04T14:04:38.132Z
  username: prometheus
  combinedrevision: "139"
  usedyesno: "yes"
  combinedorigin: store
$ juju resources grafana --format yaml
resources:
- resourceid: grafana/grafana-image
  applicationId: grafana
  name: grafana-image
  type: oci-image
  path: ""
  description: upstream docker image for Grafana
  revision: "68"
  fingerprint: b21714113748180496e734731bddf1255f0173c617e8117419e5e508961fe4506fdeb6207cd3fcbb630d1bf0e7896a1e
  size: 504
  origin: store
  used: true
  timestamp: 2024-04-04T14:04:19.83Z
  username: grafana
  combinedrevision: "68"
  usedyesno: "yes"
  combinedorigin: store
- resourceid: grafana/litestream-image
  applicationId: grafana
  name: litestream-image
  type: oci-image
  path: ""
  description: upstream image for sqlite streaming
  revision: "43"
  fingerprint: 59669b82997bbae30d73a52625fcae2fa87f8d3bc2f6a32030588ddfe04b608f4b1e78d64c0cd4d96c3b8c89a884d949
  size: 511
  origin: store
  used: true
  timestamp: 2024-04-04T14:04:20.728Z
  username: grafana
  combinedrevision: "43"
  usedyesno: "yes"
  combinedorigin: store

Relevant log output

N/A

Additional context

No response

sed-i commented 6 months ago

ca-scribner commented 3 months ago

@nobuto-m did you figure this out? From what we can see, this is a Kubernetes dashboard whose metrics may not follow the typical naming convention.

@neoaggelos do you know who owns this dashboard? It looks like it's something that you folks own.

bayrambayramli commented 3 months ago

I have the same issue with Ingress NGINX Dashboard:

Query: sum(rate(nginx_ingress_controller_requests{controller_pod=~"$controller",controller_class=~"$controller_class",namespace=~"$namespace",status!~"[4-5].*"}[2m])) / sum(rate(nginx_ingress_controller_requests{controller_pod=~"$controller",controller_class=~"$controller_class",namespace=~"$namespace"}[2m]))

nginx_ingress_controller_requests is a counter, as described here.
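To see which metrics in a dashboard query would trip a name-based check like this, one rough approach is to pull out the metric name passed directly to `rate()`, `irate()` or `increase()` and test its suffix. The sketch below is an assumption-laden illustration: the regular expression handles only simple selectors, the function name is hypothetical, and it does not reproduce the real PromQL parser.

```python
import re

# Suffixes quoted in the warning text.
COUNTER_SUFFIXES = ("_total", "_sum", "_count", "_bucket")

# Captures the metric name that immediately follows rate(, irate( or increase(.
# Simplification: nested expressions and subqueries are not handled.
RATE_ARG = re.compile(r"\b(?:rate|irate|increase)\(\s*([a-zA-Z_:][a-zA-Z0-9_:]*)")


def flagged_metrics(query):
    """Hypothetical helper: metric names used in rate-like calls without a counter suffix."""
    flagged = []
    for name in RATE_ARG.findall(query):
        if not name.endswith(COUNTER_SUFFIXES) and name not in flagged:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Shortened form of the Ingress NGINX query above.
    query = (
        'sum(rate(nginx_ingress_controller_requests{status!~"[4-5].*"}[2m])) / '
        "sum(rate(nginx_ingress_controller_requests[2m]))"
    )
    print(flagged_metrics(query))
```

A check of this kind looks only at the name, so it would list `nginx_ingress_controller_requests` even though the metric is a counter.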

dstathis commented 3 months ago

This will need to be addressed in the dashboard, which is owned by the MicroK8s charm. Perhaps you can open an issue there.