carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License

K3s 0.9 - 1.0+ Missing metrics #25

Closed NicklasWallgren closed 4 years ago

NicklasWallgren commented 5 years ago

[image]

It seems like the node_namespace_pod_container metrics are missing, and I can't really pinpoint why. Is it K3s related?

Referenced issue https://github.com/coreos/kube-prometheus/issues/284

Banders2 commented 5 years ago

I'm having the same problem running K3s 1.0.0. Unfortunately, this breaks a lot of dashboard definitions. @NicklasWallgren, did you manage to find a solution?

NicklasWallgren commented 5 years ago

The rules are correctly defined, but something about container_cpu_usage_seconds_total has changed. It seems the image label has gone missing.

The following selector returns no series, since the image label is empty for all containers:

    container_cpu_usage_seconds_total{job="kubelet", image!="", container!="POD"}

The recording rule that depends on it therefore produces nothing:

    - expr: |
        sum by (namespace, pod, container) (
          rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container!="POD"}[5m])
        ) * on (namespace, pod) group_left(node) max by(namespace, pod, node) (kube_pod_info)
      record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate

It's a known issue with k3s: https://github.com/rancher/k3s/issues/473

NicklasWallgren commented 4 years ago

I updated to the latest version of k3s.

It seems to have improved things. [image]

annismckenzie commented 4 years ago

To get almost all of the dashboards working again, see https://github.com/rancher/k3s/issues/473#issuecomment-575919050.

Still, a couple of tiles don't work, mainly because of container vs. container_name label mismatches. Could that be a bug? The main dashboard uses

sort_desc(sum(container_memory_usage_bytes{image!=""}) by (container_name, image))

to display the Pod memory usage but the series uses the label container instead of container_name:

container_memory_usage_bytes{container="coredns",endpoint="http-metrics",id="/kubepods/burstable/podc5ae821e-4847-4781-8d6e-b9ca300cd6b5/fe0ade11e5bfbd2dd49d47f4e8f92ded86caf7e464fdc87db9120e5449cd3e09",image="docker.io/coredns/coredns:1.6.3",instance="192.168.0.89:10255",job="kubelet",metrics_path="/metrics/cadvisor",name="fe0ade11e5bfbd2dd49d47f4e8f92ded86caf7e464fdc87db9120e5449cd3e09",namespace="kube-system",node="raspberrypi",pod="coredns-d798c9dd-zb5lr",service="kubelet"}

NicklasWallgren commented 4 years ago

@annismckenzie Which dashboards are you referring to?

annismckenzie commented 4 years ago

The main one: »Kubernetes cluster monitoring (via Prometheus)«.

[Screenshot 2020-01-20 at 08 21 15]

If I start replacing container_name with container, the tiles start to look a lot better:

[Screenshot 2020-01-20 at 08 23 59]

Can we do that with a relabeling rule? Sorry, I'm a bit new to this.
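
A relabeling rule along these lines might do it. This is only a minimal sketch, assuming the cAdvisor metrics are scraped through a Prometheus Operator ServiceMonitor for the kubelet; the port name and path are taken from the endpoint="http-metrics" and metrics_path="/metrics/cadvisor" labels in the series above, and the rest of the ServiceMonitor is left out. It copies the container label into the legacy container_name label (and pod into pod_name) at scrape time, so older dashboard queries keep matching:

```yaml
# Hypothetical fragment of the kubelet ServiceMonitor spec; only the
# metricRelabelings entries are the point here, the endpoint fields are assumptions.
endpoints:
  - port: http-metrics
    path: /metrics/cadvisor
    metricRelabelings:
      # Copy `container` into the legacy `container_name` label.
      - sourceLabels: [container]
        targetLabel: container_name
        action: replace
      # Copy `pod` into the legacy `pod_name` label.
      - sourceLabels: [pod]
        targetLabel: pod_name
        action: replace
```

The alternative is to edit the dashboard queries themselves to use container and pod, which avoids storing duplicate labels on every series.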

carlosedp commented 4 years ago

@NicklasWallgren Can I consider this fixed?

NicklasWallgren commented 4 years ago

@carlosedp The issue has resurfaced in the latest version of k3s.

https://github.com/rancher/k3s/issues/1522

carlosedp commented 4 years ago

🙄 Let's wait...

NicklasWallgren commented 4 years ago

@carlosedp I couldn't reproduce the errors, and it's working fine using the latest version of k3s. I'm closing this one :)