coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0

kube-state-metrics is missing despite being deployed and running and shows in Prometheus #83

Open yoyoraso opened 4 months ago

yoyoraso commented 4 months ago

Hi, I have the main Coroot instance deployed in one cluster and am working on adding other clusters to it. Those clusters already have Prometheus and kube-state-metrics deployed, so I am only deploying coroot-node-agent on them. However, I can't see kube-state-metrics or the service map.


So I started investigating and found this failure in the coroot-node-agent pod logs: "failed to get container metadata for pid 16843 -> /kubepods/burstable/pod6f222fb5-3d0e-425e-899c-e5495124a057/ea64d45c2a6338bb0f9aae2f05ec4a77e323915d25ed11b19cb2504cbf2113d0: failed to interact with dockerd (%!s()) or with containerd (%!s())"

Kubernetes version: v1.25.16+vmware.1
OS: Ubuntu 22.04.4 LTS
Kernel: 6.5.0-21-generic
Container runtime: containerd://1.6.28
coroot-node-agent tag: 1.18.9

apetruhin commented 4 months ago

@yoyoraso, we need to examine the node-agent's log. Could you please restart it, wait a minute, and then provide the entire log here?

yoyoraso commented 4 months ago

@apetruhin, here it is:

I0510 14:40:50.531724 606823 net.go:30] ephemeral-port-range: 32768-60999
I0510 14:40:50.540212 606823 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0510 14:40:50.540261 606823 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0510 14:40:50.540272 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0510 14:40:50.540280 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0510 14:40:50.540290 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0510 14:40:50.540300 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0510 14:40:50.540313 606823 main.go:102] agent version: 1.18.9
I0510 14:40:50.540380 606823 main.go:108] hostname: **
I0510 14:40:50.540389 606823 main.go:109] kernel version: 6.5.0-21-generic
I0510 14:40:50.541001 606823 main.go:75] machine-id: **
I0510 14:40:50.541035 606823 tracing.go:34] no OpenTelemetry traces collector endpoint configured
I0510 14:40:50.541048 606823 otel.go:26] no OpenTelemetry logs collector endpoint configured
I0510 14:40:50.541180 606823 metadata.go:67] cloud provider:
I0510 14:40:50.541193 606823 collector.go:157] instance metadata:
I0510 14:40:50.541282 606823 profiling.go:49] no profiles endpoint configured
W0510 14:40:50.541721 606823 registry.go:75] Cannot connect to the Docker daemon at unix:///proc/1/root/run/docker.sock. Is the docker daemon running?
W0510 14:40:54.544388 606823 registry.go:78] couldn't connect to containerd through the following UNIX sockets [/var/snap/microk8s/common/run/containerd.sock,/run/k0s/containerd.sock,/run/k3s/containerd/containerd.sock,/run/containerd/containerd.sock]: failed to dial "/proc/1/root/run/containerd/containerd.sock": context deadline exceeded
W0510 14:40:54.544482 606823 registry.go:81] stat /proc/1/root/var/run/crio/crio.sock: no such file or directory
I0510 14:40:54.878632 606823 registry.go:281] calculated container id 1 -> / ->
I0510 14:40:54.878729 606823 registry.go:286] "ignoring" cg="/" pid=1
I0510 14:40:54.878791 606823 registry.go:281] calculated container id 2 -> / ->
I0510 14:40:54.878805 606823 registry.go:286] "ignoring" cg="/" pid=2
I0510 14:40:54.878844 606823 registry.go:281] calculated container id 3 -> / ->
I0510 14:40:54.878856 606823 registry.go:286] "ignoring" cg="/" pid=3
I0510 14:40:54.878893 606823 registry.go:281] calculated container id 4 -> / ->
I0510 14:40:54.878901 606823 registry.go:286] "ignoring" cg="/" pid=4
I0510 14:40:54.878936 606823 registry.go:281] calculated container id 5 -> / ->
I0510 14:40:54.878947 606823 registry.go:286] "ignoring" cg="/" pid=5
I0510 14:40:54.878982 606823 registry.go:281] calculated container id 6 -> / ->
I0510 14:40:54.878994 606823 registry.go:286] "ignoring" cg="/" pid=6
I0510 14:40:54.879027 606823 registry.go:281] calculated container id 8 -> / ->
I0510 14:40:54.879038 606823 registry.go:286] "ignoring" cg="/" pid=8
I0510 14:40:54.879073 606823 registry.go:281] calculated container id 11 -> / ->
I0510 14:40:54.879081 606823 registry.go:286] "ignoring" cg="/" pid=11
I0510 14:40:54.879113 606823 registry.go:281] calculated container id 12 -> / ->
I0510 14:40:54.879121 606823 registry.go:286] "ignoring" cg="/" pid=12
I0510 14:40:54.879153 606823 registry.go:281] calculated container id 13 -> / ->
I0510 14:40:54.879166 606823 registry.go:286] "ignoring" cg="/" pid=13
I0510 14:40:54.879200 606823 registry.go:281] calculated container id 14 -> / ->
I0510 14:40:54.879211 606823 registry.go:286] "ignoring" cg="/" pid=14
I0510 14:40:54.879244 606823 registry.go:281] calculated container id 15 -> / ->
I0510 14:40:54.879251 606823 registry.go:286] "ignoring" cg="/" pid=15
I0510 14:40:54.879283 606823 registry.go:281] calculated container id 16 -> / ->
I0510 14:40:54.879291 606823 registry.go:286] "ignoring" cg="/" pid=16
I0510 14:40:54.879325 606823 registry.go:281] calculated container id 17 -> / ->
I0510 14:40:54.879332 606823 registry.go:286] "ignoring" cg="/" pid=17
I0510 14:40:54.879366 606823 registry.go:281] calculated container id 18 -> / ->
I0510 14:40:54.879377 606823 registry.go:286] "ignoring" cg="/" pid=18
I0510 14:40:54.879410 606823 registry.go:281] calculated container id 19 -> / ->
I0510 14:40:54.879419 606823 registry.go:286] "ignoring" cg="/" pid=19
I0510 14:40:54.879452 606823 registry.go:281] calculated container id 20 -> / ->
I0510 14:40:54.879466 606823 registry.go:286] "ignoring" cg="/" pid=20
I0510 14:40:54.879500 606823 registry.go:281] calculated container id 21 -> / ->
I0510 14:40:54.879511 606823 registry.go:286] "ignoring" cg="/" pid=21
I0510 14:40:54.879544 606823 registry.go:281] calculated container id 22 -> / ->
I0510 14:40:54.879556 606823 registry.go:286] "ignoring" cg="/" pid=22
I0510 14:40:54.879588 606823 registry.go:281] calculated container id 23 -> / ->
I0510 14:40:54.879600 606823 registry.go:286] "ignoring" cg="/" pid=23
I0510 14:40:54.879633 606823 registry.go:281] calculated container id 25 -> / ->
I0510 14:40:54.879640 606823 registry.go:286] "ignoring" cg="/" pid=25
I0510 14:40:54.879674 606823 registry.go:281] calculated container id 26 -> / ->
I0510 14:40:54.879685 606823 registry.go:286] "ignoring" cg="/" pid=26
I0510 14:40:54.879718 606823 registry.go:281] calculated container id 27 -> / ->
I0510 14:40:54.879729 606823 registry.go:286] "ignoring" cg="/" pid=27
I0510 14:40:54.879768 606823 registry.go:281] calculated container id 28 -> / ->
I0510 14:40:54.879781 606823 registry.go:286] "ignoring" cg="/" pid=28
I0510 14:40:54.879816 606823 registry.go:281] calculated container id 29 -> / ->
I0510 14:40:54.879823 606823 registry.go:286] "ignoring" cg="/" pid=29
I0510 14:40:54.879858 606823 registry.go:281] calculated container id 31 -> / ->
I0510 14:40:54.879865 606823 registry.go:286] "ignoring" cg="/" pid=31
I0510 14:40:54.879897 606823 registry.go:281] calculated container id 32 -> / ->
I0510 14:40:54.879904 606823 registry.go:286] "ignoring" cg="/" pid=32
I0510 14:40:54.879936 606823 registry.go:281] calculated container id 33 -> / ->
I0510 14:40:54.879949 606823 registry.go:286] "ignoring" cg="/" pid=33
I0510 14:40:54.879985 606823 registry.go:281] calculated container id 34 -> / ->
I0510 14:40:54.879992 606823 registry.go:286] "ignoring" cg="/" pid=34
I0510 14:40:54.880055 606823 registry.go:281] calculated container id 35 -> / ->
I0510 14:40:54.880063 606823 registry.go:286] "ignoring" cg="/" pid=35
I0510 14:40:54.880098 606823 registry.go:281] calculated container id 37 -> / ->
I0510 14:40:54.880107 606823 registry.go:286] "ignoring" cg="/" pid=37
I0510 14:40:54.880140 606823 registry.go:281] calculated container id 38 -> / ->
I0510 14:40:54.880154 606823 registry.go:286] "ignoring" cg="/" pid=38
I0510 14:40:54.880189 606823 registry.go:281] calculated container id 39 -> / ->
I0510 14:40:54.880202 606823 registry.go:286] "ignoring" cg="/" pid=39
W0510 14:40:54.880228 606823 init.go:35] open /proc/1/net/tcp6: no such file or directory
I0510 14:40:54.880236 606823 registry.go:281] calculated container id 40 -> / ->
I0510 14:40:54.880290 606823 registry.go:286] "ignoring" cg="/" pid=40
I0510 14:40:54.880340 606823 registry.go:281] calculated container id 41 -> / ->
I0510 14:40:54.880353 606823 registry.go:286] "ignoring" cg="/" pid=41
I0510 14:40:54.880391 606823 registry.go:281] calculated container id 43 -> / ->
I0510 14:40:54.880399 606823 registry.go:286] "ignoring" cg="/" pid=43
I0510 14:40:54.880433 606823 registry.go:281] calculated container id 44 -> / ->
I0510 14:40:54.880439 606823 registry.go:286] "ignoring" cg="/" pid=44
I0510 14:40:54.880472 606823 registry.go:281] calculated container id 45 -> / ->
I0510 14:40:54.880480 606823 registry.go:286] "ignoring" cg="/" pid=45
I0510 14:40:54.880511 606823 registry.go:281] calculated container id 46 -> / ->
I0510 14:40:54.880518 606823 registry.go:286] "ignoring" cg="/" pid=46
I0510 14:40:54.880549 606823 registry.go:281] calculated container id 47 -> / ->
I0510 14:40:54.880556 606823 registry.go:286] "ignoring" cg="/" pid=47
I0510 14:40:54.880591 606823 registry.go:281] calculated container id 50 -> / ->
I0510 14:40:54.880598 606823 registry.go:286] "ignoring" cg="/" pid=50
I0510 14:40:54.880631 606823 registry.go:281] calculated container id 51 -> / ->
I0510 14:40:54.880638 606823 registry.go:286] "ignoring" cg="/" pid=51
I0510 14:40:54.880671 606823 registry.go:281] calculated container id 52 -> / ->
I0510 14:40:54.880678 606823 registry.go:286] "ignoring" cg="/" pid=52
I0510 14:40:54.880711 606823 registry.go:281] calculated container id 53 -> / ->
I0510 14:40:54.880718 606823 registry.go:286] "ignoring" cg="/" pid=53
I0510 14:40:54.880750 606823 registry.go:281] calculated container id 55 -> / ->
I0510 14:40:54.880757 606823 registry.go:286] "ignoring" cg="/" pid=55
I0510 14:40:54.880790 606823 registry.go:281] calculated container id 56 -> / ->
I0510 14:40:54.880797 606823 registry.go:286] "ignoring" cg="/" pid=56
I0510 14:40:54.880835 606823 registry.go:281] calculated container id 57 -> / ->
I0510 14:40:54.880843 606823 registry.go:286] "ignoring" cg="/" pid=57
I0510 14:40:54.880877 606823 registry.go:281] calculated container id 58 -> / ->
I0510 14:40:54.880884 606823 registry.go:286] "ignoring" cg="/" pid=58
I0510 14:40:54.880918 606823 registry.go:281] calculated container id 59 -> / ->
I0510 14:40:54.969239 606823 registry.go:213] TCP connection from unknown container {connection-open none 9196 11.0.101.3:33262 11.33.38.9:9093 34 622082154767560 }
W0510 14:40:55.888703 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s()) or with containerd (%!s())
I0510 14:40:55.888742 606823 registry.go:213] TCP connection from unknown container {connection-open none 14343 11.32.115.7:51738 10.100.192.1:443 111 622083074116078 }
W0510 14:40:55.929900 606823 registry.go:277] failed to get container metadata for pid 20694 -> /kubepods/burstable/pod671ca5e2-5ce3-46a0-b10f-f5e4f8098e33/4bda181fc1ca52ebe65399cb8e11649c0e133cd6a85f959f0f6a3d370478f2cb: failed to interact with dockerd (%!s()) or with containerd (%!s())
I0510 14:40:55.929929 606823 registry.go:213] TCP connection from unknown container {connection-open none 20694 127.0.0.1:44250 127.0.0.1:8080 14 622083115307215 }
W0510 14:40:55.946816 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s()) or with containerd (%!s())
I0510 14:40:55.946857 606823 registry.go:213] TCP connection from unknown container {connection-open none 14343 11.32.115.7:51752 10.100.192.1:443 107 622083132268869 }
W0510 14:40:55.946943 606823 registry.go:277] failed to get container metadata for pid 14343 -> /kubepods/besteffort/poda7143171-67b8-4c99-b7b5-3b850b41d2e5/65e74674bb94880aa9b1b8d913b1f37d6ac36613be45fb3e2ca13837db45c1fb: failed to interact with dockerd (%!s()) or with containerd (%!s())
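
In a log this long, the lines that matter are the warnings, which start with `W` in the agent's klog-style output. A minimal sketch for summarizing them follows. It assumes the log has been saved with one entry per line, as the agent emits it; `klog_warnings` is a hypothetical helper name, not part of coroot-node-agent.

```shell
# klog_warnings: read klog-style log text on stdin and print, for every
# warning/error/fatal entry, how many times each "file:line" location occurs.
# Assumes the header layout seen in the log above:
#   <severity><mmdd> <time> <pid> <file>:<line>] <message>
klog_warnings() {
    awk '
        # Keep only entries whose first field is W/E/F followed by mmdd.
        /^[WEF][0-9][0-9][0-9][0-9] / {
            loc = $4
            sub(/\]$/, "", loc)   # strip the trailing "]" after file:line
            count[loc]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}
```

For example, `kubectl -n coroot logs <node-agent-pod> | klog_warnings` would list each warning location once with its count, which makes the `registry.go` socket failures easy to spot among the informational lines.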

apetruhin commented 4 months ago

Could you please ssh to the node and check for containerd.sock:

# ls -l /run/containerd/containerd.sock
srw-rw---- 1 root root 0 Jan  4 09:04 /run/containerd/containerd.sock

yoyoraso commented 4 months ago

@apetruhin I can't access the cluster nodes sadly :(

apetruhin commented 4 months ago

The agent failed to locate containerd.sock.

Please exec into the node-agent pod and try to find the containerd.sock file:

kubectl -n coroot exec -ti coroot-node-agent-dwwrf -- bash

root@coroot-node-agent-dwwrf:/# ls -l /proc/1/root/run/containerd/containerd.sock

The root filesystem should be accessible from a node-agent pod under /proc/1/root/.
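
The same check can be run for every containerd socket path the agent tried. The sketch below is only an illustration: the candidate paths are copied from the agent's warning in the log, `check_socks` is a hypothetical helper name, and it assumes a POSIX shell with `readlink` is available inside the node-agent image.

```shell
# check_socks [HOST_ROOT]: for each containerd socket path the agent probes,
# report whether it is a socket, a symlink (with its target), or missing
# when looked up under HOST_ROOT (default: /proc/1/root).
check_socks() {
    host_root="${1:-/proc/1/root}"
    for p in \
        /var/snap/microk8s/common/run/containerd.sock \
        /run/k0s/containerd.sock \
        /run/k3s/containerd/containerd.sock \
        /run/containerd/containerd.sock
    do
        full="$host_root$p"
        if [ -L "$full" ]; then
            # Test for a symlink first: -S and -e follow links and may fail
            # if the target cannot be resolved from inside the pod.
            echo "symlink $p -> $(readlink "$full")"
        elif [ -S "$full" ]; then
            echo "socket $p"
        elif [ -e "$full" ]; then
            echo "other $p"
        else
            echo "missing $p"
        fi
    done
}
```

Running `check_socks` with no argument inside the node-agent pod prints one line per candidate path. A `symlink` line means the target needs to be checked separately under the same `/proc/1/root` prefix.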

yoyoraso commented 4 months ago

@apetruhin
root@node-agent-ntkdw:/# ls -l /proc/1/root/run/containerd/containerd.sock
lrwxrwxrwx 1 root root 44 May 3 10:14 /proc/1/root/run/containerd/containerd.sock -> /var/vcap/sys/run/containerd/containerd.sock

apetruhin commented 4 months ago

@yoyoraso, could you please check whether /proc/1/root/var/vcap/sys/run/containerd/containerd.sock is itself a symlink to another location?

root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock

yoyoraso commented 4 months ago

Hi @apetruhin

root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock
ls: cannot access '/proc/1/root/var/vcap/sys/run/containerd/containerd.sock': No such file or directory
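
The manual steps in this thread amount to following the symlink one hop at a time and putting the /proc/1/root prefix back in front of each absolute target, since an absolute target is otherwise resolved against the pod's own root. A rough sketch of that procedure is below. `resolve_under_root` is a hypothetical helper name, and the sketch only follows symlinks in the final path component, not in intermediate directories.

```shell
# resolve_under_root ROOT PATH: follow the symlink chain at PATH as seen from
# ROOT, re-prefixing every absolute target with ROOT. Prints each hop and
# whether the final path exists. Returns 0 if it exists, 1 if missing,
# 2 if the chain is suspiciously long.
resolve_under_root() {
    root="$1"
    path="$2"
    hops=0
    while [ -L "$root$path" ]; do
        target=$(readlink "$root$path")
        case "$target" in
            /*) path="$target" ;;                      # absolute: restart from ROOT
            *)  path="$(dirname "$path")/$target" ;;   # relative: same directory
        esac
        echo "-> $path"
        hops=$((hops + 1))
        if [ "$hops" -ge 16 ]; then
            echo "too many hops"
            return 2
        fi
    done
    if [ -e "$root$path" ]; then
        echo "exists: $path"
    else
        echo "missing: $path"
        return 1
    fi
}
```

Inside the node-agent pod this would be called as `resolve_under_root /proc/1/root /run/containerd/containerd.sock`. A `missing:` result for the last hop corresponds to the "No such file or directory" error above.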

apetruhin commented 4 months ago

@yoyoraso, please provide details about your setup and instructions on how to run this type of Kubernetes environment to reproduce the issue.

yoyoraso commented 4 months ago

@apetruhin it is a basic Kubernetes cluster created with VMware Tanzu.

Kubernetes version: v1.25.16+vmware.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.16+vmware.1", GitCommit:"84fd181a4243c4354b9208f4292f1b6cd82726b1", GitTreeState:"clean", BuildDate:"2023-11-21T10:59:59Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
OS: Ubuntu 22.04.4 LTS
Kernel: 6.5.0-21-generic
Container runtime: containerd://1.6.28
coroot-node-agent tag: 1.18.9

Let me know if you need more information.