coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0

failed to dial "/proc/1/root/run/containerd/containerd.sock": context deadline exceeded #12

Open koolay opened 1 year ago

koolay commented 1 year ago

The Coroot node agent can't inspect instances on a k3s cluster.

➜ k -n coroot logs -f coroot-node-agent-6tgvf
I0331 02:15:45.065649  521709 cilium.go:31] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0331 02:15:45.065778  521709 cilium.go:37] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0331 02:15:45.065784  521709 cilium.go:44] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0331 02:15:45.065788  521709 cilium.go:44] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0331 02:15:45.065792  521709 cilium.go:54] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0331 02:15:45.065795  521709 cilium.go:54] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0331 02:15:45.066085  521709 main.go:76] agent version: 1.7.4
I0331 02:15:45.066112  521709 main.go:82] hostname: huwl-QiTianM455-N000
I0331 02:15:45.066114  521709 main.go:83] kernel version: 5.19.0-35-generic
I0331 02:15:45.066141  521709 main.go:69] machine-id:  d4a6c200f13211ec8299c0898d1e2c00
I0331 02:15:45.066224  521709 metadata.go:66] cloud provider:
I0331 02:15:45.066227  521709 collector.go:157] instance metadata: <nil>
W0331 02:15:45.066887  521709 registry.go:65] Cannot connect to the Docker daemon at unix:///proc/1/root/run/docker.sock. Is the docker daemon running?
W0331 02:15:49.069727  521709 registry.go:68] couldn't connect to containerd through the following UNIX sockets [/var/snap/microk8s/common/run/containerd.sock,/run/k0s/containerd.sock,/run/k3s/containerd/containerd.sock,/run/containerd/containerd.sock]: failed to dial "/proc/1/root/run/containerd/containerd.sock": context deadline exceeded
I0331 02:15:49.165799  521709 registry.go:262] calculated container id 1 -> /init.scope ->
I0331 02:15:49.165856  521709 registry.go:267] "ignoring" cg="/init.scope" pid=1
I0331 02:15:49.165896  521709 registry.go:262] calculated container id 2 -> / ->
I0331 02:15:49.165905  521709 registry.go:267] "ignoring" cg="/" pid=2

➜ sudo ls -al /proc/1/root/run/containerd/containerd.sock
[sudo] password for huwl:
lrwxrwxrwx 1 root root 35  3月 30 19:50 /proc/1/root/run/containerd/containerd.sock -> /run/k3s/containerd/containerd.sock

➜ sudo ls /run/k3s/containerd/containerd.sock
/run/k3s/containerd/containerd.sock

✖ k -n coroot describe pod coroot-node-agent-6tgvf
Name:             coroot-node-agent-6tgvf
Namespace:        coroot
Priority:         0
Service Account:  default
Node:             --------
Start Time:       Fri, 31 Mar 2023 10:15:44 +0800
Labels:           app=coroot-node-agent
                  app.kubernetes.io/instance=coroot
                  app.kubernetes.io/name=node-agent
                  controller-revision-hash=c74fb5cf8
                  pod-template-generation=1
Annotations:      prometheus.io/port: 80
                  prometheus.io/scrape: true
Status:           Running
IP:               10.42.0.113
IPs:
  IP:           10.42.0.113
Controlled By:  DaemonSet/coroot-node-agent
Containers:
  node-agent:
    Container ID:  containerd://9faf71324df7cf6e4b3fbb587acda1b66efbc07142e518a914b4a1ab77e25eb8
    Image:         ghcr.io/coroot/coroot-node-agent:1.7.4
    Image ID:      ghcr.io/coroot/coroot-node-agent@sha256:a0572c1cc25b16f1625e760c893b20ee0d42263d3f8a98eda7cdeca88a8fd935
    Port:          80/TCP
    Host Port:     0/TCP
    Command:
      coroot-node-agent
      --cgroupfs-root
      /host/sys/fs/cgroup
    State:          Running
      Started:      Fri, 31 Mar 2023 10:15:45 +0800
    Ready:          True
    Restart Count:  0

apetruhin commented 1 year ago

@koolay, please check the containerd socket path inside the node-agent container:

kubectl -n coroot exec -ti coroot-node-agent-6tgvf -- ls -la /proc/1/root/run/k3s/containerd/containerd.sock

koolay commented 1 year ago

@apetruhin It returns "No such file or directory". The containerd directory at that path is itself a symlink:

root@coroot-node-agent-tjf8q:/# ls -al proc/1/root/run/k3s/containerd
lrwxrwxrwx 1 root root 37 Apr 21 02:43 proc/1/root/run/k3s/containerd -> /data/k3s/containerd-state/containerd
