yoyoraso opened 4 months ago
@yoyoraso, we need to examine the node-agent's log. Could you please restart it, wait a minute, and then provide the entire log here?
@apetruhin, here it is
I0510 14:40:50.531724 606823 net.go:30] ephemeral-port-range: 32768-60999
I0510 14:40:50.540212 606823 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0510 14:40:50.540261 606823 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0510 14:40:50.540272 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0510 14:40:50.540280 606823 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0510 14:40:50.540290 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0510 14:40:50.540300 606823 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0510 14:40:50.540313 606823 main.go:102] agent version: 1.18.9
I0510 14:40:50.540380 606823 main.go:108] hostname: **
I0510 14:40:50.540389 606823 main.go:109] kernel version: 6.5.0-21-generic
I0510 14:40:50.541001 606823 main.go:75] machine-id: **
I0510 14:40:50.541035 606823 tracing.go:34] no OpenTelemetry traces collector endpoint configured
I0510 14:40:50.541048 606823 otel.go:26] no OpenTelemetry logs collector endpoint configured
I0510 14:40:50.541180 606823 metadata.go:67] cloud provider:
I0510 14:40:50.541193 606823 collector.go:157] instance metadata:
Could you please SSH to the node and check for containerd.sock:
# ls -l /run/containerd/containerd.sock
srw-rw---- 1 root root 0 Jan 4 09:04 /run/containerd/containerd.sock
@apetruhin I can't access the cluster nodes sadly :(
The agent failed to locate containerd.sock.
Please exec into the node-agent pod and try to find the containerd.sock file:
kubectl -n coroot exec -ti coroot-node-agent-dwwrf -- bash
root@coroot-node-agent-dwwrf:/# ls -l /proc/1/root/run/containerd/containerd.sock
The node's root filesystem should be accessible from the node-agent pod under /proc/1/root/.
@apetruhin
root@node-agent-ntkdw:/# ls -l /proc/1/root/run/containerd/containerd.sock
lrwxrwxrwx 1 root root 44 May 3 10:14 /proc/1/root/run/containerd/containerd.sock -> /var/vcap/sys/run/containerd/containerd.sock
@yoyoraso, could you please check whether /proc/1/root/var/vcap/sys/run/containerd/containerd.sock is itself a symlink to another location?
root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock
Hi @apetruhin,
root@node-agent-ntkdw:/# ls -l /proc/1/root/var/vcap/sys/run/containerd/containerd.sock
ls: cannot access '/proc/1/root/var/vcap/sys/run/containerd/containerd.sock': No such file or directory
@yoyoraso, please provide details about your setup and instructions on how to run this type of Kubernetes environment to reproduce the issue.
@apetruhin, it is a basic Kubernetes cluster created with VMware Tanzu:
kubernetes version : v1.25.16+vmware.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.16+vmware.1", GitCommit:"84fd181a4243c4354b9208f4292f1b6cd82726b1", GitTreeState:"clean", BuildDate:"2023-11-21T10:59:59Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
OS: Ubuntu 22.04.4 LTS
kernel : 6.5.0-21-generic
container runtime : containerd://1.6.28
coroot node agent tag : 1.18.9
Let me know if you need more information.
Hi, I have the main Coroot instance deployed in one cluster and am working on adding other clusters to it. Those clusters already have Prometheus and kube-state-metrics deployed, so I only deployed coroot-node-agent on them, but I can't see the kube-state-metrics data or the service map.
So I started investigating and found this error in the coroot-node-agent pods: "failed to get container metadata for pid 16843 -> /kubepods/burstable/pod6f222fb5-3d0e-425e-899c-e5495124a057/ea64d45c2a6338bb0f9aae2f05ec4a77e323915d25ed11b19cb2504cbf2113d0: failed to interact with dockerd (%!s(<nil>)) or with containerd (%!s(<nil>))"
kubernetes version : v1.25.16+vmware.1
OS: Ubuntu 22.04.4 LTS
kernel : 6.5.0-21-generic
container runtime : containerd://1.6.28
coroot node agent tag : 1.18.9