coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0
294 stars 54 forks

Support for 4.x kernels has been dropped? #106

Open FutureMatt opened 1 week ago

FutureMatt commented 1 week ago

I can't see anything obvious in the changelogs, but it looks like support for Linux 4.x kernels was dropped at some point after 1.18.9. We currently run clusters with a mix of 4.19.0-19 and 5.10.0-29 kernels, and on the clusters with 4.x kernels the node agent now fails to deploy with the following log output:

I0705 09:18:07.156568   85825 net.go:20] whitelisted public IPs: [0.0.0.0/0]
I0705 09:18:07.156905   85825 net.go:32] ephemeral-port-range: 32768-60999
I0705 09:18:07.164387   85825 cilium.go:30] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct4_global: no such file or directory
I0705 09:18:07.164448   85825 cilium.go:36] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0705 09:18:07.164460   85825 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0705 09:18:07.164472   85825 cilium.go:43] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0705 09:18:07.164483   85825 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0705 09:18:07.164491   85825 cilium.go:52] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0705 09:18:07.167570   85825 main.go:111] agent version: 1.20.3
I0705 09:18:07.167635   85825 main.go:117] hostname: xxxxxxxxx-worker-1
I0705 09:18:07.167644   85825 main.go:118] kernel version: 4.19.0-18-amd64
I0705 09:18:07.169872   85825 main.go:75] machine-id:  xxxxxxxxxxxxxxxxx
I0705 09:18:07.169971   85825 tracing.go:37] OpenTelemetry traces collector endpoint: http://coroot:8080/v1/traces
I0705 09:18:07.170090   85825 otel.go:29] OpenTelemetry logs collector endpoint: http://coroot:8080/v1/logs
I0705 09:18:07.170401   85825 metadata.go:67] cloud provider:
I0705 09:18:07.170419   85825 collector.go:157] instance metadata: <nil>
I0705 09:18:07.170670   85825 profiling.go:52] profiles endpoint: http://coroot:8080/v1/profiles
E0705 09:18:07.198354   85825 profiling.go:100] load bpf objects: field DisassociateCtty: program disassociate_ctty: apply CO-RE relocations: load kernel spec: no BTF found for kernel version 4.19.0-18-amd64: not supported
I0705 09:18:10.202542   85825 containerd.go:38] using /run/containerd/containerd.sock
W0705 09:18:10.202604   85825 registry.go:85] stat /proc/1/root/var/run/crio/crio.sock: no such file or directory
E0705 09:18:10.234982   85825 tracer.go:191] load program: argument list too long:
F0705 09:18:10.235037   85825 main.go:149] failed to load collection: program sys_enter_sendmmsg: load program: argument list too long

def commented 1 week ago

It wasn't intentional. We added an eBPF program with more instructions than the others, and kernel 4.19 has a lower limit on the number of instructions in an eBPF program than newer kernels do.
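
For anyone hitting the same error: as far as I know, mainline kernels before 5.2 cap each eBPF program at 4096 instructions (`BPF_MAXINSNS`), and the "argument list too long" (`E2BIG`) failure for `sys_enter_sendmmsg` in the log above is consistent with a program exceeding that cap. Below is a minimal Go sketch of how a loader could detect such kernels from the release string before attempting to load a large program. The helper names (`parseKernelVersion`, `hasLegacyInsnLimit`) are hypothetical and are not part of the agent's code, and the 5.2 cutoff is an assumption that may not hold for distro kernels with backported patches.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// legacyInsnLimit is the per-program instruction cap assumed for kernels older than 5.2.
const legacyInsnLimit = 4096

// parseKernelVersion (hypothetical helper) extracts the major and minor numbers
// from a release string such as "4.19.0-18-amd64".
func parseKernelVersion(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("parse major version of %q: %w", release, err)
	}
	// The minor field may carry a suffix (for example "10-rc1"), so keep only the leading digits.
	digits := parts[1]
	for i, r := range digits {
		if r < '0' || r > '9' {
			digits = digits[:i]
			break
		}
	}
	minor, err = strconv.Atoi(digits)
	if err != nil {
		return 0, 0, fmt.Errorf("parse minor version of %q: %w", release, err)
	}
	return major, minor, nil
}

// hasLegacyInsnLimit (hypothetical helper) reports whether the kernel is older than 5.2.
func hasLegacyInsnLimit(release string) (bool, error) {
	major, minor, err := parseKernelVersion(release)
	if err != nil {
		return false, err
	}
	return major < 5 || (major == 5 && minor < 2), nil
}

func main() {
	for _, release := range []string{"4.19.0-18-amd64", "5.10.0-29"} {
		legacy, err := hasLegacyInsnLimit(release)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: legacy %d-instruction limit applies: %v\n", release, legacyInsnLimit, legacy)
	}
}
```

A check like this only tells the loader when to fall back to a smaller program variant or skip the program; it does not by itself make an oversized program load on 4.19.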