coroot / coroot-node-agent

A Prometheus exporter based on eBPF that gathers comprehensive container metrics
https://coroot.com/docs/metrics/node-agent
Apache License 2.0

restart node-agent #23

Closed KKulishov closed 1 year ago

KKulishov commented 1 year ago

Deployed to Kubernetes 1.20 (Rancher 2.5) with Docker 20.10.17 as the container runtime.

coroot-node-agent keeps restarting. Logs:

I0726 10:54:35.984709 3655059 cilium.go:35] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_ct6_global: no such file or directory
I0726 10:54:35.984723 3655059 cilium.go:42] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v2: no such file or directory
I0726 10:54:35.984738 3655059 cilium.go:42] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb4_backends_v3: no such file or directory
I0726 10:54:35.984750 3655059 cilium.go:51] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v2: no such file or directory
I0726 10:54:35.984763 3655059 cilium.go:51] Unable to get object /proc/1/root/sys/fs/bpf/tc/globals/cilium_lb6_backends_v3: no such file or directory
I0726 10:54:35.984964 3655059 main.go:81] agent version: 1.8.8
I0726 10:54:35.985040 3655059 main.go:87] hostname: umt-k8s-mts-datapro-c1.sovcombank.group
I0726 10:54:35.985045 3655059 main.go:88] kernel version: 5.4.17-2136.304.4.1.el8uek.x86_64
I0726 10:54:35.985121 3655059 main.go:71] machine-id:  98710f42f40b2be32d22d80e35a5e0c4
I0726 10:54:35.985145 3655059 tracing.go:29] no OpenTelemetry collector endpoint configured
I0726 10:54:35.985291 3655059 metadata.go:66] cloud provider:
I0726 10:54:35.985299 3655059 collector.go:157] instance metadata: <nil>
I0726 10:54:38.990940 3655059 containerd.go:37] using /run/containerd/containerd.sock
W0726 10:54:38.991003 3655059 registry.go:72] stat /proc/1/root/var/run/crio/crio.sock: no such file or directory
F0726 10:54:38.994821 3655059 main.go:112] kernel tracing is not available: stat /sys/kernel/debug/tracing: no such file or directory

Checking the directories on the host node:

root@umt-k8s-mts-datapro-c1:/home/kulishovkm # ls -la /proc/1/root/var/run/cri/
ls: cannot access '/proc/1/root/var/run/cri/': No such file or directory
root@umt-k8s-mts-datapro-c1:/home/kulishovkm # ls -la /sys/kernel/debug/tracing
ls: cannot access '/sys/kernel/debug/tracing': No such file or directory
root@umt-k8s-mts-datapro-c1:/home/kulishovkm # ls -la /sys/kernel/debug/tracing/
ls: cannot access '/sys/kernel/debug/tracing/': No such file or directory

The host node runs Linux kernel 5.4.

The coroot-node-agent image version is 1.8.8.

This is my DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: coroot-agent-node-agent
  labels:
    helm.sh/chart: node-agent-0.1.34
    app.kubernetes.io/name: node-agent
    app.kubernetes.io/instance: coroot-agent
    app.kubernetes.io/version: "1.8.8"
    app.kubernetes.io/managed-by: Helm
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-agent
      app.kubernetes.io/instance: coroot-agent
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-agent
        app.kubernetes.io/instance: coroot-agent
        app: coroot-node-agent
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '80'
    spec:
      tolerations:
        - operator: Exists
      priorityClassName:
      hostPID: true
      containers:
        - name: node-agent
          image: "registry.sovcombank.group/s-devops/ghcr.io/coroot/coroot-node-agent:1.8.8"
          command: ["coroot-node-agent", "--cgroupfs-root", "/host/sys/fs/cgroup"]
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 50Mi
          env:
          ports:
            - containerPort: 80
              name: http
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /host/sys/fs/cgroup
              name: cgroupfs
              readOnly: true
            - mountPath: /sys/kernel/debug
              name: debugfs
              readOnly: false
      volumes:
        - hostPath:
            path: /sys/fs/cgroup
          name: cgroupfs
        - hostPath:
            path: /sys/kernel/debug
          name: debugfs

Can you tell me what I'm doing wrong?

def commented 1 year ago

@KKulishov Please check whether your kernel is compiled with CONFIG_PERF_EVENTS (grep CONFIG_PERF_EVENTS "/boot/config-$(uname -r)") and whether debugfs is mounted (mount | grep debugfs).
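The two checks suggested above can also be scripted. The sketch below uses hypothetical helpers (not part of coroot-node-agent): one reads the text of /boot/config-$(uname -r), the other parses /proc/mounts (whose format differs from the output of the `mount` command shown in this thread). The function names are illustrative assumptions.

```python
import os
import platform
from typing import List


def kernel_option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is built in (=y) or a module (=m) in a kernel config."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set", and longer names
        # such as CONFIG_PERF_EVENTS_INTEL_UNCORE must not match CONFIG_PERF_EVENTS,
        # so only an exact "NAME=" prefix counts.
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    return False


def mount_points(mounts_text: str, fstype: str) -> List[str]:
    """Return mount points of filesystems of type `fstype`.

    Expects /proc/mounts format: device, mount point, fstype, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == fstype:
            points.append(fields[1])
    return points


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    config_path = "/boot/config-" + platform.release()
    if os.path.exists(config_path):
        with open(config_path) as f:
            print("CONFIG_PERF_EVENTS enabled:",
                  kernel_option_enabled(f.read(), "CONFIG_PERF_EVENTS"))
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print("debugfs mounted at:", mount_points(f.read(), "debugfs"))
```

Passing file contents as strings, rather than opening files inside the helpers, keeps them easy to check against sample output such as the one posted below.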

KKulishov commented 1 year ago

Here is the output of those commands on the host node:

root@umt-k8s-mts-datapro-c1:/home/kulishovkm # grep CONFIG_PERF_EVENTS "/boot/config-$(uname -r)"
CONFIG_PERF_EVENTS=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_PERF_EVENTS_INTEL_RAPL=y
CONFIG_PERF_EVENTS_INTEL_CSTATE=y
CONFIG_PERF_EVENTS_AMD_POWER=m
root@umt-k8s-mts-datapro-c1:/home/kulishovkm # mount|grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,relatime)

def commented 1 year ago

Please check this directory as well: ls -la /sys/kernel/tracing

/sys/kernel/debug/tracing should be mounted automatically for backward compatibility, but that automount can be disabled with CONFIG_TRACEFS_DISABLE_AUTOMOUNT=y. If that is the case, we'll fix the agent to look for tracing under both paths.
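The fallback proposed above (look for tracing under both paths) can be sketched as follows. This is a hypothetical illustration, not coroot-node-agent's actual code; in particular, treating an `events` subdirectory as the sign of a usable tracefs is an assumption, chosen so that a tracefs that is mounted but empty (as reported later in this thread) is not accepted.

```python
import os
from typing import Optional, Sequence

# Hypothetical candidate list: the legacy debugfs automount first, then the
# dedicated tracefs mount point, as discussed in this thread.
TRACING_PATHS = ("/sys/kernel/debug/tracing", "/sys/kernel/tracing")


def find_tracing_dir(candidates: Sequence[str] = TRACING_PATHS) -> Optional[str]:
    """Return the first candidate that looks like a usable tracefs, or None.

    A path counts only if it has an "events" subdirectory (an assumption used
    here as a sign that tracepoints are exposed), so a missing directory and a
    mounted-but-empty tracefs are both skipped.
    """
    for path in candidates:
        if os.path.isdir(os.path.join(path, "events")):
            return path
    return None


if __name__ == "__main__":
    found = find_tracing_dir()
    print(found if found else "kernel tracing is not available at any known path")
```

A `None` result roughly corresponds to the fatal `kernel tracing is not available` message in the logs above: neither location exposes tracepoints.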

KKulishov commented 1 year ago

Yes, you are right.

This one looks OK:

root@umt-k8s-mts-datapro-c1:/home/kulishovkm # ls -la /sys/kernel/tracing/
total 0
drwx------  2 root root 0 Jul 17 07:57 .
drwxr-xr-x 14 root root 0 Jul 17 07:57 ..

but /sys/kernel/debug/tracing is not found:

root@umt-k8s-mts-datapro-c1:/home/kulishovkm # ls -la /sys/kernel/debug/tracing
ls: cannot access '/sys/kernel/debug/tracing': No such file or directory

def commented 1 year ago

No, it shouldn't be empty.

Check whether tracefs is mounted: mount |grep tracefs

KKulishov commented 1 year ago

mount |grep tracefs

none on /sys/kernel/tracing type tracefs (rw,relatime)

but the /sys/kernel/tracing directory is empty.

KKulishov commented 1 year ago

Most likely the problem is somewhere in the distribution installed on the host node (Red Hat 8.5).

def commented 1 year ago

@KKulishov have you been able to solve this issue?

KKulishov commented 1 year ago

@def Not yet. My company uses a modified kernel, so I have opened an internal task to resolve the kernel ftrace issue and will follow up with our security team.

KKulishov commented 1 year ago

@def The problem was in the kernel boot options set in GRUB, which disabled debugfs: kernelopts=root=/dev/mapper/vgroot-root ro resume=/dev/mapper/vgroot-swap lockdown=confidentiality debugfs=off

After removing lockdown=confidentiality and debugfs=off from the GRUB options and rebooting, everything works.

/sys/kernel/debug/tracing and /sys/kernel/tracing are no longer empty.

Our security team patched the kernel boot options (kernel_opts).

Thank you for your help!
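To spot this cause on other nodes, the running kernel's command line in /proc/cmdline can be scanned for the two options identified above. The sketch below is a hypothetical helper; the option list is an assumption drawn from this single case, not a complete set of settings that can hide kernel tracing.

```python
import os
from typing import List

# The two boot options reported in this thread; an assumption drawn from this
# single case, not a complete list of settings that can hide kernel tracing.
SUSPECT_OPTIONS = {("lockdown", "confidentiality"), ("debugfs", "off")}


def tracing_blockers(cmdline: str) -> List[str]:
    """Return the suspect name=value tokens found on a kernel command line."""
    found = []
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if (name, value) in SUSPECT_OPTIONS:
            found.append(token)
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the options the running kernel was booted with.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print(tracing_blockers(f.read()) or "no suspect boot options found")
```

Matching exact name=value pairs avoids flagging harmless settings such as debugfs=on.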

def commented 1 year ago

@KKulishov thank you for the details