fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Fluentbit in_docker crash #5149

Closed nitrogene closed 2 years ago

nitrogene commented 2 years ago

Discussed in https://github.com/fluent/fluent-bit/discussions/5092

Originally posted by **nitrogene** March 16, 2022

Hello, I have installed td-agent-bit on WSL2/Ubuntu and am having trouble getting the *in_docker* plugin to work. Here's an extract of *td-agent-bit.conf*:

```bash
[INPUT]
    # https://docs.fluentbit.io/manual/pipeline/inputs/docker-metrics
    Name docker
    Tag docker.metrics

[INPUT]
    # https://docs.fluentbit.io/manual/pipeline/inputs/docker-events
    Name docker_events
    Tag docker.events

[FILTER]
    # https://docs.fluentbit.io/manual/pipeline/filters/record-modifier
    Name record_modifier
    Match *
    Record hostname ${HOSTNAME}

[OUTPUT]
    Name forward
    Match *
    Host 127.0.0.1
    Port 24224
    tls On
    tls.verify On
    tls.ca_file /etc/certs/graylog/certs/ca.crt.pem
    tls.crt_file /etc/certs/graylog/certs/client.crt.pem
    tls.key_file /etc/certs/graylog/private/client.key.pem
    tls.key_passwd ${TLS_PRIVATE_KEY_PASSPHRASE}
    Shared_Key ${SHARED_KEY}
```

And as soon as I start the agent:

```bash
Fluent Bit v1.9.0
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/03/16 16:01:55] [ info] [engine] started (pid=7746)
[2022/03/16 16:01:55] [ info] [storage] version=1.1.6, initializing...
[2022/03/16 16:01:55] [ info] [storage] in-memory
[2022/03/16 16:01:55] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/03/16 16:01:55] [ info] [cmetrics] version=0.3.0
[2022/03/16 16:01:55] [ warn] [input:thermal:thermal.4] thermal device file not found
[2022/03/16 16:01:55] [ info] [input:docker_events:docker_events.6] listening for events on /var/run/docker.sock
[2022/03/16 16:01:55] [ info] [sp] stream processor started
[2022/03/16 16:01:56] [ info] [output:forward:forward.0] worker #0 started
[2022/03/16 16:01:56] [ info] [output:forward:forward.0] worker #1 started
[2022/03/16 16:01:56] [error] [plugins/in_docker/docker.c:315 errno=2] No such file or directory
[2022/03/16 16:01:56] [engine] caught signal (SIGSEGV)
[2022/03/16 16:01:56] [error] [input:docker:docker.5] error gathering CPU data from /sys/fs/cgroup/cpu/docker/c6947781249f695c7811ce5da932b2aa79f28ea95b89c0a12b9eb841fa28302e/cpuacct.usage
#0  0x558ead4e587c in flush_snapshot() at plugins/in_docker/docker.c:701
#1  0x558ead4e5a03 in flush_snapshots() at plugins/in_docker/docker.c:728
#2  0x558ead4e5c21 in cb_docker_collect() at plugins/in_docker/docker.c:798
#3  0x558ead482141 in flb_input_collector_fd() at src/flb_input.c:1203
#4  0x558ead4986da in flb_engine_handle_event() at src/flb_engine.c:439
#5  0x558ead4986da in flb_engine_start() at src/flb_engine.c:761
#6  0x558ead474023 in flb_lib_worker() at src/flb_lib.c:626
#7  0x7f237eba2608 in ???() at ???:0
#8  0x7f237e4cb162 in ???() at ???:0
#9  0xffffffffffffffff in ???() at ???:0
```

If I deactivate the *in_docker* plugin by commenting out the relevant lines in the configuration file, it just works: I can see the host metrics (cpu, mem, etc.).

Here's the content of the */sys/fs/cgroup/cpu/docker/c6947781249f695c7811ce5da932b2aa79f28ea95b89c0a12b9eb841fa28302e/* folder:

```bash
$ ls /sys/fs/cgroup/cpu/docker/c6947781249f695c7811ce5da932b2aa79f28ea95b89c0a12b9eb841fa28302e/
cgroup.clone_children  cpu.cfs_period_us  cpu.rt_period_us   cpu.shares  notify_on_release
cgroup.procs           cpu.cfs_quota_us   cpu.rt_runtime_us  cpu.stat    tasks
```

Any idea?
nokute78 commented 2 years ago

in_docker expects the kernel to support the CPU Accounting Controller (cpuacct): https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt The error log indicates that cpuacct.usage was not found.

Could you share the output of `cat /proc/cgroups` and `grep CGROUP /boot/config-*`?
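
A quick way to see which container cgroups lack the file is sketched below. It assumes the cgroup v1 layout from the error log (`/sys/fs/cgroup/cpu/docker/<container-id>/`); `find_missing_cpuacct` is a hypothetical helper name, not something shipped with Fluent Bit or Docker.

```bash
#!/usr/bin/env bash
# Sketch: list container cgroup directories that have no readable cpuacct.usage.
# The default root is taken from the error log in this issue (an assumption
# about the host layout); pass another root as the first argument if needed.

# find_missing_cpuacct [ROOT]
# Prints one "missing: <path>" line per sub-directory of ROOT without cpuacct.usage.
find_missing_cpuacct() {
    local root="${1:-/sys/fs/cgroup/cpu/docker}"
    local dir
    for dir in "$root"/*/; do
        # When ROOT has no sub-directories the glob stays literal; skip it.
        [ -d "$dir" ] || continue
        if [ ! -r "${dir}cpuacct.usage" ]; then
            printf 'missing: %scpuacct.usage\n' "$dir"
        fi
    done
}
```

Calling `find_missing_cpuacct` with no argument checks the default Docker path; no output means every container directory exposes `cpuacct.usage`.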

nitrogene commented 2 years ago

Hello,

Here are the requested logs - please remember that I am using Ubuntu via WSL2/Windows 10:

~$ cat /proc/cgroups
#subsys_name    hierarchy       num_cgroups     enabled
cpuset  1       1       1
cpu     2       1       1
cpuacct 3       1       1
blkio   4       1       1
memory  5       1       1
devices 6       1       1
freezer 7       1       1
net_cls 8       1       1
perf_event      9       1       1
net_prio        10      1       1
hugetlb 11      1       1
pids    12      1       1
rdma    13      1       1
~$ grep CGROUP /boot/config-*
grep: /boot/config-*: No such file or directory
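
In the listing above, `cpu` is on hierarchy 2 and `cpuacct` on hierarchy 3, so the two controllers are attached to separate cgroup v1 hierarchies. That would be consistent with the directory listing in the first post, where only `cpu.*` files appear under `/sys/fs/cgroup/cpu/docker/...`. The sketch below reads the same table and reports the layout; the function names and the output labels are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: decide from a /proc/cgroups-style table whether the cpu and cpuacct
# controllers share one cgroup v1 hierarchy. All names here are hypothetical.

# hierarchy_of CONTROLLER [FILE]
# Prints the hierarchy ID (second column) of CONTROLLER from FILE.
hierarchy_of() {
    awk -v name="$1" '$1 == name { print $2 }' "${2:-/proc/cgroups}"
}

# cpu_cpuacct_layout [FILE]
# Prints one of: controller-missing, no-v1-hierarchy, co-mounted, separate.
cpu_cpuacct_layout() {
    local file="${1:-/proc/cgroups}"
    local cpu cpuacct
    cpu=$(hierarchy_of cpu "$file")
    cpuacct=$(hierarchy_of cpuacct "$file")
    if [ -z "$cpu" ] || [ -z "$cpuacct" ]; then
        echo "controller-missing"
    elif [ "$cpu" = "0" ] || [ "$cpuacct" = "0" ]; then
        # Hierarchy ID 0 means the controller is not attached to a v1 hierarchy.
        echo "no-v1-hierarchy"
    elif [ "$cpu" = "$cpuacct" ]; then
        echo "co-mounted"
    else
        echo "separate"
    fi
}
```

With the table above saved to a file, `cpu_cpuacct_layout that-file` prints `separate`.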

In the meantime, I found a workaround - probably an ugly one. Before launching the fluent bit agent in a wsl2 shell, I do the following:

# hack-start
sudo umount /sys/fs/cgroup/cpu
sudo mount -t cgroup -ocpuacct none /sys/fs/cgroup/cpu
# hack-end

With this hack, the agent is able to run fine, but to be honest, I don't know what the consequences of this hack are.
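
A slightly more cautious variant of the same hack is sketched below: it only acts when `cpuacct.usage` is not already visible at the mount point, and it prints the two commands unless `APPLY=1` is set. The function name and the `APPLY` variable are hypothetical. One likely side effect of the remount is that the `cpu` controller's own files (`cpu.shares` and so on) are no longer visible at that path, so anything that expects them there may stop finding them.

```bash
#!/usr/bin/env bash
# Sketch: guarded, dry-run-by-default version of the umount/mount workaround.
# remount_cpuacct_if_needed and APPLY are illustrative names, not an official tool.

# remount_cpuacct_if_needed [MOUNTPOINT]
# Does nothing when MOUNTPOINT/cpuacct.usage exists. Otherwise prints the two
# workaround commands and runs them only when APPLY=1 is set in the environment.
remount_cpuacct_if_needed() {
    local mnt="${1:-/sys/fs/cgroup/cpu}"
    if [ -e "$mnt/cpuacct.usage" ]; then
        echo "cpuacct.usage already visible under $mnt, nothing to do"
        return 0
    fi
    echo "sudo umount $mnt"
    echo "sudo mount -t cgroup -ocpuacct none $mnt"
    if [ "${APPLY:-0}" = "1" ]; then
        # Same two commands as the hack above; stop if the umount fails.
        sudo umount "$mnt" && sudo mount -t cgroup -ocpuacct none "$mnt"
    fi
}
```

Running `remount_cpuacct_if_needed` shows what would be done; `APPLY=1 remount_cpuacct_if_needed` performs it.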

Best regards,

Jean-Pierre

nokute78 commented 2 years ago

@nitrogene Thank you for the logs.

I also think it is a mount issue. Docker provides a script for checking the configuration: https://github.com/moby/moby/blob/master/contrib/check-config.sh

The script links to https://github.com/tianon/cgroupfs-mount (see https://github.com/moby/moby/blob/master/contrib/check-config.sh#L194).

How about running check-config.sh?

Note: I sent a patch, #5189. It only prevents the SIGSEGV; it does not address the mount issue. It will not fix this issue, since in_docker still can't gather metrics even if the patch is merged.

nitrogene commented 2 years ago

Hello,

Here's the output of check_config.sh:

$ ./check_config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_NONE: enabled
    (containers using eglibc <= 2.13 will not work. Switch to
     "CONFIG_VSYSCALL_[NATIVE|EMULATE]" or use "vsyscall=[native|emulate]"
     on kernel command line. Note that this will disable ASLR for the,
     VDSO which may assist in exploiting security vulnerabilities.)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled
- CONFIG_SECURITY_SELINUX: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: enabled
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Best regards,

Jean-Pierre

nokute78 commented 2 years ago

Did you run check-config.sh before applying the workaround described in this comment? https://github.com/fluent/fluent-bit/issues/5149#issuecomment-1076027755

nitrogene commented 2 years ago

Hello,

I was not sure, so I ran check-config.sh again in a fresh WSL2 shell, right after restarting my computer:

$ ./check-config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_CGROUP_BPF: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_LEGACY_VSYSCALL_NONE: enabled
    (containers using eglibc <= 2.13 will not work. Switch to
     "CONFIG_VSYSCALL_[NATIVE|EMULATE]" or use "vsyscall=[native|emulate]"
     on kernel command line. Note that this will disable ASLR for the,
     VDSO which may assist in exploiting security vulnerabilities.)
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled
- CONFIG_SECURITY_SELINUX: missing
- CONFIG_SECURITY_APPARMOR: missing
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
    - CONFIG_BRIDGE_VLAN_FILTERING: enabled
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: enabled
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: missing
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Regards,

Jean-Pierre

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stalled for 5 days with no activity.