Closed Junia18 closed 1 month ago
ei @Junia18 thank you for reporting! I see that in Falco 0.36.2 you have the k8s metadata enrichment enabled. This unfortunately can cause huge drops... I'm not sure which deployment method you are using, but the k8s enrichment can be disabled by removing the -k/-K flags if you are running the falco binary manually, or with the following command if you are using the default falco helm chart:
helm install falco falcosecurity/falco --set collectors.kubernetes.enabled=false
Could you disable it and retry please?
@Andreagit97 I have removed the -k and -K flags, but the -pk flag is still needed since we consume Kubernetes metadata. Despite these changes, CPU utilization remains high. We are running the Falco binary using the following command:
falco -K /var/run/secrets/kubernetes.io/serviceaccount/token -k https://$(KUBERNETES_SERVICE_HOST) --cri /var/run/crio/crio.sock -pk -v
Is there any other solution I can try?
Hi @Junia18, could you try the new k8s-metacollector and k8smeta plugin? We are going to ship them with Falco 0.37, but you can already test them with Falco 0.36.2. You can find the modified chart here: https://github.com/falcosecurity/charts/pull/598
For more info please see the following issue: https://github.com/falcosecurity/falco/issues/2973
Let me know if you need any help!
@alacuku I was wondering if there's any information available about the anticipated release date for Falco v0.37.0. I appreciate your time and assistance.
The expected release date for Falco 0.37 is the end of January 2024.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
@Andreagit97 Despite upgrading to Falco v0.37.1, we continue to experience high CPU usage, accompanied by frequent syscall event drops in the logs.
CPU usage after upgrading Falco to v0.37.1
➜ oc adm top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master0.brussels.cp.fyre.ibm.com 1458m 19% 11325Mi 78%
master1.brussels.cp.fyre.ibm.com 1123m 14% 11305Mi 78%
master2.brussels.cp.fyre.ibm.com 1851m 24% 9500Mi 65%
worker0.brussels.cp.fyre.ibm.com 6779m 43% 15696Mi 51%
worker1.brussels.cp.fyre.ibm.com 5537m 35% 17222Mi 56%
worker2.brussels.cp.fyre.ibm.com 4758m 30% 14445Mi 47%
Logs
Tue Apr 16 14:51:20 2024: Falco version: 0.37.1 (x86_64)
Tue Apr 16 14:51:20 2024: CLI args: /usr/bin/falco -c /config/falco_config.yaml --cri /var/run/crio/crio.sock -pk -v
Tue Apr 16 14:51:20 2024: Falco initialized with configuration file: /config/falco_config.yaml
Tue Apr 16 14:51:20 2024: System info: Linux version 5.14.0-284.55.1.el9_2.x86_64 (mockbuild@x86-64-02.build.eng.rdu2.redhat.com) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #1 SMP PREEMPT_DYNAMIC Mon Feb 19 16:57:59 EST 2024
Tue Apr 16 14:51:20 2024: Configured rules filenames:
Tue Apr 16 14:51:20 2024: /config/falco_rules.yaml
Tue Apr 16 14:51:20 2024: Loading rules from file /config/falco_rules.yaml
Tue Apr 16 14:51:20 2024: Watching file '/config/falco_config.yaml'
Tue Apr 16 14:51:20 2024: Watching file '/config/falco_rules.yaml'
Tue Apr 16 14:51:20 2024: (12) syscalls in rules: execve, kill, link, linkat, mkdir, mkdirat, open, openat, rename, rmdir, unlink, unlinkat
Tue Apr 16 14:51:20 2024: +(58) syscalls (Falco's state engine set of syscalls): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execveat, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, memfd_create, mount, open_by_handle_at, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmsg, sendmsg, sendto, setgid, setpgid, setresgid, setresuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, timerfd_create, umount, umount2, userfaultfd, vfork
Tue Apr 16 14:51:20 2024: (70) syscalls selected in total (final set): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execve, execveat, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, kill, link, linkat, memfd_create, mkdir, mkdirat, mount, open, open_by_handle_at, openat, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmsg, rename, rmdir, sendmsg, sendto, setgid, setpgid, setresgid, setresuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, timerfd_create, umount, umount2, unlink, unlinkat, userfaultfd, vfork
Tue Apr 16 14:51:20 2024: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Tue Apr 16 14:51:20 2024: Loaded event sources: syscall
Tue Apr 16 14:51:20 2024: Enabled event sources: syscall
Tue Apr 16 14:51:20 2024: Opening event source 'syscall'
Tue Apr 16 14:51:20 2024: Opening 'syscall' source with Kernel module
Tue Apr 16 14:54:01 2024: Syscall event drop but token bucket depleted, skipping actions
Tue Apr 16 14:54:04 2024: Syscall event drop but token bucket depleted, skipping actions
Tue Apr 16 14:54:06 2024: Syscall event drop but token bucket depleted, skipping actions
Tue Apr 16 14:54:07 2024: Syscall event drop but token bucket depleted, skipping actions
Tue Apr 16 14:54:08 2024: Syscall event drop but token bucket depleted, skipping actions
Tue Apr 16 14:55:02 2024: Syscall event drop but token bucket depleted, skipping actions
Running the Falco binary using the following command (args taken from the container spec):
falco -c /config/falco_config.yaml --cri /var/run/docker.sock -pk -v
New Environment:
Falco version: 0.37.1
System info: "machine": "x86_64", "release": "5.14.0-284.55.1.el9_2.x86_64", "sysname": "Linux", "version": "#1 SMP PREEMPT_DYNAMIC Mon Feb 19 16:57:59 EST 2024"
Cloud provider or hardware configuration:
OS: NAME="Red Hat Enterprise Linux" VERSION="8.9 (Ootpa)" ID="rhel" ID_LIKE="fedora" VERSION_ID="8.9" PLATFORM_ID="platform:el8" PRETTY_NAME="Red Hat Enterprise Linux 8.9 (Ootpa)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
Kernel: Linux 5.14.0-284.55.1.el9_2.x86_64
@Andreagit97 Just a gentle reminder that I am waiting for your input on this issue for over two weeks now. Any updates or insights you can provide would be greatly appreciated. Thanks!
Hi 👋 @Junia18, happy to help here as well, since Andrea is constantly working across many areas. A lot has changed between v0.28 and now.
I recommend following our troubleshooting guide https://falco.org/docs/troubleshooting/dropping/ and exploring the base_syscalls option and other optimizations.
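(One of those optimizations, sketched here as an illustration based on the dropping guide: enlarging the per-CPU syscall ring buffer via syscall_buf_size_preset in falco.yaml. The exact preset-to-size mapping may vary by version, so treat the comment values as assumptions.)

```yaml
# Illustrative falco.yaml fragment: enlarge the per-CPU syscall ring buffer
# so bursts of events are less likely to overflow it and be dropped.
# The log above shows 8 MiB chosen, which corresponds to the default preset.
syscall_buf_size_preset: 5   # one step above the default 4; each step roughly doubles the buffer
```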
In your original comment I see that you are showing metrics from different clusters (https://github.com/falcosecurity/falco/issues/2996#issue-2069963392). Unfortunately it's hard to make truly fair comparisons, and looking at the CPU usage it does not appear to be a huge delta. Perhaps also check out this paragraph, https://falco.org/docs/metrics/performance/#server-load-and-falco-event-drops, which notes that Falco's usage is never constant. If you are interested in a fairer comparison: would you consider running the different Falco versions on the exact same nodes, and also exporting our internal metrics (https://falco.org/docs/metrics/falco-metrics/), so we can check whether event rates etc. were similar?
Please let us know if the guides are helpful.
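(For other readers: a minimal falco.yaml sketch for enabling the internal metrics mentioned above. The exact field set depends on the Falco version, so treat this as illustrative rather than authoritative.)

```yaml
# Illustrative falco.yaml fragment: periodically emit Falco's internal
# metrics (event rates, drop counters, resource usage) as rule output.
metrics:
  enabled: true
  interval: 15m                         # how often to emit a metrics snapshot
  output_rule: true                     # emit snapshots through the standard outputs
  kernel_event_counters_enabled: true   # include kernel-side event/drop counters
```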
^ Small correction regarding https://github.com/falcosecurity/falco/issues/2996#issuecomment-2096769592: the internal metrics did not yet exist in v0.28, so we can't make the more detailed comparisons.
ei @Junia18, as suggested by @incertum I would use the base_syscalls.repair option.
Looking at your actual situation, you are tracing 70 syscalls
Tue Apr 16 14:51:20 2024: (12) syscalls in rules: execve, kill, link, linkat, mkdir, mkdirat, open, openat, rename, rmdir, unlink, unlinkat
Tue Apr 16 14:51:20 2024: +(58) syscalls (Falco's state engine set of syscalls): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execveat, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, memfd_create, mount, open_by_handle_at, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmsg, sendmsg, sendto, setgid, setpgid, setresgid, setresuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, timerfd_create, umount, umount2, userfaultfd, vfork
Tue Apr 16 14:51:20 2024: (70) syscalls selected in total (final set): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execve, execveat, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, kill, link, linkat, memfd_create, mkdir, mkdirat, mount, open, open_by_handle_at, openat, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmsg, rename, rmdir, sendmsg, sendto, setgid, setpgid, setresgid, setresuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, timerfd_create, umount, umount2, unlink, unlinkat, userfaultfd, vfork
With base_syscalls.repair=true you should obtain something like:
Tue May 7 11:12:06 2024: (12) syscalls in rules: execve, kill, link, linkat, mkdir, mkdirat, open, openat, rename, rmdir, unlink, unlinkat
Tue May 7 11:12:06 2024: +(18) syscalls (Falco's state engine set of syscalls): capset, chdir, chroot, clone, clone3, close, execveat, fchdir, fork, prctl, procexit, setgid, setpgid, setresgid, setresuid, setsid, setuid, vfork
Tue May 7 11:12:06 2024: (30) syscalls selected in total (final set): capset, chdir, chroot, clone, clone3, close, execve, execveat, fchdir, fork, kill, link, linkat, mkdir, mkdirat, open, openat, prctl, procexit, rename, rmdir, setgid, setpgid, setresgid, setresuid, setsid, setuid, unlink, unlinkat, vfork
It's enough to start Falco with -o base_syscalls.repair=true, or to set it directly in the config. This should improve the drop situation.
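For completeness, the same setting expressed as a falco.yaml fragment (a sketch: base_syscalls also accepts a custom_set list if you prefer to hand-pick the syscalls yourself):

```yaml
# Illustrative falco.yaml fragment: let Falco compute the minimal set of
# syscalls needed to keep its internal state consistent with the loaded rules.
base_syscalls:
  custom_set: []   # leave empty to rely on repair
  repair: true     # shrinks the state-engine set (70 -> ~30 syscalls in the example above)
```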
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
We experience the same issue on a number of Ubuntu 22.04 VMs using Falco 0.37.1.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Provide feedback via https://github.com/falcosecurity/community.
/close
@poiana: Closing this issue.
I am encountering a critical issue while running Falco v0.36.2 as a DaemonSet on an OpenShift cluster (OCP v4.12) with three worker nodes. The problem manifests as a substantial event drop rate and a notable doubling of CPU usage across all three worker nodes.
This behavior is particularly concerning, as it impacts the effective functioning of Falco and the overall performance of the cluster. The kernel probe is being utilized in this configuration.
Steps to Reproduce:
CPU usage before (when using Falco versions up to v0.26.2)
CPU usage after (when using Falco versions > v0.27.0)
Logs
Environment
Falco version: 0.36.2
System info: "machine": "x86_64", "release": "4.18.0-372.73.1.el8_6.x86_64", "sysname": "Linux", "version": "#1 SMP Fri Sep 8 13:16:27 EDT 2023"
Cloud provider or hardware configuration:
OS: NAME="Red Hat Enterprise Linux" VERSION="8.8 (Ootpa)" ID="rhel" ID_LIKE="fedora" VERSION_ID="8.8" PLATFORM_ID="platform:el8" PRETTY_NAME="Red Hat Enterprise Linux 8.8 (Ootpa)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
Kernel: Linux 4.18.0-372.73.1.el8_6.x86_64 #1 SMP Fri Sep 8 13:16:27 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
Installation method: from source