Closed Maximebb closed 11 months ago
Maybe to add as context, I considered moving to the modern-ebpf driver instead, but it fails at the driver loader looking to download the pre-built module at https://download.falco.org/driver/6.0.1+driver/x86_64/falco_cos_5.15.120+_1.ko (404 not found).
One last piece of info: I downgraded a lab environment to test out the previous version. I noticed the upgrade didn't change the OS version, but it did bump up the kernel version from 5.15.107 to 5.15.120.
Previous working nodes
~ $ uname -a
Linux <...> 5.15.107+ #1 SMP Thu Jun 29 09:19:06 UTC 2023 x86_64 AMD EPYC 7B12 AuthenticAMD GNU/Linux
~ $
~ $
~ $ cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=b15e582c1dbbf0e6f06747082754e5c5a71ea426
GOOGLE_CRASH_ID=Lakitu
VERSION=101
VERSION_ID=101
BUILD_ID=17162.210.48
Upgraded non-functional nodes
~ $ uname -a
Linux <...> 5.15.120+ #1 SMP Sat Aug 19 09:23:05 UTC 2023 x86_64 AMD EPYC 7B13 AuthenticAMD GNU/Linux
~ $
~ $
~ $ cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
KERNEL_COMMIT_ID=ca9810d05350e5d91be95056f0e5a75dd8e727ac
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
VERSION=101
VERSION_ID=101
BUILD_ID=17162.279.24
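To make the difference between the two node images easier to see, here is a minimal sketch that parses the two `/etc/os-release` dumps above and prints only the fields that changed. The helper names (`parse_os_release`, `changed_fields`) are made up for illustration, and only three of the fields from the output above are included to keep it short.

```python
def parse_os_release(text):
    """Parse KEY=VALUE lines from /etc/os-release into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip('"')
    return fields


def changed_fields(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Values copied from the two node dumps in this thread (subset of fields).
WORKING = """
VERSION_ID=101
BUILD_ID=17162.210.48
KERNEL_COMMIT_ID=b15e582c1dbbf0e6f06747082754e5c5a71ea426
"""
UPGRADED = """
VERSION_ID=101
BUILD_ID=17162.279.24
KERNEL_COMMIT_ID=ca9810d05350e5d91be95056f0e5a75dd8e727ac
"""

diff = changed_fields(parse_os_release(WORKING), parse_os_release(UPGRADED))
for key, (old, new) in diff.items():
    print(f"{key}: {old} -> {new}")
```

With these inputs only `BUILD_ID` and `KERNEL_COMMIT_ID` are reported, which matches the observation that the OS version stayed at 101 while the kernel changed.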
Hmm, maybe this issue could help: https://github.com/falcosecurity/falco/issues/2874. I see that the verifier error is the same:
```
-- BEGIN PROG LOAD LOG --
processed 43798 insns (limit 1000000) max_states_per_insn 1 total_states 4061 peak_states 4061 mark_read 1921
-- END PROG LOAD LOG --
Mon Oct 16 09:06:37 2023: An error occurred in an event source, forcing termination...
Mon Oct 16 09:06:37 2023: Closing event source 'syscall'
Events detected: 0
Rule counts by severity:
Triggered rules by rule name:
Error: libscap: bpf_load_program() event=raw_tracepoint/filler/sys_procexit_e: Operation not permitted
```
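As a rough aid for reading the log above, this sketch pulls the instruction count and the limit out of the verifier summary line. The regular expression is an assumption based only on the single line shown in this thread, and `parse_verifier_summary` is a hypothetical helper, not part of Falco or libbpf.

```python
import re

# Assumed shape of the summary line, taken from the log in this thread.
VERIFIER_SUMMARY = re.compile(r"processed (\d+) insns \(limit (\d+)\)")


def parse_verifier_summary(log):
    """Return (processed, limit) from a verifier log, or None if absent."""
    match = VERIFIER_SUMMARY.search(log)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log = (
    "processed 43798 insns (limit 1000000) max_states_per_insn 1 "
    "total_states 4061 peak_states 4061 mark_read 1921"
)
summary = parse_verifier_summary(log)
if summary is not None:
    processed, limit = summary
    print(f"processed {processed} of {limit} allowed instructions")
```

Here the processed count is far below the limit, which hints (without proving it) that the instruction limit is not what stopped the program from loading.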
The modern probe should work out of the box as reported here https://github.com/falcosecurity/falco/issues/2874#issuecomment-1771234766
> Maybe to add as context, I considered moving to the modern-ebpf driver instead, but it fails at the driver loader looking to download the pre-built module at https://download.falco.org/driver/6.0.1+driver/x86_64/falco_cos_5.15.120+_1.ko (404 not found)
This is strange: the modern probe doesn't use the driver-loader.
These are the changes required to run the modern-bpf driver: https://github.com/falcosecurity/falco/issues/2874#issuecomment-1771234766
I feel bad, I definitely entered the modern-bpf with a typo (extra e). It works with the modern driver perfectly. I suspect the behavior with a typo was to default to the kernel module.
I'm actually unblocked on my end, but I'll let you decide whether to keep the issue open to track the legacy eBPF driver issue. I did a light reading of the kernel release notes, and 5.15.111 had a couple of eBPF-related changes. I suspect that was the version that introduced a breaking change.
> I feel bad, I definitely entered the modern-bpf with a typo (extra e). It works with the modern driver perfectly. I suspect the behavior with a typo was to default to the kernel module.
Yeah, don't worry, this is a common error. We are working on renaming it for the next release to improve the user experience!
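Since the typo silently ended up on a different driver, a small pre-deployment check could catch it. The sketch below is only an illustration: the list of accepted values is an assumption about the options available at the time of this thread, and `check_driver_kind` is a hypothetical helper, not something Falco or the Helm chart provides.

```python
import difflib

# Assumption: driver kinds accepted at the time of this thread.
KNOWN_DRIVER_KINDS = ("module", "ebpf", "modern-bpf")


def check_driver_kind(kind):
    """Return (is_valid, suggestion); suggestion is the closest known kind or None."""
    if kind in KNOWN_DRIVER_KINDS:
        return True, None
    close = difflib.get_close_matches(kind, KNOWN_DRIVER_KINDS, n=1)
    return False, (close[0] if close else None)


valid, suggestion = check_driver_kind("modern-ebpf")
if not valid:
    print(f"unknown driver kind; did you mean {suggestion!r}?")
```

Rejecting unknown values up front avoids depending on whatever fallback the deployment applies to an unrecognized name.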
> I'm actually unblocked on my end, but I'll let you decide whether to keep the issue open to track the legacy eBPF driver issue. I did a light reading of the kernel release notes, and 5.15.111 had a couple of eBPF-related changes. I suspect that was the version that introduced a breaking change.
Yes, unfortunately this is a known issue. Having a probe compatible with all kernel versions is really hard, but we will see if we can fix this. Since we are already tracking the verifier issue at https://github.com/falcosecurity/libs/issues/1521, I will close this one if that is OK with you! Feel free to reopen if you have other issues related to this.
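To check which nodes fall on which side of the suspected kernel change, the release string from `uname -r` can be compared numerically. This is a minimal sketch: the 5.15.111 threshold is only the suspicion voiced in this thread, not a confirmed boundary, and the function names are made up for illustration.

```python
import re


def kernel_version(release):
    """Parse the leading 'major.minor.patch' of a release string such as '5.15.120+'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


# Assumption: the version suspected in this thread, not a verified boundary.
SUSPECTED_CHANGE = (5, 15, 111)


def at_or_after_suspected_change(release):
    """True if the kernel is at or beyond the suspected version."""
    return kernel_version(release) >= SUSPECTED_CHANGE


for release in ("5.15.107+", "5.15.120+"):
    print(release, at_or_after_suspected_change(release))
```

Comparing tuples of integers avoids the string-comparison trap where "5.15.20" would sort after "5.15.107".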
Describe the bug
We are running Falco on GKE clusters, deployed through the Helm chart. We had been running it successfully until last week, when all nodes were patched to the latest patch for 1.25 (1.25.13-gke.200). Since then, all pods are failing due to a permission issue:
How to reproduce it
Expected behaviour
We expected Falco to be able to run in a privileged context. We confirmed the process inside the container has the expected capabilities documented here.
Environment
GKE 1.25.13-gke.200
Troubleshooting
Since this is supposed to be a privileged context, we checked the process capabilities in case some were somehow not available to Kubernetes.
It seems to be granting ample permissions, since this decodes to
I can spot cap_sys_ptrace, cap_sys_resource, cap_bpf and cap_perfmon.
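For reference, the same check can be scripted. The sketch below assumes the mask is the hexadecimal `CapEff` value from `/proc/<pid>/status` and tests only the four capabilities named above, using their bit numbers from `linux/capability.h`. The example mask is built from those bits for illustration; it is not the value observed on the affected nodes.

```python
# Bit positions from linux/capability.h for the capabilities named above.
CAP_BITS = {
    19: "cap_sys_ptrace",
    24: "cap_sys_resource",
    38: "cap_perfmon",
    39: "cap_bpf",
}


def decode_caps(mask_hex):
    """Return the names of the tracked capabilities set in a hex capability mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]


# Illustrative mask with exactly the four tracked bits set (not real node data).
example_mask = format(sum(1 << bit for bit in CAP_BITS), "016x")
print(example_mask, decode_caps(example_mask))
```

A missing name in the output would point at a capability that the container runtime did not grant.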