falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

Serious eBPF driver regression starting with 0.16.x libs #1890

Closed incertum closed 4 weeks ago

incertum commented 1 month ago

When I last forked off upstream (libs 0.15.x), the eBPF drivers (legacy and modern_ebpf) continued to work wonderfully, as they had for almost 1.5 years.

However, now there seem to be serious issues and regressions for older kernels, but I have also observed issues with modern_ebpf depending on the compiler version used.

I tried clang 12, 14, 16, and 18 with some updated builder containers. While the eBPF probes compiled, I observed eBPF verifier issues (one time for sys_poll, another time for sys_preadv). I also observed scap errors during initialization or "map type not allowed" errors. In summary, it is not clear what the issue is, especially because there were a lot of changes.

incertum commented 1 month ago

I initially posted eBPF compiler issues for the VM test suite; however, they were caused by commit https://github.com/falcosecurity/libs/commit/209243e020732697ac7323be0075b2598bd2879b

I can now run the test suite against the past releases for comparison and post here later.

incertum commented 1 month ago

I ran the VM test suites against the past 2 release branches. I observed a few consistent verifier issues that we might be able to address for the following tail calls:


release/0.16.x

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
3.10.0-1160.49.1.el7.x86_64 🔵
4.14.296-222.539.amzn2.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.16.18-041618-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.19.296-0419296-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.4.247-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.10.9-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.14.15-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.19.17-051917-generic 🔵 🔵 🔵 🔵
6.5.0-060500-generic 🔵
6.5.8-1.el7.elrepo.x86_64 🔵 🔵 🔵

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled + success]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
4.14.296-222.539.amzn2.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
4.16.18-041618-generic 🟢 🟢 🟢 🟢 🟢 🟢
4.19.296-0419296-generic 🟢 🟢 🟢
5.4.247-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢 🟢
5.10.9-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢 🟢
5.14.15-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢 🟢
5.19.17-051917-generic 🟢 🟢 🟢 🟢
6.5.0-060500-generic 🟢
6.5.8-1.el7.elrepo.x86_64 🟢 🟢 🟢

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_recvfrom_x: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-14/4.19.296-0419296-generic

release/0.17.x

With clang-12 nothing ran anymore; I manually added the red crosses below.

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
3.10.0-1160.49.1.el7.x86_64 🔵
4.14.296-222.539.amzn2.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.16.18-041618-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
4.19.296-0419296-generic 🔵 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.4.247-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.10.9-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.14.15-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵 🔵 🔵
5.19.17-051917-generic 🔵 🔵 🔵 🔵
6.5.0-060500-generic 🔵
6.5.8-1.el7.elrepo.x86_64 🔵 🔵 🔵

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled + success]

kernel_uname_r clang-7 clang-12 clang-14 clang-16 gcc-5 gcc-9 gcc-11 gcc-13
4.14.296-222.539.amzn2.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
4.16.18-041618-generic 🟢 🟢 🟢 🟢 🟢 🟢 🟢
4.19.296-0419296-generic 🟢 🟢 🟢
5.4.247-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
5.10.9-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
5.14.15-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢 🟢
5.19.17-051917-generic 🟢 🟢 🟢
6.5.0-060500-generic 🟢
6.5.8-1.el7.elrepo.x86_64 🟢 🟢 🟢

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_readv_preadv_x: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-12/5.14.15-1.el7.elrepo.x86_64
[STATUS] DONE 5.14.15-1.el7.elrepo.x86_64

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_readv_preadv_x: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-12/5.19.17-051917-generic
[STATUS] DONE 5.19.17-051917-generic

libscap: bpf_load_program() event=raw_tracepoint/filler/sys_readv_preadv_x: Operation not permitted (1)

[STATUS] FAILED /libs/test/vm/build/driver/clang-16/4.19.296-0419296-generic
[STATUS] DONE 4.19.296-0419296-generic
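
Each `[STATUS] FAILED` line carries a build path whose last two components appear to be the compiler and the kernel release, which is how a failure maps back to a cell in the matrix. A small sketch of that split, assuming the path layout seen in the log lines above; the function name `split_status_path` is hypothetical and not part of the test suite:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Splits a path ending in "/<compiler>/<kernel>" into those two components.
 * Returns 0 on success, -1 if the path does not have that shape or a
 * component does not fit into the provided buffer.
 */
static int split_status_path(const char *path,
                             char *compiler, size_t compiler_len,
                             char *kernel, size_t kernel_len)
{
    assert(path != NULL && compiler != NULL && kernel != NULL);

    /* The last slash separates the compiler from the kernel release. */
    const char *last = strrchr(path, '/');
    if (last == NULL || last == path)
        return -1;

    /* Walk back to the slash that precedes the compiler component. */
    const char *prev = last - 1;
    while (prev > path && *prev != '/')
        prev--;
    if (*prev != '/')
        return -1;

    size_t clen = (size_t)(last - prev - 1);
    size_t klen = strlen(last + 1);
    if (clen == 0 || klen == 0 || clen >= compiler_len || klen >= kernel_len)
        return -1;

    memcpy(compiler, prev + 1, clen);
    compiler[clen] = '\0';
    memcpy(kernel, last + 1, klen + 1); /* copies the terminator too */
    return 0;
}
```

For the first failure above, this would yield `clang-12` as the compiler and `5.14.15-1.el7.elrepo.x86_64` as the kernel.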

incertum commented 1 month ago

Regarding the scap errors I observed, I have to look into them more next week, when I will go through the test conditions more consistently. I had never encountered these types of scap errors related to driver loading before.

leogr commented 1 month ago

Hey @incertum

Have you tried with https://github.com/falcosecurity/libs/releases/tag/7.2.0%2Bdriver and https://github.com/falcosecurity/libs/releases/tag/0.17.1?

incertum commented 1 month ago

@leogr yes I was on branch release/0.17.x, see https://github.com/falcosecurity/libs/issues/1890#issuecomment-2141188455

incertum commented 1 month ago

@FedeDP here is the output for your PR. I just ran it for clang; the primary issues seem fixed now, especially for clang-12.

I know you are still working on the 4.14 kernel improvements.

[Note that the output matrix auto-adjusts, and for the 6.x kernels I need to add new / better builder containers as they just don't compile right now, so ignore that and only focus on whether a blue dot turns green.]

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled]

kernel_uname_r clang-7 clang-12 clang-14 clang-15 clang-16
4.14.296-222.539.amzn2.x86_64 🔵 🔵 🔵 🔵 🔵
4.16.18-041618-generic 🔵 🔵 🔵 🔵 🔵
4.19.296-0419296-generic 🔵 🔵 🔵 🔵 🔵
5.4.247-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵
5.10.9-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵
5.14.15-1.el7.elrepo.x86_64 🔵 🔵 🔵 🔵 🔵
5.19.17-051917-generic 🔵 🔵 🔵 🔵

Driver (clang -> bpf, gcc -> kmod) kernel compatibility matrix [compiled + success]

kernel_uname_r clang-7 clang-12 clang-14 clang-15 clang-16
4.14.296-222.539.amzn2.x86_64 🟢 🟢 🟢 🟢
4.16.18-041618-generic 🟢 🟢 🟢 🟢 🟢
4.19.296-0419296-generic 🟢 🟢 🟢 🟢 🟢
5.4.247-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢
5.10.9-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢
5.14.15-1.el7.elrepo.x86_64 🟢 🟢 🟢 🟢 🟢
5.19.17-051917-generic 🟢 🟢 🟢 🟢

incertum commented 1 month ago

@FedeDP I tried updating the builder containers to check on the 6.5.0-060500 Ubuntu test kernels, but I still get legitimate compile errors that are not the fault of the builder container.

Edit: Exact same issues when compiling the eBPF probe for 6.5.8-1.el7.elrepo.x86_64 ...

Btw my IDE also highlights issues with that line (struct mm_struct *mm: expression must have struct or union type but it has type "struct percpu_counter *", C/C++(154)).

/libs/build/driver/bpf/src/fillers.h:923:56: error: member reference base type 'struct percpu_counter[4]' is not a structure or union
        bpf_probe_read_kernel(&val, sizeof(val), &mm->rss_stat.count[member]);
                                                  ~~~~~~~~~~~~^~~~~~
/libs/build/driver/bpf/src/fillers.h:2447:48: warning: passing 'volatile long *' to parameter of type 'long *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
                res = bpf_accumulate_argv_or_env(data, argv, &args_len);
                                                             ^~~~~~~~~
/libs/build/driver/bpf/src/fillers.h:1985:19: note: passing argument to parameter 'args_len' here
                                                      long *args_len)
                                                            ^
1 warning and 1 error generated.
make[6]: *** [/libs/build/driver/bpf/src/Makefile:74: /libs/build/driver/bpf/src/probe.o] Error 1
make[5]: *** [/headers/6.5.0-060500-generic/usr/src/linux-headers-6.5.0-060500/Makefile:2038: /libs/build/driver/bpf/src] Error 2
make[4]: *** [Makefile:234: __sub-make] Error 2
make[4]: Leaving directory '/headers/6.5.0-060500-generic/usr/src/linux-headers-6.5.0-060500'
make[3]: *** [Makefile:23: all] Error 2
make[3]: Leaving directory '/libs/build/driver/bpf/src'
make[2]: *** [driver/bpf/CMakeFiles/bpf.dir/build.make:70: driver/bpf/CMakeFiles/bpf] Error 2
make[2]: Leaving directory '/libs/build'
make[1]: *** [CMakeFiles/Makefile2:646: driver/bpf/CMakeFiles/bpf.dir/all] Error 2
make[1]: Leaving directory '/libs/build'
make: Leaving directory '/libs/build/driver/bpf'
make: *** [Makefile:136: all] Error 2

incertum commented 1 month ago

Updated the test/vm setup in https://github.com/falcosecurity/libs/pull/1897. Right now on master, and also given the updates from https://github.com/falcosecurity/libs/issues/1890#issuecomment-2141052122, the matrix oddly changed a bit with respect to what compiled, and the issue outlined here https://github.com/falcosecurity/libs/issues/1890#issuecomment-2150769343 is now even more prevalent across multiple clang versions.

FedeDP commented 1 month ago

/libs/build/driver/bpf/src/fillers.h:923:56: error: member reference base type 'struct percpu_counter[4]' is not a structure or union
        bpf_probe_read_kernel(&val, sizeof(val), &mm->rss_stat.count[member]);

We have a bpf configure module for this: https://github.com/falcosecurity/libs/tree/master/driver/bpf/configure/RSS_STAT_ARRAY. I.e., it should always be able to tell whether the rss array is present or not, and compile fine. Are you building against master? Perhaps your test suite is not running the bpf build through cmake?
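
For context, the compiler error says that in the 6.5 headers `mm->rss_stat` is itself an array of `struct percpu_counter`, so `mm->rss_stat.count[member]` no longer type-checks. Below is a minimal user-space sketch of how a configure-time macro can select the matching access. The struct definitions are simplified mocks rather than kernel headers, and the macro name `HAS_RSS_STAT_ARRAY`, the `RSS_COUNTER_PTR` macro, and the helper `read_mm_counter` are assumptions for illustration; the real driver reads the field through `bpf_probe_read_kernel`.

```c
#include <assert.h>

#define NR_MM_COUNTERS 4 /* matches the [4] in the compiler error */

/* Simplified stand-in for the kernel's per-CPU counter type. */
struct percpu_counter {
    long count;
};

#ifdef HAS_RSS_STAT_ARRAY
/* Newer layout: rss_stat is an array of counters. */
struct mm_struct {
    struct percpu_counter rss_stat[NR_MM_COUNTERS];
};
#define RSS_COUNTER_PTR(mm, member) (&(mm)->rss_stat[(member)].count)
#else
/* Older layout: rss_stat is a struct wrapping a count[] array. */
struct mm_rss_stat {
    long count[NR_MM_COUNTERS];
};
struct mm_struct {
    struct mm_rss_stat rss_stat;
};
#define RSS_COUNTER_PTR(mm, member) (&(mm)->rss_stat.count[(member)])
#endif

/* Hypothetical helper: reads one counter through the layout-specific macro. */
static long read_mm_counter(const struct mm_struct *mm, int member)
{
    assert(member >= 0 && member < NR_MM_COUNTERS);
    return *RSS_COUNTER_PTR(mm, member);
}
```

Compiling with `-DHAS_RSS_STAT_ARRAY` selects the array layout; without it, the older struct layout is used. If the macro is not defined while building against headers that use the array layout, the older expression is compiled and produces an error of the kind shown above.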

FedeDP commented 1 month ago

I saw in test/vm/scripts/compile_drivers.sh that you are indeed configuring the libs sources with cmake; that should work fine then! But you are using

make LLC=${LLC} CLANG=${CLANG} \
    KERNELDIR=${SOURCES} -B -C "${LIBS_DIR}/build/driver/bpf" || true

i.e. the ${LIBS_DIR}/build/driver/bpf folder; locally, I need to use ~/Work/libs/build/driver/bpf/src instead. Could that be the root cause? That folder was OK before we merged https://github.com/falcosecurity/libs/pull/1709

EDIT: I was wrong, it works fine from the ~/Work/libs/build/driver/bpf folder too; sorry for the noise.