falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

BPF issue on amazon linux 2 since we upgraded from 0.29.1 to 0.30 (not working on all kernel 4.1X and 5.X and using clang 7 or clang 11) #130

Closed JoupainMD closed 2 years ago

JoupainMD commented 2 years ago

Describe the bug
We have been encountering an issue with the BPF probe since we upgraded from Falco 0.29.1 to 0.30. We build the BPF probe ourselves in our own Docker image (used as an init container). For a long time we used the default clang/LLVM version (11), and, if my memory is correct, we had to switch to clang 7 as of 0.29. Now it does not seem to work with either clang version. We get errors like this:

math between map_value pointer and register with unbounded min value is not allowed
2021-11-15T15:22:14+0000: Runtime error: bpf_load_program() err=22 event=filler/sys_read_x message=0: (bf) r6 = r1

How to reproduce it
Use EKS v1.18 with Amazon Linux 2 and clang 7 or clang 11 (latest from the Amazon repo).

Environment
OS: Amazon Linux 2 (kernel 4.14.219-161.340.amzn2.x86_64; we also use 5.X kernels and the issue is the same)
Using EKS (AWS Kubernetes) 1.18
Clang + LLVM: 7 and 11 (from the Amazon package repo)

Additional context
We also tried the latest version of this repository, specifically after following this issue

FedeDP commented 2 years ago

Hi! Thanks for opening this bug report.

Note that

math between map_value pointer and register with unbounded min value is not allowed 2021-11-15T15:22:14+0000: Runtime error: bpf_load_program() err=22 event=filler/sys_read_x message=0: (bf) r6 = r1

is the kernel verifier rejecting the eBPF probe bytecode! By the way, we should support clang 7 up to 14, as you can see here: https://github.com/falcosecurity/libs/pull/81. As the support matrix there shows, clang 11 passed all our tests.

Moreover, I recently pushed some fixes to support versions down to clang 5: https://github.com/falcosecurity/libs/pull/109.

All of this just to say (scream): "that's weird!" :D I will try to blindly provide a patch for you (bpf.txt) to understand which part of the function is causing the error. Unfortunately, unless I am able to reproduce it, I will need your help to understand where the problem is and to fix it!
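For context on the verifier message quoted above: it typically shows up when an offset that the verifier cannot prove to be non-negative and bounded is added to a pointer into a map value. A common remedy in eBPF code is to mask or clamp the offset right before the pointer arithmetic. Below is a minimal user-space C sketch of that masking idea. The buffer size, the struct, and the function name are all hypothetical and are not taken from the libs sources, and this is not necessarily how the probe itself handles it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scratch buffer size; must be a power of two for the mask trick. */
#define SCRATCH_SIZE 1024u
#define SCRATCH_MASK (SCRATCH_SIZE - 1u)

/* Stand-in for a BPF map value; in a real probe it would come from a map lookup. */
struct scratch {
    unsigned char data[SCRATCH_SIZE];
};

/*
 * Writes one byte at a caller-supplied offset and returns the offset used.
 * Masking forces the offset into [0, SCRATCH_SIZE - 1], which is the kind of
 * provable bound the verifier wants before it accepts pointer arithmetic
 * on a map value.
 */
static size_t write_bounded(struct scratch *s, long off, unsigned char byte)
{
    size_t bounded = (size_t)off & SCRATCH_MASK; /* bounds min and max at once */
    s->data[bounded] = byte;
    return bounded;
}
```

Note that masking wraps out-of-range offsets instead of rejecting them; real code may prefer an explicit range check and early return, depending on what the compiler and verifier accept.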

JoupainMD commented 2 years ago

The error seems to be the same. Just to be sure we are on the same page, here is how I build the BPF probe. In my falco-ebpf-builder init container I fetch Falco from the GitHub archive (using curl). Then I run this command (from the Falco repo root directory):

cmake \
        -DCMAKE_BUILD_TYPE="release" \
        -DBUILD_DRIVER=OFF \
        -DFALCOSECURITY_LIBS_VERSION="1ed3e2a15dad1347459f1d55838bbbb8ae352266" \
        -DFALCOSECURITY_LIBS_CHECKSUM="SHA256=14801610411317af51bd636cd0ae5800c056e5dd52ef013ed22c28c3bad0168a" \
        -DBUILD_BPF=ON \
        -DBUILD_WARNINGS_AS_ERRORS=ON \
        -DFALCO_VERSION="${FALCO_VERSION}" \
        -DDRAIOS_DEBUG_FLAGS="" \
        -DUSE_BUNDLED_DEPS=ON \
        . 

then I am applying the patch by replacing the file located at:

and then I run make bpf from /falco-repo/build.

Does this look right to you? (According to my understanding of the Makefiles, it seems correct.)

FedeDP commented 2 years ago

Yes, it looks right! Thanks!

The error seems to be the same

Are you sure it is still about

err=22 event=filler/sys_read_x

? I'd expect sys_write_x!

JoupainMD commented 2 years ago

Ah sorry, it is sys_send_x. Here is the entire message:

math between map_value pointer and register with unbounded min value is not allowed
2021-11-16T09:20:33+0000: Runtime error: bpf_load_program() err=22 event=filler/sys_send_x message=0: (bf) r6 = r1

FedeDP commented 2 years ago

Nice! So the good news is that we found the guilty function. The bad news is that I will now have to provide "pseudo random" patches to try to fix it :) I will come back with some patches to test! Thank you!
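When reading a loader error like the one above, the two useful fields are err (22 is EINVAL on Linux) and event, which names the program the verifier rejected. Here is a small C sketch that pulls both out of the log line. The helper name parse_load_error is made up for illustration and is not part of Falco or libs, and the line format is assumed to match the messages quoted in this thread.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Extracts the errno value and the program (event) name from a loader error
 * line of the form "... err=<n> event=<name> ...".
 * Returns 0 on success, -1 if the line does not match or the name
 * does not fit in the caller's buffer.
 */
static int parse_load_error(const char *line, int *err,
                            char *event, size_t event_len)
{
    char name[128];
    const char *p = strstr(line, "err=");

    if (p == NULL)
        return -1;
    if (sscanf(p, "err=%d event=%127s", err, name) != 2)
        return -1;
    if (strlen(name) >= event_len)
        return -1;
    strcpy(event, name); /* safe: length checked just above */
    return 0;
}
```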

FedeDP commented 2 years ago

A couple of PRs:

Let me know if any of these work (i.e., whether they pass the verifier; they do disable a feature, but this is needed to better locate the issue!)

EDIT: please apply every patch I send you on top of a clean libs checkout (i.e., from master)! Thanks!

JoupainMD commented 2 years ago

Output with the "disable bpf_probe_read" patch:

falco 2021-11-16T16:28:45+0000: Falco version 0.30.0 (driver version 3aa7a83bf7b9e6229a3824e3fd1f4452d1e95cb4)
falco 2021-11-16T16:28:45+0000: Falco initialized with configuration file /etc/falco/falco.yaml
falco 2021-11-16T16:28:45+0000: Loading rules from file /etc/falco/falco_rules.yaml:
falco 2021-11-16T16:28:45+0000: Loading rules from file /etc/falco/falco_rules.local.yaml:
falco 2021-11-16T16:28:46+0000: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
falco 2021-11-16T16:28:46+0000: Unable to load the driver.
falco 2021-11-16T16:28:46+0000: Runtime error: invalid filler name: sys_openat2_x. Exiting.

Same output with the "disable compute_snaplen" patch.

FedeDP commented 2 years ago

Mmh, did you revert to master before applying the patches? I mean, I expected one of them to fail on the kernel verifier :D

Runtime error: invalid filler name: sys_openat2_x. Exiting.

This error is weird though, but let's first focus on the kernel verifier. (I guess this error can be caused by using a new BPF probe with an old Falco version. "Old" here means a Falco version that originally shipped with a different libs version.)

JoupainMD commented 2 years ago

I did revert the file filler_helpers.h to master. But for the libs we are using commit 1ed3e2a15dad1347459f1d55838bbbb8ae352266, as mentioned in the cmake command above. I will check again to be sure I did not miss anything or run the wrong container version.

As for Falco possibly expecting an older BPF probe version: that is possible. We are using the Falco 0.30 release (not master), but we currently fetch and build libs from commit 1ed3e2a15dad1347459f1d55838bbbb8ae352266. To bypass the probe version check, our Dockerfile overrides the PROBE_VERSION variable in /falco-repo/cmake/modules/falcosecurity-libs.cmake so that it matches the version Falco 0.30 expects (in this case, commit 3aa7a83bf7b9e6229a3824e3fd1f4452d1e95cb4). Is that really crappy? If so, I can also try to build Falco from master, but we really plan to use the Falco 0.30 release and not build it from source in production.

FedeDP commented 2 years ago

Once the kernel verifier issue is fixed, we will try to understand the error you are getting about the invalid filler. It should not be something to worry about though!

Edit: btw thanks for your time!

FedeDP commented 2 years ago

But for the libs we are using commit version 1ed3e2a as mentioned in the cmake command ahead.

I guess that if you use an older version of libs the

falco 2021-11-16T16:28:46+0000: Runtime error: invalid filler name: sys_openat2_x. Exiting.

message will disappear :)

leogr commented 2 years ago

We are encountering issue with the BPF module since we upgrade from falco 0.29.1 to 0.30. We are building the bpf probe using our own docker image (as an init container), we have been using the default clang llvm version for long (11) and we had to switch to clang7 since 0.29 if my memory is correct.

Hey @JoupainMD

Could you provide us with the Dockerfile (or the Docker image) you used as the init container? I think this is the only way for us to reproduce your issue exactly so that we can debug it. So far, I haven't been able to reproduce it.

Thanks in advance! :pray:

JoupainMD commented 2 years ago

Hello,

the Dockerfile

The entrypoint.sh

patch_falco, which I use to swap in the latest patch you provided. One thing to note is that we use docker.io/library/amazonlinux:2.0.20211005.0 to be able to access the kernel headers from the Amazon repo. Maybe we could use a multi-stage build and only provide the kernel headers to a Debian image, for example (and benefit from clang 12). Would that be better in your opinion?

FedeDP commented 2 years ago

Hi @JoupainMD, good news: I was able to reproduce your issue! I am currently testing a fix :crossed_fingers:

FedeDP commented 2 years ago

I opened a PR; can you test it? @JoupainMD https://github.com/falcosecurity/libs/pull/140

Thanks!

(obviously it did fix the issue for me!)

JoupainMD commented 2 years ago

Hello @FedeDP, I tested it in our environment, but unfortunately I am still getting the same error: Runtime error: invalid filler name: sys_openat2_x. Exiting.

Here is the exact AMI we use: ami-02e17a76e494a9e99, kernel version: 4.14.219-161.340.amzn2.x86_64

FedeDP commented 2 years ago

Runtime error: invalid filler name: sys_openat2_x. Exiting.

This error comes from libsinsp, i.e., it has nothing to do with the kernel eBPF verifier (which is now satisfied!).
The issue is that you are using Falco v0.30 with newer libs; you can try to backport my fix to the libs shipped with Falco v0.30; it should do the trick!
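To illustrate why a version mismatch can produce "invalid filler name", here is a rough C sketch of the general mechanism: userspace resolves each program name found in the probe against a table it was compiled with, so a probe built from newer sources can carry a name that an older userspace has never heard of. The table contents and function names below are hypothetical and only mimic the idea; this is not the actual libs code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of filler names known to an older userspace build. */
static const char *const known_fillers[] = {
    "sys_read_x",
    "sys_send_x",
    "sys_open_x",
};

/* Returns the index of the filler name, or -1 when userspace does not know it. */
static int lookup_filler(const char *name)
{
    size_t n = sizeof(known_fillers) / sizeof(known_fillers[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(known_fillers[i], name) == 0)
            return (int)i;
    }
    return -1;
}

/* Strips the "filler/" prefix seen in the log, then resolves the remaining name. */
static int resolve_section(const char *section)
{
    const char *prefix = "filler/";
    size_t len = strlen(prefix);

    if (strncmp(section, prefix, len) != 0)
        return -1;
    return lookup_filler(section + len);
}
```

With this table, resolve_section("filler/sys_send_x") succeeds, while resolve_section("filler/sys_openat2_x") returns -1, which is the situation the error message describes. Keeping the probe and userspace on the same libs version avoids the mismatch.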

JoupainMD commented 2 years ago

OK, that's clear. I'll try that ASAP and keep you posted 🙏

JoupainMD commented 2 years ago

WORKING on 4.14.219-161.340.amzn2.x86_64 👍 💯 Thanks @FedeDP and @leogr. I will now try on our 5.X kernels as well 🙏

JoupainMD commented 2 years ago

Working on 5.4.149-73.259.amzn2.x86_64 as well. Perfect, thank you again guys 🙏 Fine with me to close this issue whenever you want.

FedeDP commented 2 years ago

Top! The issue will be automatically closed once the PR is merged ;)

Thanks for your time!

JoupainMD commented 2 years ago

I noticed some warnings on 5.4.149-73.259.amzn2.x86_64 (I am not sure whether they were present before your patch); see here. Anyway, it's working, so it's not really important.

FedeDP commented 2 years ago

I think you can safely ignore them. Are you building with clang 11 or clang 7?

Btw, if you could double-check that they were present before my patch too, that would be great :) (I am 100% sure they were, but a double check is worth the time!)

JoupainMD commented 2 years ago

Yep, you are right: they were already there in 0.29.1, we just didn't notice (it looks like they only appear on 5.X kernels).