falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0
227 stars 162 forks source link

[FEATURE/BUG] Verifier issues with the current bpf probe #940

Closed Andreagit97 closed 7 months ago

Andreagit97 commented 1 year ago

Motivation

Here we go again! Recently we faced different verifier issues in the current bpf probe :( These are some examples:

I can see 2 main causes here:

IMHO we cannot withstand too many other changes before reaching the complexity limit... this is the reason why we crafted the modern bpf probe from scratch! BTW since almost all users are still strictly related to the current probe I think we need to find a temporary solution to this issue. One idea that could allow us to decrease the complexity a little bit could be to split the bpf_val_to_ring helper into dedicated helper functions. I've already started this work some time ago(https://github.com/falcosecurity/libs/pull/906) to address this kind of issue, but I think it could be time to extend this approach to all the codebase!

This refactor shouldn't be too huge and it should give us further time before the final :boom:

WDYT? @FedeDP @incertum @Molter73 @hbrueckner @leogr

Feature

Split the bpf_val_to_ring helper into dedicated helper functions like in the modern BPF probe.

Molter73 commented 1 year ago

Honestly, looking at #906 it doesn't look like a solution, it looks like code duplication hell.

The proposed solution of going into a code re-write spree in the eBPF probe seems like something that will a take A LOT of resources and time, ultimately making the code even more complex, harder to maintain AND potentially adding yet more places where something could go wrong. If the problem is that changes to clang and the driver cause issues with the verifier, then IMO we need to test the probe on a larger spectrum of kernels and compile them with as many compiler versions as possible, fixing issues as they are caught.

But that's just my opinion, obviously.

FedeDP commented 1 year ago

One idea that could allow us to decrease the complexity a little bit could be to split the bpf_val_to_ring helper into dedicated helper functions

I fully agree with the proposed solution: bpf_val_to_ring is a super complex piece of code inlined inside each of our fillers, often multiple times. This would help us reduce code complexity (in the end i assume an API similar to the modern-bpf one will be reached) and having much less need of splitting code up in multiple tail-called fillers because a single bpf program gets too long. Performances should also improve a bit.

The proposed solution of going into a code re-write spree in the eBPF probe seems like something that will a take A LOT of resources and time,

@Molter73 it will, but we are in a situation where each small change could lead to breaks. Moreover, we would have similar APIs between the old and the modern probe, leading to a cleaner code.

If the problem is that changes to clang and the driver cause issues with the verifier, then IMO we need to test the probe on a larger spectrum of kernels and compile them with as many compiler versions as possible, fixing issues as they are caught.

This is super hard! First of all, each clang/linux kernel patch release could kill us; moreover, maintaining such a test matrix is a huge PITA; this is a 3-dimensional matrix: { clang version, kernel version, architecture }. If we were able to put this in place, it would be a huge win; i think this is actually much more work (and can be done in parallel of the bpf probe refactor).

Of course, #906 seems to contain code duplication; it is just because it is a first pass. Resulting API, as said earlier, should be much similar to the modern bpf one.

Andreagit97 commented 1 year ago

If the problem is that changes to clang and the driver cause issues with the verifier, then IMO we need to test the probe on a larger spectrum of kernels and compile them with as many compiler versions as possible, fixing issues as they are caught.

The problem is that the last time we fixed it with @FedeDP we loose one day finding a "reliable patch" (https://github.com/falcosecurity/libs/pull/858) :/ So the issue is not the clang version but that we are hitting the verifier complexity limit on many kernels... We put on a bunch of plasters but the dam is starting to leak :(

If we are able to enrich our testing matrix I would be super happy, but I think this is an orthogonal topic :/

BTW this is just a proposal if we are not on board, we can find something else maybe

incertum commented 1 year ago

Would love to see a balance. I get it that we have to find a better solution here, at the same time reciting @Molter73 it will drain resources from other tasks that in the grander scheme of things need to be prioritized more as well.

Currently some community feature requests are not even in any queue, because of ongoing refactors and limited engineering resources. Therefore, could we find a way to do it more slowly while not sacrificing prioritization of features that help making Falco more useful for end users in terms of "catching bad stuff"? Again, just trying to see if we can find a balance. For instance our refactor budget for Falco 0.35 has been already exhausted from my perspective, it would make sense to check on outstanding features we should add instead and then for the next libs and Falco release 0.36 yes we can mix in again refactors?

Same disclaimer as Mauro, it's obviously also just my opinion and observation.

Andreagit97 commented 1 year ago

Again, just trying to see if we can find a balance. For instance our refactor budget for Falco 0.35 has been already exhausted from my perspective, it would make sense to check on outstanding features we should add instead and then for the next libs and Falco release 0.36 yes we can mix in again refactors?

That's a good question to which I have no answer... I've seen these verifier issues in the past weeks :/ maybe if we don't touch the probe anymore we will have something stable for the release, but I'm not sure about that, IMHO we still have the socketcall management(https://github.com/falcosecurity/libs/pull/811) to merge for the next release. The only concern that I have is that if we find something bad near the release it would much more difficult to do this refactor because we don't have time to test it :/

FedeDP commented 1 year ago

Note moreover, that i think this is an effort that can be done in background, like 1 PR per week until we are over, perhaps porting away a single type from bpf_val_to_ring each iteration. I will be glad to join and help you @Andreagit97 !

incertum commented 1 year ago

@Andreagit97 and @FedeDP would be on board with the slow and steady 1 PR / week, sequenced by highest impact (if possible) plus not sacrificing on features we still have in the queue for next release and other minor improvements that aim to make the tool more user friendly and address current top pain points amongst adopters.

Andreagit97 commented 1 year ago

let's say if we don't find further breaks we can postpone this fix. For this release I expect other 2 changes in the probe:

They shouldn't be disruptive, let's see if the verifier doesn't complain we can postpone the fix to the next release, but I'm still convinced that at some point in time, we will need it :(

incertum commented 1 year ago

Hey @Andreagit97 got back to working on the localhost VM driver sanity tests, please checkout https://github.com/falcosecurity/libs/issues/982 - kernel compatibility indeed used to be better for some time.

While running tests today noticed the following issue with clang-15 on a 5.4 kernel and a bunch of the ubuntu kernels such as 4.19, 4.16 etc

math between map_value pointer and register with unbounded min value is not allowed
libscap: bpf_load_program() event=raw_tracepoint/filler/sys_unshare_e: Operation not permitted (1)

Can check over next days, but wanted to generally share that from my perspective verifier always complained when we forgot to initialize variables to 0 and or confusion with > or < aka it was always "math related" -> perhaps we should perform a simple audit in this respect?

Andreagit97 commented 1 year ago

Hey @incertum that's exactly why I opened this issue. The pain point is that these verifier logs could be misleading... Last time I got the same error math between map_value pointer and register with unbounded min value is not allowed in sys_empty and as you can notice there is not too much code :/

FILLER(sys_empty, true)
{
    return PPM_SUCCESS;
}

also this time in sys_unshare_e

FILLER(sys_unshare_e, true)
{
    unsigned long val;
    u32 flags;
    int res;

    val = bpf_syscall_get_argument(data, 0);
    flags = clone_flags_to_scap(val);
    res = bpf_val_to_ring(data, flags);
    return res;
}

If there is an issue, it is in the bpf_val_to_ring, that's why we want to split it! because even if we fix the val_to_ring for this filler we will probably break another filler (maybe on another kernel with another clang) since this function is shared between all our bpf progs :/

Andreagit97 commented 1 year ago

Seen the worrying graphs posted by @incertum here I would start to address it, step by step. WDYT @falcosecurity/libs-maintainers ?

FedeDP commented 1 year ago

I agree! I have a local branch with some changes, to split up bpf_ring_val integers related functions and use them in various fillers. Good thing is, having tests help us in spot any issue we might introduce. I can push the branch upstream so that we can work together?

Andreagit97 commented 1 year ago

I agree! I have a local branch with some changes, to split up bpf_ring_val integers related functions and use them in various fillers. Good thing is, having tests help us in spot any issue we might introduce. I can push the branch upstream so that we can work together?

Supercool, yep let's first see what other maintainers think about it :)

incertum commented 1 year ago

Re the graphs please also keep in mind that often the probe didn't compile, which sometimes is purely a container and GLIBC issue ... will try exporting 2 graphs in the future, one for "did it compile" vs "did it run".

But regardless I think @Andreagit97 is right with the hunch that we are slightly regressing especially since clang-15 ...

Andreagit97 commented 1 year ago

To track it I will assign /milestone next-driver, by the way, we still have to decide how to proceed so we are free to change the milestone in a second step

gnosek commented 1 year ago

Let me approach this from a slightly different angle: do we need all the complexity in bpf_val_to_ring? IIRC it's (mostly) a big switch over the different parameter types. What if (hits bong yet again) we replaced all the calls with more explicit calls to type-specific functions (i.e. bpf_u64_to_ring, bpf_user_str_to_ring etc.)?

We then need all the fillers to keep the type safety (schema param type == actual written param type) instead of putting it all in bpf_val_to_ring. I think it's actually a good thing, because right now there's no validation. Instead we coerce the written type to the requested type, whether it makes sense or not and the only check we do is the number of parameters stored vs schema. With type-specific helpers we'd have two approaches:

  1. (in debug builds?) when calling e.g. bpf_u64_to_ring, assert that the next param really is an u64
  2. (in release builds?) yolo and skip the validation entirely; we're not worse off in any case, now the (wrong) type cast will happen in userspace instead.

With (2) we still know the event is well-formed (even if e.g. what we expected to be an u64 is 1000 bytes long), but now we can detect (based on the param length) that the driver sent us the wrong type . This could give us a tiny perf boost and reduced in-kernel complexity (there's no explicit event schema in the kernel any more) with minimal downsides.

note 1: none of this invalidates the above discussion, keep on keeping on! note 2: this applies equally to kmod (and udig) and val_to_ring, not sure about modern_bpf

EDIT: I see https://github.com/falcosecurity/libs/pull/906 is a step in that direction, yay 🎉

FedeDP commented 1 year ago

Yep! This is precisely the road we are pursuing :)

Andreagit97 commented 1 year ago

yep, that's the reason why we introduced the driver framework test. The modern probe doesn't have any type of checking or assertions at runtime, we entirely rely on the test case for each syscall! And it seems pretty reasonable to detect this in tests instead of at runtime :joy: One thing I never understood about our drivers is why we make number/type assertions at runtime, in production losing precious clock cycles :laughing:

incertum commented 1 year ago

@Andreagit97 I am open-minded to explore what striking a balance between checks at runtime vs only for tests could look like ... in general still unclear what type of optimizations move the needle, so far most noticeable thing is dropping uninteresting syscalls (obviously), it's likely going to be challenging to really measure subtle differences.

FedeDP commented 1 year ago

1048 was a first good step; as highlighted by @incertum , we are now in a much better situation.

We are still investigating some issues on linux 4.16; therefore i'd leave this one open!

incertum commented 1 year ago

Yes let's see what else can bee 🐝 done 😉

FedeDP commented 1 year ago

/milestone next-driver

linajiang commented 10 months ago

how to diagnoise the problem, i alse got the verifier issues

FedeDP commented 10 months ago

@linajiang can you share the verifier log?

poiana commented 7 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 7 months ago

since it seems we reached a good stability I think we can close it for now!