Allow loading tracepoints other than the ones needed by Falco

Molter73 commented 9 months ago

Motivation

We used to depend on the generic way the libs used for attaching tracepoints in order to load our own directly to the syscalls we cared about. PR #1001 changed this behavior in favor of idempotency, (which I think is great!), but we would like to keep something similar to the previous behavior.

Feature

If tracepoints other than the ones needed by Falco are found in the eBPF probe, the libraries should attach them, since the most likely case is an adopter is trying to use a custom made driver.

Alternatives

We could add some additional means for control, similar to the existing SC system, but that might be overkill. ATM we also only build custom drivers for eBPF, we might do so for modern_bpf in the near future. We are not sure this is a worthy change for kernel modules, since from our testing the overhead of attaching to sys_enter/sys_exit is negligible compared to the bpf variants.

Additional context

An example implementation can be found in #1375, we will be using something similar in our fork of the libs.

Andreagit97 commented 9 months ago

Still need to find time to think about it, I will come back soon! Sorry

FedeDP commented 9 months ago

If tracepoints other than the ones needed by Falco are found in the eBPF probe, the libraries should attach them, since the most likely case is an adopter is trying to use a custom made driver.

Mmmh given the schema (ie: event_table) is still the same, i expect the loaded driver to push events that libscap knows about, right? That's why i'd love to allow people to attach directly to the syscalls specific tracepoint, bypassing sys_enter/exit hooks. This is possibly a bigger change, but would also carry some big benefits. As i already shared on the slack channel, i envisioned this feature like a single boolean option for the {bpf,modern,kmod} engines, then all the logic should be buried inside. This means that a request to enable PPM_SC_OPENAT would directly attach the sys_openat tracepoint, instead of flipping a bit in the interesting syscalls map. It also means that all the initialization work being done by sys_enter/exit should be somehow replicated somewhere else, likely we would want to attach an init_filler as starting hook for all tracepoints, and then jump to the syscall-specific one, but i am not sure. I planned to give this a shot ~1yr ago, but never found the spare time to look into this!

TLDR: my 2c -> let's try to integrate the feature in the system in a smooth way, instead of adding a "workaround".

Molter73 commented 9 months ago

If it helps, this is how we build our custom ebpf driver that attaches directly to syscalls, we wrote our own probe.c file and used driver/bpf/ as an include directory: https://github.com/stackrox/collector/tree/master/kernel-modules/probe

We basically made it so the same code that is run by sys_enter/sys_exit is added to every syscall. We also had to use regular tracepoints, but I can't recall why that was needed. @Stringy is the mastermind behind this so he can probably add some more context if needed (he is also gonna try to do a similar thing with the modern probe in the near future).

I agree that a proper solution is better than a workaround, but I'm pushing for stackrox to drop our fork and not being able to attach tracepoints directly is a hard stopper, we can try to carry the patch I shared for the short-term but I'd like us to drop it as soon as humanly feasible. Let me know if we can help with any sort of testing.

Andreagit97 commented 8 months ago

Hi Mauro, I finally found some time to think about the issue and the related PR, sorry for the delay. Looking at the code, the patch is fine, the only downside that I see is that we lose strict control over what is attached to the kernel. If in the future we add/rename some programs we need to be careful to not fall in the final loop that attaches everything, otherwise is possible that some programs will be injected automatically without noticing them.

Losing this strict check on what we inject is not a big issue but we are taking a step back with respect to what we chose some months ago. Moreover, we are adding dead code in the upstream repo, again the patch is small, but it is always dead code... my suggestion here is to try to keep this change in the fork until it exists, and then when it's time to remove the fork, if this is the only patch left pending we can think about integrating it upstream, WDYT?

erthalion commented 8 months ago

It's an important topic, so I allow me to add my 2c to the discussion.

I think everyone agrees that working on sys_enter/sys_exit level might be too broad for some use cases, and having more flexibility around this is positive for the project. Conceptually it implies there should be an interface that describes what to load into the kernel, which programs, to which tracepoints, etc. Elf sections in the bpf probe seems to be a good fit for that: it provides enough isolation (one can swap the probe, and everything still works); it's flexible enough; other projects are following this approach (e.g. libbpf with all those rules about modifying the default behaviour via the section name). From this point of view I believe this PR is a step in the right direction.

It certainly makes sense to be able to distinguish one type programs from the other (and in fact it's even sufficient), to avoid situations like you've described:

If in the future we add/rename some programs we need to be careful to not fall in the final loop that attaches everything, otherwise is possible that some programs will be injected automatically without noticing them.

There are already various program name prefixes in play, I think this could be addressed by distinguishing programs in question with something like custom/<...> or via introducing any other type of hierarchy that will make it clear what has to be attached automatically, and what's not (again, hello libbpf approach).

Moreover, we are adding dead code in the upstream repo, again the patch is small, but it is always dead code

This strikes me as an interesting argument, so I want to spend few words here. What we work on here is a library, and if someone is using the code, it's certainly not dead, even if it's not exercised in the main project CI :)

Andreagit97 commented 8 months ago

ei @erthalion these are all good points! I agree that adding custom prefixes like custom/<...> could avoid some troubles in the future... what I was trying to say with my previous comment is that I've always seen libscap + drivers as a unique block, so for sure, we offer tracing capabilities to a userspace program that wants to use the libraries but I'm not sure we ever intended libscap as a custom loader for tracing program...IMO there is a big difference between libscap and libbpf.

libscap offers some APIs to trace your system with predefined hooks, these hooks are controlled by ppm_sc codes.
libbpf is a loading library so of course you are free to attach whatever program in the kernel.

According to the current libscap API, a BPF program should always be paired with a PPM_ code, or at least related in some way. This relation allows users to control what to attach/detach at loading time. Now, adding the capability to attach custom programs that match our event SCHEMA is for sure a nice feature but it's not clear to me how to control this custom loading logic userspace side... Today if you want to attach/detach a specific program you should provide the PPM_ code to libscap and the program will be correctly attached/detached. It is not clear to me how to do it with the proposed changes, everything that is found in the elf is injected into the kernel without control. Now probably I've used the wrong expression we are adding dead code in the upstream repo, sorry for that, this is not dead code but this is something that doesn't follow this PPM_SC <---> BPF program relation, and I'm not sure how users can take advantage of it if we don't provide an API to regulate it.

Moreover, this absence of control scares me a little bit because it could cause scenarios where users attach custom programs and face issues in libsinsp/libscap. In some way with this patch, we are saying "We allow you to attach everything that is in the elf file" but indeed we can just support programs that maintain our event schema and that don't overlap the "standard" collection we provide in the upstream driver... So I agree this could be a nice feature but I would like to have more control over what is attached and how it interacts with the userspace side

erthalion commented 8 months ago

Thanks for clarifications, @Andreagit97 !

this is something that doesn't follow this PPM_SC <---> BPF program relation, and I'm not sure how users can take advantage of it if we don't provide an API to regulate it.

I see your point. This is probably one part of misunderstanding, the main goal here is to provide more flexibility for those BPF progs that actually follow the PPM_SC to BPF relation and speak Falco "protocol" fluently. The use case we intend to support is about narrowing down the scope.

Moreover, this absence of control scares me a little bit because it could cause scenarios where users attach custom programs and face issues in libsinsp/libscap. In some way with this patch, we are saying "We allow you to attach everything that is in the elf file" but indeed we can just support programs that maintain our event schema and that don't overlap the "standard" collection we provide in the upstream driver... So I agree this could be a nice feature but I would like to have more control over what is attached and how it interacts with the userspace side

For me to understand, what kind of control you have in mind? Can you describe what are the expectations of it being applied to user space <-> kernel space communication? And what are the consequences of not having such control?

From what I see the worst what could happen is that there is a BPF program, that can't interact with Falco, but still got loaded, causing issues -- this is probably on the BPF program developers, the community doesn't have to support such cases.

As a side note, it's probably also a question of project developers position. Surely, there are questions about how to make XYZ the best way possible, and it's not always clear right away how to achieve that. But prohibitive from the start approach will rarely provide a good ground to develop this XYZ further :)

Andreagit97 commented 8 months ago

For me to understand, what kind of control you have in mind? Can you describe what are the expectations of it being applied to user space <-> kernel space communication? And what are the consequences of not having such control?

Let's consider an example. If I remember well your use case, you allow attaching single syscalls instead of the generic sys_enter/sys_exit tracepoints. Let's suppose I'm a user and I want to use this cool feature, I would like to attach only the open syscall without using sys_enter/sys_exit tracepoints. I expect a sort of userspace interface that allows me to attach only the necessary tracepoints (sys_enter_open, sys_exit_open), something like a boolean single_syscalls=true, but today we have nothing to do that so what happens is that sys_enter and sys_exit will be attached under the hood. Probably as a libsinsp user, I could detach sys_enter and sys_exit manually, but then, what do I have to do? How many trace points are attached now in the kernel? which are the tracepoints attached? do I have only sys_enter_open and sys_exit_open or maybe I have also other tracepoints like sys_enter_setuid, and sys_exit_setuid because libscap found them in the elf file? How can I disable them at run-time?

Here are some points I would like to discuss

How a user can take advantage of this new feature in a clear way? I know this could be a step 0 for something but I would like to have a clear plan before starting with possible steps in this direction. Moreover, with the proposed solution, we inject all programs we find in the elf but what about if we want only some of them, like in the proposed example? This is the "control" I was talking about in the previous message.
Merging Mauro's PR would mean that we want to "officially" support custom drivers and not only the ones that live in this repo. Today the patch is elegant and small, but what about the future? we need to take it into consideration in all the following designs and this could have a not negligible maintenance cost. Maybe is something that we want but we need to be sure about the choice otherwise there is the risk that we break the support at every refactor. Moreover, the modern probe today is directly embedded into libscap, so the possibility of integrating an external skeleton inside the libscap becomes even more complex.
As we previously said, the possible custom drivers must be compatible with the event SCHEMA and the logic of the libscap engine. Since these are quite strict requirements I don't expect many adopters to patch the drivers with different programs injected, right now the only use case we know is your use case. So maybe we could also evaluate integrating your use case in the upstream drivers... integrating it, will force the creation of an explicit logic more resilient to future changes/refactors, it will probably require an initial big effort, but maybe in the future, it could help us in maintaining the solution instead of having flying little patches.

These are just my thoughts, if other maintainers agree with the proposed changes I'm fine, I would only like to understand what is the plan here.

FedeDP commented 8 months ago

Jumping in once again just to :+1: everything Andrea said; most of the points were already mentioned by me in my first comment. IMHO for me the solution is:

So maybe we could also evaluate integrating your use case in the upstream drivers... integrating it, will force the creation of an explicit logic more resilient to future changes/refactors, it will probably require an initial big effort, but maybe in the future, it could help us in maintaining the solution instead of having flying little patches.

Keep the proposed patch fork-only until we have a proper feature that would allow you to throw the patch away. I very much prefer a good solution that might be useful for everyone (as you know, if user is only interested in a bunch of syscalls, attaching them directly has great perf benefits) than a workaround for your use case (it's not like because this is your use case, is more like "this is a workaround for a use case we don't support, so why a workaround in the first place?").

incertum commented 7 months ago

Quoting @erthalion

I think everyone agrees that working on sys_enter/sys_exit level might be too broad for some use cases, and having more flexibility around this is positive for the project.

As a side note, it's probably also a question of project developers position. Surely, there are questions about how to make XYZ the best way possible, and it's not always clear right away how to achieve that. But prohibitive from the start approach will rarely provide a good ground to develop this XYZ further :)

I have put this issue on the Dec 2023 core maintainers strategy discussion.

Reviewing the project's history, it appears that iterations have always been necessary. Therefore, we should not abruptly impose a different standard for @erthalion and @Molter73 but rather work towards a solution that addresses the concerns of all parties involved while also promoting innovation and iterative progress. As long as we are not breaking Falco's default behavior, I see no problems with iterating, especially since @erthalion and @Molter73 are committed to steering us toward an optimal solution. By the way, I am also very interested in such enhanced customizations.

FedeDP commented 7 months ago

I think this

but rather work towards a solution that addresses the concerns of all parties involved while also promoting innovation and iterative progress

is what i asked for; i mean, if we could find a good solution that feels less like a "workaround" and more like an enablement for a new feature, i am all for it!

incertum commented 7 months ago

Great @FedeDP -- @erthalion and @Molter73 what are concrete next steps based on the WIP PR that is already open?

erthalion commented 7 months ago

Great @FedeDP -- @erthalion and @Molter73 what are concrete next steps based on the WIP PR that is already open?

Thanks for chiming in, @incertum!

The way I see it, we still need to agree on one important point before moving forward: I think even a "less workaround" solution will imply having more responsibility on the bpf probe (e.g. about negotiation which progs to attach where), because otherwise chances are low that the implementation will be flexible enough. My takeaway from the discussion is that folks are strongly against that, and to not waste anyone's time we need to clarify if such approach (well thought and thoroughly developed) is not going to be frowned upon.

Otherwise, I think @Stringy has some ideas about how to proceed and shape up this feature with your help.

FYI, Mauro is not going to be available for the next few months, so I'm afraid we have to continue without his input.

Stringy commented 7 months ago

Hi folks, I've been watching this thread with bated breath for some time now, and I think it's time to weigh in a little. I've developed this feature for eBPF and that has served us well for quite some time now, and I'm in the process of working on a solution for modern_bpf in our fork (admittedly both built in a way that is very specific to our use-case.)

For me the secret sauce is the event schema. Each of my implementations has been focused primarily on getting to a point where I can tail call into the relevant programs so the data is processed and written to the buffers in the right way. In the case of eBPF this was purely a driver change (because the lifecycle loading/unloading of the driver is already very flexible) and in modern_bpf this has required driver changes and libpman changes to support a modified skeleton (as a brief aside it is curious to me that modern_bpf already supports alternative skeletons via USE_BUNDLED_MODERN_BPF and MODERN_BPF_SKEL_DIR though as it stands currently it will only compile if that skeleton matches the existing one)

I agree with @erthalion that a proper solution will most likely involve shifting some ownership of a custom driver's internals back to the "user" (provided the schema is followed.) Then the API becomes an extension to the lifecycle management of the driver (in effect an extensible libpman) in which the user manages what is loaded and attached (via callback, or vtable or whatever). Provided the schema is followed in the driver, libscap should be happy. This might be a good first step, to dip our toes into the feature, but comes with the agreed caveat that thou shalt bring thine own driver.

Making the drivers flexible to this customisation feels a fair bit more difficult, not least because of the common initialization, the very specific function signatures (as defined by the tracepoints themselves), and the risk of verifier complaint. For my customisation of the modern_bpf driver (which is not yet finished) I have experimented with changing the relevant programs to BPF_KSYSCALL and modifying the section names accordingly, and that seems to work, but these changes are very difficult to manage in a truly generic solution. If customisation at this level is possible, then its use in conjunction with the lifecycle management changes would feel like a fully supported solution, at least in my estimation.

incertum commented 7 months ago

Thanks @erthalion and @Stringy! Thinking more about it: Could we open a proposal re the bigger vision of driver extensions and customizations? Let me post some more ideas re how such a proposal could be structured after the Thanksgiving break. We could treat this and the userspace anomaly detection work I am soon more seriously starting as similar workflows in the sense that lots of iterations will be required and we don't know all the answers beforehand.

In addition, perhaps we could brainstorm during one of the upcoming community calls? We can discuss what date would work best for everyone who wishes to participate in the discussion.

incertum commented 7 months ago

Reread all comments and in addition to opening a new "vote" issue I also posted some concrete suggestions in the draft PR https://github.com/falcosecurity/libs/pull/1375#issuecomment-1833207329

Addressing a follow-up concern raised by Andrea in https://github.com/falcosecurity/libs/issues/1376#issuecomment-1774888712, which aligns with Federico's concern in https://github.com/falcosecurity/libs/issues/1376#issuecomment-1750107423:

An "initial workaround" solution is actual preferable I think until we have a clearer understanding of whether these capabilities are solely intended for (1) libs only adopters or (2) Falco adopters as well. We could postpone the decision on Falco UX and broader libs control internals (such as expanding the SC system or exploring alternative approaches) until later. That way we will effectively minimize the amount of code that needs to be refactored in the future and we can achieve faster and easier iterations (as also suggested by Mauro in his initial comment here https://github.com/falcosecurity/libs/issues/1376#issue-1923825950).

While we're currently focusing on syscall monitoring, we shouldn't put off dealing with the LSM hooks that will still be important in the future (I posted similar thoughts many many times before).

Therefore, let's take the time to work together on a broader vision proposal without blocking this work in the meantime? See my last comment https://github.com/falcosecurity/libs/issues/1376#issuecomment-1823640003.

Who would want to take the lead for such a proposal?

mstemm commented 7 months ago

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

incertum commented 7 months ago

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

Identical schema for the initial use case of one of our libs adopters (what we discuss here), see https://github.com/falcosecurity/libs/issues/1376#issuecomment-1822777817 @Stringy comment:

Provided the schema is followed in the driver, libscap should be happy.

For a larger refactor and flexible solution that will be future proof for all adopters and use cases and scenarios, including Falco, lets go the proposal route and also include LSM hook support in there.

mstemm commented 7 months ago

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

Identical schema for the initial use case of one of our libs adopters (what we discuss here), see #1376 (comment) @Stringy comment:

Ok, I wonder how that would interact with libsinsp, then. libsinsp and its parsers expect a pretty wide set of syscalls to reconstruct its internal state accurately and make that state available via filterchecks and filters. If an alternate bpf program only handled some syscalls (for example opens + execs but not, say, connects and listens), wouldn't using libsinsp be problematic?

(Again, I'm catching up and don't totally understand the use case yet).

incertum commented 7 months ago

I'm catching up, but would the custom bpf programs still create events that are identical to the events created by the bpf program that attaches to sys_enter/sys_exit? Or do they create completely disjoint data and pass that data up to userspace.

Identical schema for the initial use case of one of our libs adopters (what we discuss here), see #1376 (comment) @Stringy comment:

Ok, I wonder how that would interact with libsinsp, then. libsinsp and its parsers expect a pretty wide set of syscalls to reconstruct its internal state accurately and make that state available via filterchecks and filters. If an alternate bpf program only handled some syscalls (for example opens + execs but not, say, connects and listens), wouldn't using libsinsp be problematic?

(Again, I'm catching up and don't totally understand the use case yet).

No worries at all, this should not be of concern here as libs adopters who mess with these parts of the code base should know what they are doing 😉 If they want to break things they can. Also note that with the new base_syscalls Falco option you can also break Falco if you want, your decision.

leogr commented 6 months ago

Also note that with the new base_syscalls Falco option you can also break Falco if you want, your decision.

A bit OT, but still relevant, IMO: This is not necessarily good for users, especially if an option allows them to break things in unexpected ways that will make users waste an infinite amount of time troubleshooting. Don't get me wrong, I was in favor of base_syscalls (and I'm still in), but generally speaking, we should only allow users to break things in a controlled way (if they like to do so) rather than freely allow them to do everything they want.

poiana commented 3 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Molter73 commented 3 months ago

/remove-lifecycle stale

poiana commented 3 weeks ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Molter73 commented 1 day ago

As mentioned in https://github.com/falcosecurity/libs/pull/1375#issuecomment-2210501286, we no longer need this feature, so I'm closing the issue.

falcosecurity / libs

Allow loading tracepoints other than the ones needed by Falco #1376