falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0
7.42k stars 903 forks source link

[BUG] Solving overlapping syscall names issues #2443

Closed jasondellaluce closed 1 year ago

jasondellaluce commented 1 year ago

This acts both as a bug report and as a solution proposal. Solving this problem inevitably involves some breaking changes in the Falco UX, so the goal here is to find an agreement on the direction we want to go for.

Problem

In Falco rules, events can be matches by type through the evt.type filter field, both in rules' conditions and outputs (e.g. evt.type=open, evt.type in (openat, close), etc). The string values extracted through event type with evt.type come right from libsinsp, libscap and the drivers, and as for today have two distinct origins:

In the libraries, we currently have an design flaw where some syscalls (each with their own distinct platform-agnostic PPM_SC representation) are mapped to generate the same event at the kernel-level in the drivers. The end result is that a given evt.type comparison would unexpectedly match more than one syscall. For example, accept4 and accept are mapped to both produce the same event, and so the filter evt.type=accept matches both accept4 and accept syscalls at runtime. This can be ambiguous and misleading.

This issue is tracked in detail in https://github.com/falcosecurity/libs/issues/911.

Secondary problems

Since starting the development for the syscall selection feature (https://github.com/falcosecurity/falco/issues/2371, https://github.com/falcosecurity/falco/issues/2373), Falco started relying on the PPM_SC constants to configure the kernel drivers about which syscalls need to be collected at runtime. Considering the problem above, we have some secondary issues:

Proposed solution

The discussions involving me, @Andreagit97, @FedeDP, and @incertum converged in two proposals:

Breaking changes

Solving this issue will involve a breaking change in the Falco rules. Considering the example of a rule with a condition containing evt.type = accept, we would now need to change it into evt.type in (accept, accept4), because after the fix the accept4 syscall would have its own event. Not doing so would cause accept4 syscall events to be ignored by the rules. This applies for all the events listed in https://github.com/falcosecurity/libs/issues/911, however from my personal research accept seems to be the only used one in rulesets (including our default one).

In all cases, the migration plan consists only of changes such as evt.type = accept -> evt.type in (accept, accept4).

Action items

cc @falcosecurity/falco-maintainers @falcosecurity/libs-maintainers

I'm opening this issue for everyone's visibility and because we need to agree on:

Looking forward to having feedback! 🤗 If there will be no contrary opinions on those points, we'll proceed with the proposed solution.

incertum commented 1 year ago

Intention is to re-iterate aspects Jason already mentioned as well as provide a different view / summary of https://github.com/falcosecurity/libs/issues/911 in order to derive a more concrete / unambiguous plan.

2023-03-15: Edited the setuid, setuid32 Row based on community call conclusion.

syscall string names new rule syntax requirement (Falco >= 0.35.x) current event codes current syscall codes notes
accept, accept4 evt.type in (accept, accept4) PPME_SOCKET_ACCEPT_5_E, PPME_SOCKET_ACCEPT_5_X, PPME_SOCKET_ACCEPT4_5_E, PPME_SOCKET_ACCEPT4_5_X PPM_SC_ACCEPT, PPM_SC_ACCEPT4 Overlapping event name "accept" in Falco <= 0.34.x, but already unique events, can be easily corrected
inotify_init, inotify_init1 evt.type in (inotify_init, inotify_init1) PPME_SYSCALL_INOTIFY_INIT_E, PPME_SYSCALL_INOTIFY_INIT_X PPM_SC_INOTIFY_INIT, PPM_SC_INOTIFY_INIT1 Create a new PPME event for inotify_init1? Or re-use same event type but require rules to call exact syscall string name? Same for pipe, pipe2 or eventfd, eventfd2 or signalfd, signalfd4 pairs
umount, umount evt.type in (umount, umount2) PPME_SYSCALL_UMOUNT_E, PPME_SYSCALL_UMOUNT_X, PPME_SYSCALL_UMOUNT_1_E, PPME_SYSCALL_UMOUNT_1_X PPM_SC_UMOUNT, PPM_SC_UMOUNT2 Plan is to rename the event names, else no issues with new requirements similar to accept, accept4 case
init_module evt.type = init_module PPME_GENERIC_E, PPME_GENERIC_X PPM_SC_INIT_MODULE Falco <= 0.34.x could not tap into system calls that are internally tracked as generic events. Starting with Falco >= 0.35.x all system calls Falco supports can be used in rules. Is the plan to keep PPME_GENERIC_ for these cases or exploding out a unique event type for each generic system call?
setuid, setuid32 evt.type = setuid PPME_SYSCALL_SETUID_E, PPME_SYSCALL_SETUID_X PPM_SC_SETUID, PPM_SC_SETUID32 Same applies to all other cases such as fcntl64, fcntl or ugetrlimit, getrlimit etc. Difference here is just the architecture, we decided during the 2023-03-15 community call to keep this overlap for now.

Re syscall extraction from each rule (this means evt.type corresponds to a system call event) ... do we continue to use names_to_sc_set / sc_names_to_sc_set ppm sc API or do I understand correctly we want to cut over to event_names_to_sc_set?

jasondellaluce commented 1 year ago

Thanks @incertum!

Is the plan to keep PPMEGENERIC for these cases or exploding out a unique event type for each generic system call?

I think removing PPMEGENERIC would be too much of a structural change for this effort, plus in my opinion having it it's the right approach to support all the syscalls that don't have a specific filler in our drivers.

do we continue to use names_to_sc_set / sc_names_to_sc_set ppm sc API or do I understand correctly we want to cut over to event_names_to_sc_set?

My proposal is for to rename the current names_to_sc_set into event_names_to_sc_set, because that's its actual semantics as for now, and keep using it in Falco (so no UX change from the Falco perspective). Then, I'm also proposing adding sc_names_to_sc_set into the libsinsp API for parity, but not using it in Falco.

incertum commented 1 year ago

Is the plan to keep PPMEGENERIC for these cases or exploding out a unique event type for each generic system call?

I think removing PPMEGENERIC would be too much of a structural change for this effort, plus in my opinion having it it's the right approach to support all the syscalls that don't have a specific filler in our drivers.

Glad to hear we keep PPMEGENERIC thank you!

do we continue to use names_to_sc_set / sc_names_to_sc_set ppm sc API or do I understand correctly we want to cut over to event_names_to_sc_set?

My proposal is for to rename the current names_to_sc_set into event_names_to_sc_set, because that's its actual semantics as for now, and keep using it in Falco (so no UX change from the Falco perspective). Then, I'm also proposing adding sc_names_to_sc_set into the libsinsp API for parity, but not using it in Falco.

Works! Re implementation details: The new event_names_to_sc_set would still directly call sc_names_to_sc_set in order to resolve the rules evt.type that correspond to a system call? We would have to right? Because even if we fix all other syscall related event names (meaning they are unique and correspond to the actual system call string name as well) we continue to have the generic event type and the associated information loss.

jasondellaluce commented 1 year ago

That's right, we'll still rely on the syscall names when dealing with generic events.

Andreagit97 commented 1 year ago

I'm late to the party...

I think removing PPMEGENERIC would be too much of a structural change for this effort, plus in my opinion having it it's the right approach to support all the syscalls that don't have a specific filler in our drivers.

Agree with @jasondellaluce here, removing PPME_GENERIC_ would mean implementing all the missing syscalls, so quite a huge effort :_(

Works! Re implementation details: The new event_names_to_sc_set would still directly call sc_names_to_sc_set in order to resolve the rules evt.type that correspond to a system call? We would have to right? Because even if we fix all other syscall related event names (meaning they are unique and correspond to the actual system call string name as well) we continue to have the generic event type and the associated information loss.

Yep the end logic should be exactly the same, I would change just the name since it is quite misleading.

Then, I'm also proposing adding sc_names_to_sc_set into the libsinsp API for parity, but not using it in Falco.

Yeah I like the idea of parity if we need to add just one specular API, it's ok, but I would avoid adding tons of new APIs just to have specular methods, I would implement new methods only when we have a real need

incertum commented 1 year ago

Community call summary 2023-03-15:

No objection, we proceed with this proposal. In addition, we clarified that these changes accomplish a smoother and clearer end user experience.

Tracking libs event types <-> syscall refactors:

!!! We decided to keep the overlap for cases like setuid, setuid32 (for now) as it is merely an architecture distinction, see also @Andreagit97 comment in https://github.com/falcosecurity/libs/issues/911#issue-1591510121

Additional Action Items:

leogr commented 1 year ago

/milestone 0.35.0

incertum commented 1 year ago

All items completed, @jasondellaluce feel free to close this issue. Blogpost is scheduled as well -> tracked in community repo going forward.

jasondellaluce commented 1 year ago

/close

🥳

poiana commented 1 year ago

@jasondellaluce: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/2443#issuecomment-1496661355): >/close > >🥳 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.