Open incertum opened 11 months ago
@stevenbrz let's see if the other maintainers are on board. If yes, it could be a great "warm up" contribution for you to take on :wink:
Yes, Falco doesn't scale on these huge servers and we need to find a possible solution to mitigate this case. One idea could be:
`exit_events`, `enter_events` are just needed to mitigate TOCTOU or in old kernel versions.
[...] `comm`, on the `exepath`, on the `cmdline`, ...) These filters are evaluated in userspace when we read the event from the next (if we have a match, we add the `pid` of this process inside the hash table used by the drivers so the following events will be excluded kernel side). Of course, we need to evaluate how many filters we can process because it could be quite heavy.
Moreover, I would avoid filtering `clone`/`execve`/`proc_exit` events: we have already seen these don't cause perf overhead and we need them to keep a reliable process tree inside sinsp.
This is just an idea but maybe it could work.
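To make the filter idea above more concrete, here is a minimal Python sketch of the flow: a userspace filter matches on `comm`, records the matching `pid` in an exclusion set, and later events from that `pid` are treated as dropped kernel side. All names here (`Event`, `PidExcluder`, `kernel_would_drop`, `on_userspace_event`) are hypothetical and are not part of Falco or libs. The plain Python `set` only stands in for the hash table used by the drivers. Removing the `pid` on `proc_exit` is an added assumption to account for PID reuse.

```python
from dataclasses import dataclass

# Lifecycle events are never filtered, so the process tree stays reliable.
ALWAYS_KEEP = frozenset({"clone", "execve", "proc_exit"})


@dataclass(frozen=True)
class Event:
    pid: int
    evt_type: str
    comm: str = ""


class PidExcluder:
    """Hypothetical model of userspace filters feeding a kernel-side PID table."""

    def __init__(self, excluded_comms):
        self.excluded_comms = frozenset(excluded_comms)
        # Stand-in for the hash table used by the drivers.
        self.excluded_pids = set()

    def kernel_would_drop(self, ev):
        # Kernel side: drop events from excluded PIDs, except lifecycle events.
        return ev.evt_type not in ALWAYS_KEEP and ev.pid in self.excluded_pids

    def on_userspace_event(self, ev):
        """Return True if the event should be processed further in userspace."""
        if ev.evt_type == "proc_exit":
            # Assumption: forget the PID on exit so a reused PID is not excluded.
            self.excluded_pids.discard(ev.pid)
            return True
        if ev.evt_type in ALWAYS_KEEP:
            return True
        if ev.comm in self.excluded_comms:
            # Filter matched: later events from this PID are excluded kernel side.
            self.excluded_pids.add(ev.pid)
            return False
        return True
```

In this model only the first matching event of a process reaches userspace. The per-event cost of evaluating filters there is the overhead the comment says still needs to be evaluated.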
> Moreover, I would avoid filtering `clone`/`execve`/`proc_exit` events, we have already seen these don't cause perf overhead and we need them to keep a reliable process tree inside sinsp.
Big +1 those aren't an issue.
I'm in support of this.
I'm in favor of investigating this front :+1:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
Motivation
The hardware landscape is evolving towards machines with 96, 128, or more CPUs. However, Falco currently faces usability challenges on such machines, particularly those handling heavy network and file-related activity.
One potential solution could involve allowing end users to specify a subset of enter or exit syscall events they want to drop on the kernel side. This feature would be flagged as very risky to use, similar to the existing `base_syscalls` feature.
For instance, users might opt to drop enter syscall events for `open*` and `connect` syscalls, even though they are aware that doing so could expose them to TOCTOU attacks (mitigated by default via this PR). Nevertheless, this trade-off might be preferable to completely disabling Falco.
Feature
Introduce a new config, `base_syscalls.exclude_enter_exit_set`, allowing exclusion of specific enter or exit events that are part of the `custom_set` syscalls. This exclusion is limited to scenarios where it makes sense for enter or exit events. Ensure good documentation.
Additional context
https://github.com/falcosecurity/libs/issues/1557
CC @falcosecurity/libs-maintainers
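As a rough illustration of how the proposed `base_syscalls.exclude_enter_exit_set` could be resolved against `custom_set`, here is a small Python sketch. The entry format (`"<syscall pattern>:<enter|exit>"`), the glob matching for names such as `open*`, and the function name `resolve_dropped_events` are all assumptions made for this example. The issue does not define a schema, and none of this is an existing Falco API.

```python
from fnmatch import fnmatchcase

VALID_DIRECTIONS = ("enter", "exit")


def resolve_dropped_events(custom_set, exclude_enter_exit_set):
    """Expand hypothetical 'pattern:direction' entries into (syscall, direction) pairs.

    Only syscalls already present in custom_set can be excluded, mirroring the
    requirement that excluded events be part of the custom_set syscalls.
    """
    dropped = set()
    for entry in exclude_enter_exit_set:
        pattern, _, direction = entry.partition(":")
        if direction not in VALID_DIRECTIONS:
            raise ValueError(f"{entry!r}: direction must be 'enter' or 'exit'")
        matches = [name for name in custom_set if fnmatchcase(name, pattern)]
        if not matches:
            # Fail loudly: a silent no-op would hide a risky misconfiguration.
            raise ValueError(f"{entry!r}: no syscall in custom_set matches")
        dropped.update((name, direction) for name in matches)
    return dropped


# Example from the motivation: drop enter events for open* and connect.
custom_set = ["open", "openat", "connect", "execve"]
dropped = resolve_dropped_events(custom_set, ["open*:enter", "connect:enter"])
```

Rejecting unknown entries, rather than ignoring them, is a design choice for this sketch. It fits a feature that is meant to be flagged as very risky to use.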