Open incertum opened 2 years ago
Related to thinking in https://github.com/falcosecurity/libs/issues/252 @LucaGuerra @loresuso.
Hi @incertum, I think you raised a significant point here. I do not see any obvious way to retrieve struct linux_binprm from the sys_exit tracepoint, but I agree that the information contained in that struct could be relevant for security monitoring.
In the end, that struct is passed to all the LSMs so they can perform their checks upon execution, so maybe this deserves further investigation. We could also easily retrieve the full path of the executable without performing any path resolution (the kernel has already done it for us). That point alone could be really valuable, on top of identifying the interpreter when we are executing a script. Let's see the opinion of the other folks too 🙂
Thank you for noticing this!
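To make the script vs. interpreter distinction above concrete, here is a minimal user-space sketch. It does not read struct linux_binprm at all; it only shows that, for a shebang script, the path handed to exec and the binary that ends up running are different, and that the kernel has already resolved the latter. The helper name exec_paths_for_script and the demo.sh file name are made up for illustration, and the sketch assumes a Linux host with /proc, /bin/sh and readlink available.

```python
import os
import stat
import subprocess
import tempfile


def exec_paths_for_script(interpreter="/bin/sh"):
    """Run a tiny shebang script and return
    (path passed to exec, resolved path of the binary that actually ran).

    Hypothetical helper for illustration only; it is not part of Falco or libs.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "demo.sh")
        with open(script, "w") as f:
            # The script prints the kernel-resolved executable of its own process.
            f.write(f"#!{interpreter}\nreadlink /proc/$$/exe\n")
        os.chmod(script, stat.S_IRWXU)
        # The path we exec is the script itself, not the interpreter.
        result = subprocess.run([script], capture_output=True, text=True, check=True)
        return script, result.stdout.strip()


if __name__ == "__main__":
    requested, running = exec_paths_for_script()
    print("path passed to exec:", requested)
    print("binary actually running:", running)
```

The second value is what /proc reports after the kernel has followed the shebang and any symlinks, which is the kind of already-resolved information the comment above hopes to get directly from the kernel at exec time.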
My 2 cents on this. Even if we were able to recover struct linux_binprm from sys_exit, it would be really messy and really expensive, while with sched/sched_process_exec we can obtain it from the registers, so yes, I would use sched/sched_process_exec here...
I think that we have 2 possible directions to follow:
1. execve/execveat syscall flow and use sched/sched_process_exec to send execve/execveat exit events on all architectures (quick and dirty)
2. sched/sched_process_exec or some Kprobes to security hooks for example (clearer but more complex)
To be honest, here I would vote for the second choice because this would open a new world for Falco! We could trace almost whatever we want, not only syscalls. The pain point is the design phase, as always, but I think that we can do that in some ways.
WDYT about that @FedeDP @gnosek @leogr?
Just thinking again about it... since we have the collision with this tracepoint already used in ARM, what about using a kprobe? Ok, kernel functions could change over time, but we already have all the history from 4.14 to 6.0, so why not :thinking:?
Or maybe, since in this case we have a simple tracepoint to do that, why don't we use a second BPF program attached to the same tracepoint :thinking:
Just putting some ideas on the table here :)
> To be honest, here I would vote for the second choice because this would open a new world for Falco! We could trace almost whatever we want not only syscalls, the pain point is the design phase as always but I think that we can do that in some ways.
This issue is also in some way related to https://github.com/falcosecurity/libs/issues/252; the second approach could also allow us to support kprobes in some security hooks.
I can't decide easily. :thinking: I really believe we have to experiment a bit
Would favor staying open minded and exploring all options. Furthermore, shall we follow a data-driven approach? Meaning we measure perf overhead on actual production servers instead of making decisions based on reputation?
Furthermore, it seems like kprobes are needed to bridge various security monitoring gaps. On the other hand, for the particular data field discussed here (the full path of the interpreter) we have that shortcut available, as you confirmed @Andreagit97, and @loresuso also pointed out that we can fetch the executable filename right there and save a few lookup cycles. Would be curious if there is an actual noticeable CPU hit, given that execve* really doesn't happen that often compared to what happens while a process is running ...
How could we best start experimenting?
@leogr in general it seems that now that we have done this great refactor of syscalls of interest and tracepoints of interest, we could more easily expand on this configurability to basically support all options, but also give the option to tailor the cost of running the tool to the budget available.
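On the "how could we best start experimenting" question, one low-tech starting point could be an exec-heavy micro-benchmark that is run once with the candidate probe attached and once without, on the same host. The sketch below is only that: a user-space loop that spawns a trivial binary and reports execs per second. It does not attach any probe itself, the function name execs_per_second is made up for illustration, and a synthetic loop like this is of course no substitute for the production measurements suggested above.

```python
import shutil
import subprocess
import time


def execs_per_second(n=200, binary=None):
    """Spawn `binary` n times and return the observed spawn rate.

    Each iteration triggers one exec of the target binary, so comparing the
    rate with and without a probe attached gives a rough upper bound on the
    per-exec overhead. Hypothetical helper, not part of Falco or libs.
    """
    binary = binary or shutil.which("true")
    if binary is None:
        raise RuntimeError("no 'true' binary found on PATH")
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([binary], check=True)
    elapsed = time.perf_counter() - start
    return n / elapsed


if __name__ == "__main__":
    print(f"{execs_per_second():.0f} execs/s")
```

Running it several times per configuration and comparing medians should help separate probe cost from ordinary scheduling noise.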
> Would favor staying open minded and exploring all options. Furthermore, shall we follow a data-driven approach? Meaning we measure perf overhead on actual production servers instead of making decisions based on reputation?
Super +1 on my side, testing it directly in real scenarios would be amazing!
> How could we best start experimenting?
What about a kprobe here https://github.com/torvalds/linux/blob/a63f2e7cb1107ab124f80407e5eb8579c04eb7a9/fs/exec.c#L1715?
Here you can find more info about this hook point https://github.com/torvalds/linux/blob/a63f2e7cb1107ab124f80407e5eb8579c04eb7a9/include/linux/lsm_hooks.h#L62. This should allow us to take all the information we want and could easily become a new security event generated by a kprobe :thinking:
The only thing that worries me is this statement from the hook documentation. What about perf?
> This hook may be called multiple times during a single execve.
This is gonna be next early next year (LSM hooks experiments in modern_bpf) ...
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle stale
/remove-lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/falcosecurity/community.
/lifecycle rotten
/remove-lifecycle stale /remove-lifecycle rotten
Motivation
Quote from https://github.com/falcosecurity/libs/pull/595
struct linux_binprm is readily available in the sched/sched_process_exec tracepoint, see https://github.com/falcosecurity/libs/blob/master/driver/bpf/types.h#L142 that got introduced by @Andreagit97 for ARM64 https://github.com/falcosecurity/libs/pull/416.
struct linux_binprm holds args used when loading binaries https://github.com/torvalds/linux/blob/master/include/linux/binfmts.h#L49-L60.
Would it even be possible to access struct linux_binprm through the raw tracepoint? If so how? I see that mm_struct has struct linux_binfmt, but that's it. Hopefully I am just missing something and there is an easy solution.
If it is not possible to access it over the sys_exit raw tracepoint, could we have an open discussion around unifying PPME_SYSCALL_EXECVE_19_X and PPME_SYSCALL_EXECVEAT_X to using the sched/sched_process_exec tracepoint instead? Rating this in terms of security monitoring enhancement I would give it a 10 out of 10. While it would be a slight perf hit, there are noisier system calls comparatively and we kind of already have to do it that way for ARM64 anyways.
What other options would be available? Are there more alternatives?