falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

[FEATURE] Options available for tapping into "linux_binprm" that holds args used when loading binaries #621

Open incertum opened 2 years ago

incertum commented 2 years ago

Motivation

Quote from https://github.com/falcosecurity/libs/pull/595

Another kernel-side signal that I would like to look into and possibly add to this PR would be:

"Interpreter scripts" aka text files with execute permissions (see https://man7.org/linux/man-pages/man2/execve.2.html) For example chmod +x a.sh && ./a.sh or chmod +x a.sh && exec ./a.sh is currently logged as "proc.exepath":"/tmp/a.sh","proc.name":"a.sh","proc.cmdline":"a.sh ./a.sh", but the interpreter was configured as #! /bin/sh and we wouldn't know what interpreter binary ran the script directly or that it was not a binary without inferring from extension if even available and we know how fragile that is.

Please note, this is not about the use case where you run the interpreter and pass it the script: for example, /bin/sh a.sh would give "proc.exepath":"/bin/sh","proc.name":"sh","proc.cmdline":"sh a.sh".

Any thoughts on above? @LucaGuerra @loresuso @FedeDP @Andreagit97

struct linux_binprm is readily available in the sched/sched_process_exec tracepoint, see https://github.com/falcosecurity/libs/blob/master/driver/bpf/types.h#L142, which was introduced by @Andreagit97 for ARM64 in https://github.com/falcosecurity/libs/pull/416. struct linux_binprm holds the args used when loading binaries: https://github.com/torvalds/linux/blob/master/include/linux/binfmts.h#L49-L60.
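For illustration only, a minimal libbpf CO-RE sketch (program and buffer names are made up, this is not the actual Falco driver code) of how those fields could be read from the BTF-enabled sched/sched_process_exec tracepoint:

```c
// Illustrative sketch: read linux_binprm fields from sched_process_exec.
// Assumes a vmlinux.h generated with bpftool and a CO-RE capable toolchain.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tp_btf/sched_process_exec")
int BPF_PROG(handle_exec, struct task_struct *p, pid_t old_pid,
             struct linux_binprm *bprm)
{
    char file[128];
    char interp[128];

    /* bprm->filename: name of the binary as seen by procps.
     * bprm->interp: name of the binary really executed; for #! scripts
     * this is the interpreter path, which is exactly the missing signal. */
    bpf_probe_read_kernel_str(file, sizeof(file),
                              BPF_CORE_READ(bprm, filename));
    bpf_probe_read_kernel_str(interp, sizeof(interp),
                              BPF_CORE_READ(bprm, interp));

    bpf_printk("exec file=%s interp=%s", file, interp);
    return 0;
}
```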

Would it even be possible to access struct linux_binprm through the raw tracepoint? If so, how? I see that mm_struct has a struct linux_binfmt, but that's it. Hopefully I am just missing something and there is an easy solution.

If it is not possible to access it over the sys_exit raw tracepoint, could we have an open discussion around unifying PPME_SYSCALL_EXECVE_19_X and PPME_SYSCALL_EXECVEAT_X to use the sched/sched_process_exec tracepoint instead? Rating this in terms of security monitoring enhancement, I would give it a 10 out of 10. While it would be a slight perf hit, there are comparatively noisier system calls, and we kind of already have to do it that way for ARM64 anyway.

What other options would be available? Are there more alternatives?

incertum commented 2 years ago

Related to thinking in https://github.com/falcosecurity/libs/issues/252 @LucaGuerra @loresuso.

loresuso commented 2 years ago

Hi @incertum, I think you raised a significant point here. I do not see any obvious way to retrieve struct linux_binprm from the sys_exit tracepoint, but I agree that the information contained in that struct could be relevant for security monitoring.

In the end, that struct is basically passed to all the LSMs to perform their checks upon execution, so maybe this deserves further investigation. We would also be able to easily retrieve the full path of the executable without performing any path resolution (the kernel has already done it for us), and for that alone it could be really valuable, in addition to pointing out the interpreter when we are executing a script. Let's see what the other folks think too 🙂

Thank you for noticing this!

Andreagit97 commented 2 years ago

My 2 cents on this. Even if we were able to recover the struct linux_binprm from sys_exit, it would be really messy and really expensive, while with sched/sched_process_exec we can obtain it directly from the tracepoint arguments, so yes, I would use sched/sched_process_exec here...

I think that we have 2 possible directions to follow:

To be honest, here I would vote for the second choice because it would open a new world for Falco! We could trace almost whatever we want, not only syscalls; the pain point is the design phase, as always, but I think we can manage it somehow.

WDYT about that @FedeDP @gnosek @leogr?

Andreagit97 commented 2 years ago

Just thinking about it again... since we have a collision with this tracepoint, which is already used on ARM64, what about using a kprobe? OK, kernel functions could change over time, but we already track all the history from 4.14 to 6.0, so why not :thinking:?

Or maybe, since in this case we have a simple tracepoint that gives us what we need, why don't we use a second BPF program attached to the same tracepoint :thinking:

Just putting some ideas on the table here :)
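To illustrate the second idea: nothing prevents attaching more than one BPF program to the same raw tracepoint, so the existing ARM64 exec filler could stay untouched while a second program only collects the linux_binprm data. A purely illustrative sketch (hypothetical program names, not the actual driver layout):

```c
// Two independent BPF programs on sched/sched_process_exec; the kernel
// runs both on every exec. Names and logic here are placeholders.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

/* Placeholder for the program that already fills the execve exit event. */
SEC("raw_tp/sched_process_exec")
int existing_exec_filler(struct bpf_raw_tracepoint_args *ctx)
{
    /* ... existing event-filling logic would live here ... */
    return 0;
}

/* Second program on the same hook: only extracts linux_binprm data. */
SEC("raw_tp/sched_process_exec")
int binprm_collector(struct bpf_raw_tracepoint_args *ctx)
{
    /* sched_process_exec args: (task_struct *p, pid_t old_pid,
     * linux_binprm *bprm), so bprm is args[2]. */
    struct linux_binprm *bprm = (struct linux_binprm *)ctx->args[2];
    char interp[128];

    bpf_probe_read_kernel_str(interp, sizeof(interp),
                              BPF_CORE_READ(bprm, interp));
    bpf_printk("interpreter: %s", interp);
    return 0;
}
```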

Andreagit97 commented 2 years ago

To be honest, here I would vote for the second choice because it would open a new world for Falco! We could trace almost whatever we want, not only syscalls; the pain point is the design phase, as always, but I think we can manage it somehow.

This issue is also somewhat related to https://github.com/falcosecurity/libs/issues/252; the second approach could also allow us to support kprobes on some security hooks.

leogr commented 2 years ago

I can't decide easily. :thinking: I really believe we have to experiment a bit

incertum commented 2 years ago

I would favor staying open-minded and exploring all options. Furthermore, shall we follow a data-driven approach, meaning we measure perf overhead on actual production servers instead of making decisions based on reputation?

Furthermore, it seems like kprobes are needed to bridge various security monitoring gaps. On the other hand, for the particular data field discussed here (the full path of the interpreter) we have that shortcut available, as you confirmed @Andreagit97, and @loresuso also pointed out that we can fetch the executable filename right there and save a few lookup cycles. I would be curious whether there is an actual noticeable CPU hit, given that execve* really doesn't happen that often compared to what happens while a process is running ...

How could we best start experimenting?

@leogr in general it seems that, now that we have done this great refactor of syscalls of interest and tracepoints of interest, we could more easily expand on this configurability to basically support all options, but also give users the option to tailor the cost of running the tool to the budget available.

Andreagit97 commented 1 year ago

I would favor staying open-minded and exploring all options. Furthermore, shall we follow a data-driven approach, meaning we measure perf overhead on actual production servers instead of making decisions based on reputation?

Super +1 on my side, testing it directly in real scenarios would be amazing!

How could we best start experimenting?

What about a kprobe here: https://github.com/torvalds/linux/blob/a63f2e7cb1107ab124f80407e5eb8579c04eb7a9/fs/exec.c#L1715? You can find more info about this hook point here: https://github.com/torvalds/linux/blob/a63f2e7cb1107ab124f80407e5eb8579c04eb7a9/include/linux/lsm_hooks.h#L62. This should allow us to grab all the information we want, and it could easily become a new security event generated by a kprobe :thinking:

The only thing that worries me is this statement; what about perf?

This hook may be called multiple times during a single execve.
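For reference, the kernel function that dispatches this LSM hook, security_bprm_check(), receives the struct linux_binprm pointer as its only argument, so a kprobe there sees the same data as the hook itself. A minimal illustrative sketch (not the actual probe code):

```c
// Illustrative kprobe on security_bprm_check(), the in-kernel call site
// of the bprm_check_security LSM hook.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/security_bprm_check")
int BPF_KPROBE(trace_bprm_check, struct linux_binprm *bprm)
{
    char file[128];
    char interp[128];

    bpf_probe_read_kernel_str(file, sizeof(file),
                              BPF_CORE_READ(bprm, filename));
    bpf_probe_read_kernel_str(interp, sizeof(interp),
                              BPF_CORE_READ(bprm, interp));

    /* As the LSM docs warn, this can fire more than once per execve
     * (e.g. once for the script and once for its interpreter), so any
     * consumer has to expect and handle multiple events. */
    bpf_printk("bprm_check file=%s interp=%s", file, interp);
    return 0;
}
```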

incertum commented 1 year ago

This is gonna be next, early next year (LSM hook experiments in modern_bpf) ...

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

incertum commented 1 year ago

/remove-lifecycle stale

poiana commented 9 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 9 months ago

/remove-lifecycle stale

poiana commented 6 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

incertum commented 6 months ago

/remove-lifecycle stale

poiana commented 3 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 2 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

leogr commented 2 months ago

/remove-lifecycle stale
/remove-lifecycle rotten