[FEATURE] Resolving executable path symlink on execve

loresuso commented 1 year ago

By using symlinks attackers can potentially bypass Falco rules. This is because in our drivers, we take data from syscall arguments and by doing so, we implicitly trust something that is coming from userspace, thus controllable by a potential attacker. A concrete and remarkable example of this is the way attackers can set an arbitrary proc.name just by issuing an execve syscall with a symlink parameter named as they will. Of course, some of the rules rely on this field and I believe it should be made as more robust as possible.

To address this problem, we talked about using LSM Hooks multiple times. I decided to explore this topic more deeply, considering the execve scenario in the first place, since it has a strong security impact. It turned out that, for this particular case, LSM hooks alone are not solving the problem. This is because the kernel itself, does not need to store the string of the resolved filepath at all, and this is why structures like linux_binprm and its filename field still contains something that is coming from userspace, as the syscall arguments. But still, we have a chance to solve this problem!

The Linux kernel offers an interesting function called d_path, that can perform the symlink resolution for us. We can try to use it right away in our kernel module. Of course, this comes with a performance cost that we need to measure, but given the importance of this problem, I would be eager to pay it and make Falco more robust from a security standpoint. Then, I decided to explore the same problem in eBPF. It turned out that the same function bpf_d_path was recently introduced and can be used in eBPF programs. However, it cannot be used in every possible hook point: there is a very strict allowlist of hook points where it can be used, and more specifically it can't be used in the LSM hooks called upon execve (security_bprm_committing_creds). This leaves us with two options:

implement a custom function that performs the symlink resolution, cycling over the relevant kernel objects (dentrys, most likely). This is a very error-prone task, since we should take into account also mount namespaces, and it would also be very important to make it as fast as possible.
find another way to use bpf_d_path, that is the stable and reliable way to do it

By delving more into kernel code, I found a possible solution that I would like to discuss here. Basically, there are two observations to be done:

the security_file_open hook can call the bpf_d_path function.
there is a special field in task_struct called in_execve that was introduced specifically for LSMs to tell them whether the process is executing a new binary or not.

So the idea would be to:

hook in security_file_open and perform the symlink resolution if we are in execve
push the resolved path in a map for later retrieval in execve exit syscall, or other execve hook points that we decide
push to userspace the correct and resolved path

Of course, the map would represent the first example of "keeping state" in our probe, so it has to be discussed deeply.

By using both these observations, I was able to build a very rough PoC that seems to be working. I would like to use this issue to discuss about it and prepare action items with people interested into this topic. If we are going this way (that is hooking at security_file_open), we can think about using it also for solving symlinks when opening file, using this hooks for both purposes.

This is not all: solving symlinks like this deals also with another problem, which is detection for fileless execution, a notable attack technique very used in recent days. This is because when solving the executable path of memfd file descriptor, the kernel put the memfd: prefix to its name. To distinguish between real fileless execution and execution of files simply called starting with memfd strings, we can implement a new flag that performs the same exact checks like in this PR.

Hope we will find a way to work on this!

Action items:

[ ] identify minimum kernel version for the proposed solution
[ ] identify eBPF program to use: optimized kprobes like fentrys or normal eBPF LSM programs
[ ] TBD

loresuso commented 1 year ago

cc @falcosecurity/libs-maintainers

incertum commented 1 year ago

@loresuso fully support this proposal and here's why I love it.

Agree that having the option to pay the price or choose the features is important. Making it optional, at least initially, would be good.
You made a valid point about the need to address this issue for file opens. Using security_file_open as a starting point seems like the best approach overall, and it will push us to finally work on LSM hooks after contemplating it for a long time (❗😅 ).
It's inevitable that we will need to start keeping state in the driver in the long run. Let's begin finding an optimal and innovative approach, and this use case is a perfect opportunity to start.
I do have one concern: We rely on correlating two events. As a result, it may not be completely resilient for the process execution cases ... but perfect doesn't exist anyways ...

We have the first roadmap planning meeting next week, will propose to craft a 2 month timeline to get this end-to-end implemented including QA -> ETA end of July 2023 would be perfect if this works for everyone.

Andreagit97 commented 1 year ago

hi all! Finally, I should have some spare time to work on that, my idea is to proceed in the following way:

support symlink resolution in kernel module (should be the easiest one in this case) adding a new field in execve/at events
implement the logic in userspace facing ASAP possible blockers
implement it in the modern bpf (still to understand what is the best way fentry/LSM is cool but not AFAIK they are not supported on arm64/s390x (on arm64 yes but only in cutting edge kernel versions like 6.4). Technically fentry/fexit programs are the best choice in terms of perf and usability but we need to face this compatibility issue.
old probe is still a huge question mark, at some point in time we should start to discuss where we want to go with the old probe... We all know that introducing new features is every time a pain, in this case, would mean computing by the end the full path and I'm not completely sure we can address the paths involving mount points

incertum commented 1 year ago

Thank you @Andreagit97 !

Sounds good re kernel module. Re old probe, unfortunately it will still be needed for a while for many adopters ...

Re compatibility in general: We discussed that priority 1 is x86_64, then arm64, and other architectures are best effort support. Therefore, if something works on x86_64 and arm64 newer kernels, it could be an acceptable compromise?

Andreagit97 commented 1 year ago

Re compatibility in general: We discussed that priority 1 is x86_64, then arm64, and other architectures are best effort support. Therefore, if something works on x86_64 and arm64 newer kernels, it could be an acceptable compromise?

I would say yes, at least for the moment

Andreagit97 commented 1 year ago

Looking in sinsp we have these fields:

std::string m_comm; ///< Command name (e.g. "top")
std::string m_exe; ///< argv[0] (e.g. "sshd: user@pts/4")
std::string m_exepath; ///< full executable path

Where m_exepath is obtained using syscall parameters dirfd + pathname. Now the idea is to add a new field with the path resolved by the kernel:

std::string m_kernel_resolved_exepath ///< full resolved executable path derived directly from the drivers

We could also replace the actual m_exepath saving some time during the parsing phase since we don't have any more to compute it from syscall arguments, but it seems more valuable to also keep it to check if we are executing a symlink or not, WDYT? @incertum @loresuso @darryk10 @lrishi

incertum commented 1 year ago

@Andreagit97 on board. This would also enable us to not needing to rely on cwd to derive the absolute path of the executable, a win allover. Hence we could actually drop cwd from the execve* schema.

re keeping both. Maybe instead of keeping both we can add another exe flag in userspace (IS_SYMLINK)?

loresuso commented 1 year ago

Now the idea is to add a new field with the path resolved by the kernel: std::string m_kernel_resolved_exepath ///< full resolved executable path derived directly from the drivers

I like it! Thanks Andre. Agreed also that cwd loses a lot of its importance, since it's only used to compute the absolute path of the executable, and a simple flag IS_SYMLINK (set if the two member fields differ) would be nice to have. However, cwd is used in rules output almost everywhere, we have to keep it properly updated (so for instance, if we drop a chdir, the execve containing that info could still be valuable) in my opinion

Andreagit97 commented 1 year ago

re keeping both. Maybe instead of keeping both, we can add another exe flag in userspace (IS_SYMLINK)?

Uhm to know if it is a symlink or not we should compare the resolved path with the one computed from syscalls params, so probably we need to keep both m_exepath and m_kernel_resolved_exepath :thinking: Or do you have something else in mind? The only concern that I have here is that maybe our userspace reconstruction is not so reliable and could differ from our m_kernel_resolved_exepath even if there is no symlink, maybe we have just an error in the userspace reconstruction, so yes having a filter check is_symlink is supercool but we need to pay attention, probably we could match the last part of the resolved path with comm to understand if we have a symlink or not :thinking:

falcosecurity / libs

[FEATURE] Resolving executable path symlink on execve #1111