aquasecurity / tracee

Linux Runtime Security and Forensics using eBPF
https://aquasecurity.github.io/tracee/latest
Apache License 2.0
3.62k stars 418 forks source link

Kernel symbols are updated without required capabilities #4325

Open oshaked1 opened 1 month ago

oshaked1 commented 1 month ago

Description

The kernel symbol table utility relies on CAP_SYSLOG in order to be able to read from /proc/kallsyms and get actual addresses (otherwise addresses are zeroed-out).

When NewKernelSymbolTable is called when initializing tracee, the correct capability is held. But later calls to helpers like GetSymbolByName do not acquire CAP_SYSLOG, even though the helper may end up reading /proc/kallsyms again.

This causes lookups for symbols that were not initially added to t.requiredKsyms to produce unusable results (and no error is generated). An example for such a lookup occurs when fetching the address for attaching a kprobe belonging to an event that is needed as a dependency (and not selected directly).

The simplest way to reproduce this error is by deleting dist/signatures/builtin.so (because some signatures select events that list CAP_SYSLOG as a dependency), and running tracee without any arguments:

$ sudo dist/tracee > /dev/null
{"level":"warn","ts":1727174815.8476572,"msg":"libbpf: prog 'trace_load_elf_phdrs': failed to create kprobe '+0x0' perf event: Invalid argument"}

As can be seen, this program which is an (unrequired) dependency of sched_process_exec, is being loaded to address 0 and fails.

geyslan commented 1 month ago

I was able to reproduce it:

  1. rm dist/signatures/builtin.so
  2. sudo dist/tracee -o none
{"level":"warn","ts":1727195441.2103937,"msg":"libbpf: prog 'trace_load_elf_phdrs': failed to create kprobe '+0x0' perf event: Invalid argument"}

@oshaked1 are you working on solving this? LMK.

oshaked1 commented 1 month ago

I could solve it by acquiring the capability in the refresh method instead of relying on the callers to acquire it. Do you think that would be a good solution?

geyslan commented 1 month ago

I could solve it by acquiring the capability in the refresh method instead of relying on the callers to acquire it. Do you think that would be a good solution?

I think we opted to not acquire caps in inner functions since they can be called from other outers acquires, is that correct @NDStrahilevitz?

NDStrahilevitz commented 1 month ago

I could solve it by acquiring the capability in the refresh method instead of relying on the callers to acquire it. Do you think that would be a good solution?

I think we opted to not acquire caps in inner functions since they can be called for other outers acquires, is that correct @NDStrahilevitz?

I'm sorry @geyslan, I'm not sure I got what you mean. Could you rephrase?

Anyway, AFAIK events that require continuous use of ksyms declare the CAP_SYSLOG in their dependency such that it is included in the base ring. So before adding the capability inside the struct (which I would rather not do so that it is not coupled to the capabilities mechanism) we should:

  1. Confirm that all events which require SYSLOG declare that capability
  2. The dependency declaration does indeed add the capability to the base ring
  3. We have no inbetween calls (before adding the cap to the base ring and after the initial capability drop)
NDStrahilevitz commented 1 month ago

@oshaked1 as for the usecase you've described, searching the symbol of a kprobe attachment), I find it likely that we can add a capability ring increase around that code as it has a clearly defined scope.

geyslan commented 1 month ago

I'm sorry @geyslan, I'm not sure I got what you mean. Could you rephrase?

Acquiring caps in between caps calls.

But I believe you already answered with:

which I would rather not do so that it is not coupled to the capabilities mechanism

We have no inbetween calls (before adding the cap to the base ring and after the initial capability drop)

oshaked1 commented 1 month ago

@oshaked1 as for the usecase you've described, searching the symbol of a kprobe attachment), I find it likely that we can add a capability ring increase around that code as it has a clearly defined scope.

Yes it should be straightforward to do so. We should also add a disclaimer in the docstrings of the lookup functions because the capability requirement is unintuitive considering the symbol table was created prior to performing lookups.