aquasecurity / tracee

Linux Runtime Security and Forensics using eBPF
https://aquasecurity.github.io/tracee/latest
Apache License 2.0

Unbounded Memory Consumption #3698

Closed agadient closed 6 months ago

agadient commented 10 months ago

Description

A program that makes a significant number of successful open system calls causes tracee's memory usage to increase significantly to the point that it may be killed by the OOM killer. An example program that triggers this issue is provided here: https://github.com/Vali-Cyber/ebpf-attacks/tree/main/exhaust
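
For a rough idea of the workload shape, the behavior can be approximated from the shell by creating and deleting randomly named files in /tmp in a tight loop. This is only a sketch, not the linked exhaust program itself:

# Rough shell approximation of the workload (assumption: not the linked exhaust program).
# Creates and deletes randomly named files under /tmp in a tight loop, generating a
# stream of successful open/unlink syscalls. Stop with Ctrl-C.
while true; do
    f="/tmp/$(tr -dc 'a-z0-9' < /dev/urandom | head -c 16)"
    : > "$f"      # open(O_CREAT) succeeds
    rm -f "$f"    # unlink the file again
done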

Output of tracee version:

Tracee version: "v0.19.0"

Output of uname -a:

Linux hamden 5.15.0-87-generic #97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Additional details

Contents of /etc/os-release

PRETTY_NAME="Ubuntu 22.04.1 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.1 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" UBUNTU_CODENAME=jammy

rafaeldtinoco commented 10 months ago

@agadient could you share how you were running Tracee? Which events were being filtered, and what was the size of the machine you tested this on? If you could provide the cmdline you ran it with, that would be good (to know whether it was caching events or not, for example).

I'll get into this, but do you have a meminfo, for example, from when this is happening (a meminfo and slabinfo would be useful)? If you can't get them, that's ok, I'll try to get them myself soon. I want to differentiate whether the memory consumption comes from kmalloc (slub) or whether it's caused by the runtime itself.
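
For whoever reproduces this first, one simple way to capture those snapshots while the memory grows (the file names and the 5-second interval here are just an example):

# Periodically snapshot meminfo and slabinfo while the issue reproduces
# (slabinfo needs root, hence the sudo).
while true; do
    { date; cat /proc/meminfo; } >> /tmp/meminfo.log
    { date; sudo cat /proc/slabinfo; } >> /tmp/slabinfo.log
    sleep 5
done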

rafaeldtinoco commented 10 months ago

I see from the code:

// The function run by worker processes. It sets its CPU affinity, increments
// a counter, and opens a file that doesn't exist in an infinite loop.
void exhaust(uint64_t *counters, uint64_t counter_index) {
    // Set the CPU affinity. We launch one worker per CPU
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(counter_index, &set);
    sched_setaffinity(getpid(), sizeof(cpu_set_t), &set);
    // Create a random filename in the tmp directory
    uint64_t filename_size = 16;
    if (TARGET_IS_TRACEE) {
        filename_size = NAME_MAX;
    }

That, because of arg filtering, one could (as we already knew) flood the pipeline with events and exhaust internal state (or the maps keeping state between the eBPF and Go logic).

@AlonZivony I haven't read the code yet, but at first glance it is related to the fact that we're doing arg filtering too late (so if the currently running policy's filtering is too broad, one could do exactly that).

The same actually applies to the pipeline concept as a whole. If someone stresses the volume of events going through the perf buffer, we could lose a detection. How much the pipeline can be stressed depends on which events are enabled by default (and whether they are filtered in kernel, like scopes, or in userland, like the current arg filtering).

The fix for this type of thing is to have in-kernel filtering for arguments (as we've discussed in the recent past). It only needs prioritization.

NDStrahilevitz commented 10 months ago

@rafaeldtinoco One counterpoint to filtering being the solution is that any agent aware of tracee could easily bypass the filters; in fact, this program, which randomizes filenames, does exactly that, unless we ignore /tmp entirely. Kernel filtering is important for cases where the admin can control what runs on the cluster and tune tracee accordingly, which is of course very important, but it's not the whole story IMO.

rafaeldtinoco commented 10 months ago

The fix for this type of thing is to have in-kernel filtering for arguments (as we've discussed in the recent past). It only needs prioritization.

I wrote this too fast =) and did not mention that it's not a full fix, nor would it get rid of the problem; it would maybe only be a 'helper'.

Kernel filtering is important for cases where the admin can control what runs on the cluster and tune tracee accordingly, which is of course very important, but it's not the whole story IMO.

I did not think otherwise, and you are correct. By having in-kernel filtering we would at least guarantee we are not spammed in userland with things we don't want, but it wouldn't be "the answer".

Event type quotas, de-prioritization of events, etc., could all be answers, but there is always room for mixing real events with fake ones in an attack. I think the answer will be having signatures to detect attacks against tracee =). This way, we could miss the real attack, but the attempt to taint tracee would be picked up (and that could be 'good enough' for the end user).

agadient commented 10 months ago

@rafaeldtinoco this is the command I used to run Tracee: docker run -it --pid=host --cgroupns=host --privileged -v /etc/os-release:/etc/os-release-host:ro -v /boot:/boot:ro aquasec/tracee:0.19.0

trvll commented 6 months ago

@agadient we have tested this issue with different versions of the exhaust program you've shared, changing the counter_value, and so far we didn't get any OOM kill. Tracee's memory consumption never goes above 6% of the total available. Do you have any other clarifications/insights to help us reproduce it?

agadient commented 6 months ago

Hi @trvll! Did you try running it with ./exhaust -tracee? The flag is important because I only saw this behavior with tracee when the files being created and deleted existed. Also, try removing all files from the /tmp directory before you run the program.

trvll commented 6 months ago

Hi @trvll! Did you try running it with ./exhaust -tracee?

yes, sure... also have tried with empty /tmp and with existing files as well... anything else we should try?

agadient commented 6 months ago

I just reproduced the issue following these steps (the same commands are collected into a shell sketch after the list). I attached a screenshot of top:

  1. Download Ubuntu 22.04.4 LTS ISO from here: https://ubuntu.com/download/server.
  2. Set up a VM with 2 CPUs, 2 GB of RAM, and 20 GB of disk.
  3. When installing Ubuntu, make sure to install SSH.
  4. SSH into the VM after the install is complete.
  5. Install git and g++. sudo apt install git g++.
  6. Install the latest docker following these instructions: https://docs.docker.com/engine/install/ubuntu/
  7. Clone the repo: git clone https://github.com/Vali-Cyber/ebpf-attacks.git
  8. Compile the exhaust.cpp program: cd ebpf-attacks/exhaust && ./build.sh
  9. Run tracee: docker run --name tracee --rm -it --pid=host --cgroupns=host --privileged -v /etc/os-release:/etc/os-release-host:ro -v /boot:/boot:ro aquasec/tracee:latest. Give it some time to initialize.
  10. Run exhaust: ./exhaust -tracee
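
The same steps collected as shell commands, purely for convenience; the Docker install step is left as a pointer to the linked instructions:

# Same steps as above on a fresh Ubuntu 22.04 VM.
sudo apt install git g++
# Install Docker Engine following https://docs.docker.com/engine/install/ubuntu/
git clone https://github.com/Vali-Cyber/ebpf-attacks.git
cd ebpf-attacks/exhaust && ./build.sh
# In one terminal (give Tracee some time to initialize):
docker run --name tracee --rm -it --pid=host --cgroupns=host --privileged \
    -v /etc/os-release:/etc/os-release-host:ro -v /boot:/boot:ro aquasec/tracee:latest
# In another terminal, from ebpf-attacks/exhaust:
./exhaust -tracee
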
[Screenshot of top output, 2024-03-14]

trvll commented 6 months ago

It appears that the behavior you're experiencing is due to a mismatch between Tracee's default settings, which are applied when no specific arguments are provided, and the amount of memory available on your VM.

Understanding Default Behavior

By design, when Tracee is invoked without any arguments, it initializes with a predefined set of default arguments to ensure a base level of functionality. This is intended to make the tool immediately useful for typical use cases without requiring initial configuration by the user.

From the Docker image's entrypoint.sh:

run_tracee() {
    mkdir -p $TRACEE_OUT

    if [ $# -ne 0 ]; then
        # no default arguments, just given ones
        $TRACEE_EXE "$@"
    else
        # default arguments
        $TRACEE_EXE \
        --metrics \
        --cache cache-type=mem \
        --cache mem-cache-size=512 \
        --capabilities bypass=$CAPABILITIES_BYPASS \
        --capabilities add=$CAPABILITIES_ADD \
        --capabilities drop=$CAPABILITIES_DROP \
        --output=json \
        --output=option:parse-arguments \
        --output=option:relative-time \
        --events signatures,container_create,container_remove
    fi

    tracee_ret=$?
}

As noted, the default configuration sets the in-memory event cache size to 512 MB.

Addressing Memory Constraints in VM Environments

Given that your VM is configured with only 2GB of memory, allocating 512 MB for Tracee's event caching could lead to resource contention, affecting both Tracee's performance and that of other processes running on the VM. To mitigate this, consider specifying a smaller cache size that better fits your VM's memory constraints.

Suggested Solution

You can override the default cache size by specifying the --cache cache-type=mem and --cache mem-cache-size flags [1] when running Tracee. For instance, to reduce the memory cache size to 128 MB, you could use the following command:

docker run --name tracee --rm -it --pid=host --cgroupns=host --privileged \
    -v /etc/os-release:/etc/os-release-host:ro -v /boot:/boot:ro \
    aquasec/tracee:latest \
    --metrics \
    --cache cache-type=mem \
    --cache mem-cache-size=128 \
    --capabilities bypass=0 --capabilities add= --capabilities drop= \
    --output=json --output=option:parse-arguments --output=option:relative-time \
    --events signatures,container_create,container_remove

agadient commented 6 months ago

@trvll I tested this configuration and confirmed that Tracee is no longer killed by the OOM killer. It would be nice if Tracee automatically checked the system's memory and adjusted its own limits accordingly. Perhaps this feature is something your team can consider at some point in the future.
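
Until something like that exists, a small host-side wrapper can approximate it. The 10% ratio below is an arbitrary assumption on my part, and only the flags already shown above come from Tracee itself:

# Hypothetical wrapper: size the in-memory event cache to ~10% of total RAM.
# The ratio is an arbitrary assumption, not a Tracee recommendation.
mem_total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
cache_mb=$(( mem_total_kb / 1024 / 10 ))
docker run --name tracee --rm -it --pid=host --cgroupns=host --privileged \
    -v /etc/os-release:/etc/os-release-host:ro -v /boot:/boot:ro \
    aquasec/tracee:latest \
    --cache cache-type=mem --cache "mem-cache-size=${cache_mb}" \
    --output=json --events signatures,container_create,container_remove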

yanivagman commented 6 months ago

@trvll I tested this configuration and confirmed that Tracee is no longer killed by the OOM killer. It would be nice if Tracee automatically checked the system's memory and adjusted its own limits accordingly. Perhaps this feature is something your team can consider at some point in the future.

Agree. I opened an issue to track that: https://github.com/aquasecurity/tracee/issues/3947

Going to close this one now.