cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0

Tetragon is not showing process exec ancestors #2420

Open alexeysofin opened 6 months ago

alexeysofin commented 6 months ago

What happened?

Tetragon version

time="2024-05-08T08:14:51Z" level=info msg="Starting tetragon" version=v1.0.3
time="2024-05-08T08:14:51Z" level=info msg="config settings" config="map[bpf-lib:/var/lib/tetragon/ btf: config-dir:/etc/tetragon/tetragon.conf.d/ cpuprofile: data-cache-size:1024 debug:false disable-kprobe-multi:false enable-export-aggregation:false enable-k8s-api:true enable-msg-handling-latency:false enable-pid-set-filter:false enable-pod-info:false enable-policy-filter:true enable-policy-filter-debug:false enable-process-ancestors:true enable-process-cred:false enable-process-ns:false event-queue-size:10000 export-aggregation-buffer-size:10000 export-aggregation-window-size:15s export-allowlist:{\"event_set\":[\"PROCESS_EXEC\", \"PROCESS_EXIT\", \"PROCESS_KPROBE\", \"PROCESS_UPROBE\", \"PROCESS_TRACEPOINT\"]} export-denylist:{\"namespace\":[\"\", \"cilium\", \"kube-system\"]} export-file-compress:false export-file-max-backups:5 export-file-max-size-mb:10 export-file-perm:600 export-file-rotation-interval:0s export-filename:/var/run/cilium/tetragon/tetragon.log export-rate-limit:-1 expose-kernel-addresses:false field-filters: force-large-progs:false force-small-progs:false gops-address:localhost:8118 k8s-kubeconfig-path: kernel: kmods:[] log-format:text log-level:info memprofile: metrics-label-filter:namespace,workload,pod,binary metrics-server::2112 netns-dir:/var/run/docker/netns/ pprof-addr: process-cache-size:65536 procfs:/procRoot rb-queue-size:65535 rb-size:0 rb-size-total:0 redaction-filters: release-pinned-bpf:true server-address:localhost:54321 tracing-policy: tracing-policy-dir:/etc/tetragon/tetragon.tp.d verbose:0]"

Kind version

kind version
kind v0.22.0 go1.21.3 linux/amd64

Deployed using the default Helm chart.

If I start a pod with the debian:bookworm-slim image, exec into the pod, and run this bash script:

./script.sh

#!/bin/bash
set -e

response=$(timeout -s 15 5 curl google.com)
echo "$response"

I am not getting any ancestors in the log:

{
    "process_exec": {
        "process": {
            "exec_id": "a2luZC13b3JrZXI6MjA4MDgwNTYwNTc1ODoyNzcyNg==",
            "pid": 27726,
            "uid": 0,
            "cwd": "/root",
            "binary": "/usr/bin/curl",
            "arguments": "google.com",
            "flags": "execve clone",
            "start_time": "2024-05-08T08:15:48.966646768Z",
            "auid": 4294967295,
            "pod": {
                "namespace": "default",
                "name": "test-pod",
                "container": {
                    "id": "containerd://796556cd4570c4a238358a8afc595698d23554e14348ecbe1ebf68c099efaadc",
                    "name": "test-pod",
                    "image": {
                        "id": "docker.io/library/debian@sha256:155280b00ee0133250f7159b567a07d7cd03b1645714c3a7458b2287b0ca83cb",
                        "name": "docker.io/library/debian:bookworm-slim"
                    },
                    "start_time": "2024-05-08T07:47:45Z",
                    "pid": 3141
                },
                "pod_labels": {
                    "run": "test-pod"
                },
                "workload": "test-pod",
                "workload_kind": "Pod"
            },
            "docker": "796556cd4570c4a238358a8afc59569",
            "parent_exec_id": "a2luZC13b3JrZXI6MjA4MDgwMjUwMjY0MzoyNzcyNQ==",
            "tid": 27726
        },
        "parent": {
            "exec_id": "a2luZC13b3JrZXI6MjA4MDgwMjUwMjY0MzoyNzcyNQ==",
            "pid": 27725,
            "uid": 0,
            "cwd": "/root",
            "binary": "/usr/bin/timeout",
            "arguments": "-s 15 5 curl google.com",
            "flags": "execve clone",
            "start_time": "2024-05-08T08:15:48.963545732Z",
            "auid": 4294967295,
            "pod": {
                "namespace": "default",
                "name": "test-pod",
                "container": {
                    "id": "containerd://796556cd4570c4a238358a8afc595698d23554e14348ecbe1ebf68c099efaadc",
                    "name": "test-pod",
                    "image": {
                        "id": "docker.io/library/debian@sha256:155280b00ee0133250f7159b567a07d7cd03b1645714c3a7458b2287b0ca83cb",
                        "name": "docker.io/library/debian:bookworm-slim"
                    },
                    "start_time": "2024-05-08T07:47:45Z",
                    "pid": 3140
                },
                "pod_labels": {
                    "run": "test-pod"
                },
                "workload": "test-pod",
                "workload_kind": "Pod"
            },
            "docker": "796556cd4570c4a238358a8afc59569",
            "parent_exec_id": "a2luZC13b3JrZXI6MjA4MDc5MDQxNjAwNDoyNzcyNA==",
            "tid": 27725
        }
    },
    "node_name": "kind-worker",
    "time": "2024-05-08T08:15:48.966644780Z"
}

Is there something I'm doing wrong? This seems critical for moderately to heavily loaded clusters, where container health checks can quickly overwhelm the logging system. In addition, I don't think health checks can be filtered out by ancestors either, but we could at least do that with an intermediate filtering system if the ancestors were present.

Tetragon Version

CLI version: v1.0.2

Kernel Version

Linux *** 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Server Version: v1.29.2

Bugtool

No response

Relevant log output

No response

Anything else?

No response

mtardy commented 5 months ago

Thanks for taking the time to open this issue. Your event contains information about the process and its parent. If you also capture the event that describes the parent (or retrieve that information externally), you can rebuild the ancestor tree.
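A minimal sketch of rebuilding ancestry outside Tetragon, assuming you keep every process_exec event you receive: each event's exec_id and parent_exec_id form a parent link, so the tree can be walked upward until an exec_id that was never seen. The names procInfo, ancestryCache, Record, and Ancestors are hypothetical and not part of Tetragon's API. The exec IDs in main are the ones from the event in the issue; the /bin/bash entry standing in for timeout's parent is an assumption.

```go
package main

import "fmt"

// procInfo is a hypothetical subset of a process_exec event: just enough
// to follow parent links. It is not a Tetragon type.
type procInfo struct {
	ExecID       string
	ParentExecID string
	Binary       string
}

// ancestryCache remembers every process seen so far, keyed by exec_id.
type ancestryCache struct {
	byExecID map[string]procInfo
}

func newAncestryCache() *ancestryCache {
	return &ancestryCache{byExecID: make(map[string]procInfo)}
}

// Record stores one process taken from a process_exec event.
func (c *ancestryCache) Record(p procInfo) {
	c.byExecID[p.ExecID] = p
}

// Ancestors follows parent_exec_id links from execID and returns the
// ancestors ordered from the parent to the farthest known ancestor.
// The walk stops at the first exec_id that was never recorded, or if a
// cycle is detected.
func (c *ancestryCache) Ancestors(execID string) []procInfo {
	var out []procInfo
	seen := map[string]bool{execID: true}
	cur, ok := c.byExecID[execID]
	for ok && cur.ParentExecID != "" && !seen[cur.ParentExecID] {
		seen[cur.ParentExecID] = true
		parent, found := c.byExecID[cur.ParentExecID]
		if !found {
			break
		}
		out = append(out, parent)
		cur = parent
	}
	return out
}

func main() {
	c := newAncestryCache()
	// Assumed parent of timeout (its exec_id is timeout's parent_exec_id).
	c.Record(procInfo{"a2luZC13b3JrZXI6MjA4MDc5MDQxNjAwNDoyNzcyNA==", "", "/bin/bash"})
	c.Record(procInfo{"a2luZC13b3JrZXI6MjA4MDgwMjUwMjY0MzoyNzcyNQ==", "a2luZC13b3JrZXI6MjA4MDc5MDQxNjAwNDoyNzcyNA==", "/usr/bin/timeout"})
	c.Record(procInfo{"a2luZC13b3JrZXI6MjA4MDgwNTYwNTc1ODoyNzcyNg==", "a2luZC13b3JrZXI6MjA4MDgwMjUwMjY0MzoyNzcyNQ==", "/usr/bin/curl"})

	// Prints /usr/bin/timeout, then /bin/bash.
	for _, a := range c.Ancestors("a2luZC13b3JrZXI6MjA4MDgwNTYwNTc1ODoyNzcyNg==") {
		fmt.Println(a.Binary)
	}
}
```

The visited set matters only if the stored data is inconsistent; in normal operation the links terminate at a process whose parent was never observed.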

Process ancestry is a feature that is not available in the OSS version of Tetragon. May I ask where you saw mentions of this feature?

t0x01 commented 1 month ago

Hello.

Are there any plans to add the process ancestry feature to Tetragon in the foreseeable future? It would be really useful.

I recently implemented my own version of it, plus an additional ancestor_binary_regex filter, and so far it seems to be working fine. I'm not sure my approach was optimal, though, since I basically just added an optional loop to pkg/grpc/exec/exec.go. I'm also not sure whether I should create a PR, since it is a feature of the enterprise version.
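To illustrate what an ancestor_binary_regex filter could check, here is a minimal sketch: an event matches if the binary of any of its ancestors matches any configured pattern. This is not the code from the comment above and not Tetragon's filter implementation; ancestorBinaryFilter, newAncestorBinaryFilter, and Match are hypothetical names, and the ancestor list is assumed to have been built already.

```go
package main

import (
	"fmt"
	"regexp"
)

// ancestorBinaryFilter is a hypothetical filter: it matches when the
// binary of ANY ancestor matches ANY of the configured patterns.
type ancestorBinaryFilter struct {
	patterns []*regexp.Regexp
}

// newAncestorBinaryFilter compiles the patterns once, so matching each
// event is cheap; an invalid pattern is reported as an error.
func newAncestorBinaryFilter(exprs []string) (*ancestorBinaryFilter, error) {
	f := &ancestorBinaryFilter{}
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("invalid ancestor binary regex %q: %w", e, err)
		}
		f.patterns = append(f.patterns, re)
	}
	return f, nil
}

// Match reports whether any ancestor binary matches any pattern.
// ancestorBinaries is ordered from the parent to the farthest ancestor.
func (f *ancestorBinaryFilter) Match(ancestorBinaries []string) bool {
	for _, bin := range ancestorBinaries {
		for _, re := range f.patterns {
			if re.MatchString(bin) {
				return true
			}
		}
	}
	return false
}

func main() {
	f, err := newAncestorBinaryFilter([]string{`^/usr/bin/timeout$`})
	if err != nil {
		panic(err)
	}
	fmt.Println(f.Match([]string{"/usr/bin/timeout", "/bin/bash"})) // true
	fmt.Println(f.Match([]string{"/bin/bash"}))                     // false
}
```

Compiling the expressions once at construction keeps the per-event cost to the nested loop, which is bounded by the ancestry depth times the number of patterns.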

alexeysofin commented 1 month ago

@mtardy

May I ask where you saw mentions of this feature?

Nowhere, but it is obvious that in a moderately loaded cluster, health checks will make up 99% of events, happening thousands of times per second. In addition, there are Go structures for ancestors, which are always empty.

So we ended up with a custom solution as well, but rather than forking Tetragon as @t0x01 did, we run a secondary process that tracks process trees and is injected into the data delivery pipeline.
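A rough sketch of such a secondary process, assuming it reads Tetragon's JSON export one event per line (the layout of the process_exec event shown in the issue) and writes the surviving lines to stdout. It remembers every exec it sees and drops process_exec events whose own binary, or any observed ancestor's binary, is in a configured set. treeFilter, newTreeFilter, Keep, and the /usr/local/bin/healthcheck path are hypothetical; this is not the commenters' actual solution.

```go
package main

import (
	"bufio"
	"encoding/json"
	"os"
)

// execEvent decodes only the fields of an export line this sketch needs.
type execEvent struct {
	ProcessExec *struct {
		Process struct {
			ExecID       string `json:"exec_id"`
			ParentExecID string `json:"parent_exec_id"`
			Binary       string `json:"binary"`
		} `json:"process"`
	} `json:"process_exec"`
}

type procRecord struct {
	parentExecID string
	binary       string
}

// treeFilter remembers every exec it has seen and decides per line
// whether the event should be forwarded downstream.
type treeFilter struct {
	procs    map[string]procRecord // exec_id -> parent link and binary
	dropTree map[string]bool       // drop these binaries and their descendants
}

func newTreeFilter(dropTree ...string) *treeFilter {
	f := &treeFilter{procs: make(map[string]procRecord), dropTree: make(map[string]bool)}
	for _, b := range dropTree {
		f.dropTree[b] = true
	}
	return f
}

// Keep records process_exec events and returns false when the process or
// any ancestor observed so far runs a dropTree binary. Other events and
// lines that are not valid JSON are always kept.
func (f *treeFilter) Keep(line []byte) bool {
	var ev execEvent
	if err := json.Unmarshal(line, &ev); err != nil || ev.ProcessExec == nil {
		return true
	}
	p := ev.ProcessExec.Process
	f.procs[p.ExecID] = procRecord{p.ParentExecID, p.Binary}

	seen := make(map[string]bool)
	for id := p.ExecID; id != "" && !seen[id]; {
		seen[id] = true
		rec, ok := f.procs[id]
		if !ok {
			break // ancestor never observed, e.g. started before the filter
		}
		if f.dropTree[rec.binary] {
			return false
		}
		id = rec.parentExecID
	}
	return true
}

func main() {
	f := newTreeFilter("/usr/local/bin/healthcheck") // hypothetical probe binary
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // export lines can be long
	out := bufio.NewWriter(os.Stdout)
	for sc.Scan() {
		if f.Keep(sc.Bytes()) {
			out.Write(sc.Bytes())
			out.WriteByte('\n')
		}
	}
	out.Flush()
	if err := sc.Err(); err != nil {
		os.Stderr.WriteString(err.Error() + "\n")
		os.Exit(1)
	}
}
```

With the export file from the config above, it could be run as `tail -F /var/run/cilium/tetragon/tetragon.log | ./treefilter`. As written, the map grows without bound, so a long-running version would also need some eviction policy.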

t0x01 commented 1 month ago

Hello, @mtardy.

Just trying to make sure. Since this feature is only available in the Isovalent enterprise version of Tetragon, is it prohibited to add it to the open-source version, or can anyone propose the required changes via a PR anyway? It is a very useful feature to have for both observability and filtering purposes. As I mentioned earlier, I've recently implemented my own version of it, and it seems to be working well enough, at least as far as I can tell.

What I've changed:

All changes can be found here. I'm not yet certain where and how it can be improved. Please let me know whether these changes may be added to the open-source version of Tetragon and, if so, whether anything else needs to be added or changed before I create a PR. Thank you.

jrfastab commented 1 month ago

Please submit a PR. The list looks good, and I'll review it once the PR exists. I haven't looked at the link yet because I'm currently at Linux Plumbers Conference, but I can look when I get back in a few days. Whatever different folks have forked or built on top of Tetragon doesn't affect what we should do in Tetragon. Assuming the code looks good and no one has technical arguments against it, I say we can push it. Thanks!