cilium / tetragon

eBPF-based Security Observability and Runtime Enforcement
https://tetragon.io
Apache License 2.0

cri-o 1.29: Tetragon does not show namespace/pod information #2639

Open XelK opened 2 months ago

XelK commented 2 months ago

What happened?

It seems that with CRI-O 1.29, Tetragon does not show namespace and pod information.

Checking the events for the test container:

kubectl exec -ti -n tetragon tetragon-r292s -c tetragon -- tetra -d getevents -o compact | grep passwd

time="2024-06-24T14:54:53Z" level=debug msg="Processing event" event="process_exec:{process:{exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzUyOTE2NjY1NDAxMzE6MzU1MzYzNw==\"  pid:{value:3553637}  uid:{}  cwd:\"/\"  binary:\"/bin/cat\"  arguments:\"/etc/passwd\"  flags:\"execve rootcwd clone\"  start_time:{seconds:1719240893  nanos:353654638}  auid:{value:4294967295}  parent_exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzQ4NTUwMjk4NTEwNzk6MzU1Mjc0OQ==\"  tid:{value:3553637}}  parent:{exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzQ4NTUwMjk4NTEwNzk6MzU1Mjc0OQ==\"  pid:{value:3552749}  uid:{}  cwd:\"/\"  binary:\"/bin/sh\"  flags:\"execve rootcwd clone\"  start_time:{seconds:1719240456  nanos:716961213}  auid:{value:4294967295}  parent_exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzQ4NTUwMjM4Nzk4MTM6MzU1Mjc0Nw==\"  tid:{value:3552749}}}  node_name:\"mynode.local\"  time:{seconds:1719240893  nanos:353650458}"

🚀 process mynode.local /bin/cat /etc/passwd
time="2024-06-24T14:54:53Z" level=debug msg="Processing event" event="process_exit:{process:{exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzUyOTE2NjY1NDAxMzE6MzU1MzYzNw==\"  pid:{value:3553637}  uid:{}  cwd:\"/\"  binary:\"/bin/cat\"  arguments:\"/etc/passwd\"  flags:\"execve rootcwd clone\"  start_time:{seconds:1719240893  nanos:353654638}  auid:{value:4294967295}  parent_exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzQ4NTUwMjk4NTEwNzk6MzU1Mjc0OQ==\"  tid:{value:3553637}}  parent:{exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzQ4NTUwMjk4NTEwNzk6MzU1Mjc0OQ==\"  pid:{value:3552749}  uid:{}  cwd:\"/\"  binary:\"/bin/sh\"  flags:\"execve rootcwd clone\"  start_time:{seconds:1719240456  nanos:716961213}  auid:{value:4294967295}  parent_exec_id:\"ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjI4NzQ4NTUwMjM4Nzk4MTM6MzU1Mjc0Nw==\"  tid:{value:3552749}}  time:{seconds:1719240893  nanos:354155578}}  node_name:\"mynode.local\"  time:{seconds:1719240893  nanos:354155575}"
💥 exit    mynode.local /bin/cat /etc/passwd 0

Switching the log to trace mode, I can see these messages:

tetragon time="2024-07-05T08:24:38Z" level=trace msg="process_exec: no container ID due to cgroup name not being a compatible ID, ignoring." cgroup.id=98592 cgroup.name=container process.binary=/bin/sh process.exec_id="ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjcyMjM0MTAxNzQ3OTIyOjIyNTA2NA=="

where the exec_id matches the one in:

{
    "process_exec": {
        "process": {
            "exec_id": "ZGtyLWNiLXV4aTQwMi5pdHRlc3QuY29ybmVyLmxvY2FsOjcyMjM0MTAxNzQ3OTIyOjIyNTA2NA==",
            "pid": 225064,
            "uid": 0,
            "cwd": "/",
            "binary": "/bin/sh",

I executed crictl inspect <container_id> | grep cgroupsPath

and under the resulting cgroupsPath:

sudo find /sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod2263b521_f8ac_475e_82a0_95937cce8f0f.slice/ -type d
/sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod2263b521_f8ac_475e_82a0_95937cce8f0f.slice/
/sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod2263b521_f8ac_475e_82a0_95937cce8f0f.slice/crio-13af5e5e8dd365f35cd40d268140600b80449b3c956c1ee257961ea51dfc1f74
/sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod2263b521_f8ac_475e_82a0_95937cce8f0f.slice/crio-4ff86f4fe0ebd2556606a6b049b94ae571e3486c4c265230dc8ce87f887ffd15.scope
/sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod2263b521_f8ac_475e_82a0_95937cce8f0f.slice/crio-4ff86f4fe0ebd2556606a6b049b94ae571e3486c4c265230dc8ce87f887ffd15.scope/container

Tetragon Version

1.1.2

Kernel Version

6.1.0-24

Kubernetes Version

1.29.2

Bugtool

No response

Relevant log output

No response

Anything else?

No response

kkourt commented 2 months ago

It seems that this version of cri-o uses paths such as:

/sys/fs/cgroup//kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod2263b521_f8ac_475e_82a0_95937cce8f0f.slice/crio-4ff86f4fe0ebd2556606a6b049b94ae571e3486c4c265230dc8ce87f887ffd15.scope/container

for the container cgroup, which does not work well with Tetragon. I think the best solution is to connect to the container runtime (https://github.com/kubernetes/cri-api/blob/c75ef5b/pkg/apis/runtime/v1/api.proto), get the cgroups it uses, and use the cgroup ID to do the mapping. I think we would want an option to enable this.

There was a short discussion about this in the Tetragon Community Meeting (July 8th): https://docs.google.com/document/d/1BFMJLdtisiCSLfMct0GHof_ioL-5QVNLEaeMSlk_7Eo/edit#heading=h.cd9xm2lbvnw4.

kkourt commented 2 weeks ago

I'm reopening this to add instructions on how to use the features introduced in #2776.