Closed: Andreagit97 closed this issue 1 year ago
In the last few weeks, I've done some investigation into this issue; I would be happy to lead this work if we think it's something we are interested in :)
I've been running some experiments on my Fedora dev box (6.2.15-100.fc36.x86_64)
using a minikube test pod: kubectl run mynginxapp --image=nginx:latest --port=80
... I am getting broken process trees both on master and on the find_reaper PR branch :thinking:
Would you mind checking on these examples w/ sinsp-example output?
sudo ./libsinsp/examples/sinsp-example -b driver/bpf/probe.o -f "(evt.type in (execve, execveat) and evt.dir=< and container.id!=host and not proc.cmdline contains ip)" -j -o "*%evt.type %proc.cmdline %proc.exepath %proc.pexepath %proc.aexepath[2] %proc.aexepath[3] %proc.aexepath[4] %proc.aexepath[5] %proc.aexepath[6] %proc.aexepath[7] %container.id %container.image.repository %proc.pid %proc.ppid %proc.apid[2] %proc.apid[3] %proc.apid[4] %proc.apid[5] %proc.apid[6] %proc.apid[7] %proc.pcmdline %proc.acmdline[2] %proc.vpgid"
Your example
kubectl exec mynginxapp -- sh -c 'tail -f /proc/self/status'
on find_reaper branch:
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.acmdline[2]":"runc --root [TRUNCATED]","proc.aexepath[2]":"/usr/bin/runc","proc.aexepath[3]":"/usr/bin/runc","proc.aexepath[4]":"/usr/bin/containerd-shim-runc-v2","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":418262,"proc.apid[3]":418262,"proc.apid[4]":407766,"proc.apid[5]":1,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"sh -c tail -f /proc/self/status","proc.exepath":"/usr/bin/sh","proc.pcmdline":"runc --root [TRUNCATED]","proc.pexepath":"/usr/bin/runc","proc.pid":418273,"proc.ppid":418266,"proc.vpgid":174}
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.acmdline[2]":null,"proc.aexepath[2]":null,"proc.aexepath[3]":null,"proc.aexepath[4]":null,"proc.aexepath[5]":null,"proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":null,"proc.apid[3]":null,"proc.apid[4]":null,"proc.apid[5]":null,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"tail -f /proc/self/status","proc.exepath":"/usr/bin/tail","proc.pcmdline":"sh -c tail -f /proc/self/status","proc.pexepath":"/usr/bin/sh","proc.pid":418279,"proc.ppid":418273,"proc.vpgid":174}
or try
kubectl exec -it mynginxapp -- env KEY=123 /bin/bash
/# sleep 200
output:
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.acmdline[2]":"runc --root [TRUNCATED]","proc.aexepath[2]":"/usr/bin/runc","proc.aexepath[3]":"/usr/bin/runc","proc.aexepath[4]":"/usr/bin/containerd-shim-runc-v2","proc.aexepath[5]":"/usr/lib/systemd/systemd","proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":418799,"proc.apid[3]":418799,"proc.apid[4]":407766,"proc.apid[5]":1,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"env KEY=123 /bin/bash","proc.exepath":"/usr/bin/env","proc.pcmdline":"runc --root [TRUNCATED]","proc.pexepath":"/usr/bin/runc","proc.pid":418810,"proc.ppid":418803,"proc.vpgid":188}
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.acmdline[2]":null,"proc.aexepath[2]":null,"proc.aexepath[3]":null,"proc.aexepath[4]":null,"proc.aexepath[5]":null,"proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":null,"proc.apid[3]":null,"proc.apid[4]":null,"proc.apid[5]":null,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"bash","proc.exepath":"/bin/bash","proc.pcmdline":null,"proc.pexepath":null,"proc.pid":418810,"proc.ppid":418803,"proc.vpgid":188}
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.acmdline[2]":null,"proc.aexepath[2]":null,"proc.aexepath[3]":null,"proc.aexepath[4]":null,"proc.aexepath[5]":null,"proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":null,"proc.apid[3]":null,"proc.apid[4]":null,"proc.apid[5]":null,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"sleep 200","proc.exepath":"/usr/bin/sleep","proc.pcmdline":"bash","proc.pexepath":"/bin/bash","proc.pid":418818,"proc.ppid":418810,"proc.vpgid":194}
We may have more problems: not just the broken lineage, but also these edge cases of replacing the pid ...
Uhm, please note that PR https://github.com/falcosecurity/libs/pull/1151 only adds support on the kernel side; the reaper_pid info is not yet used in userspace, so the process lineage will be broken just like on master. You are actually testing the master logic in both cases, and that's good: at least we can see how broken it is today :/
lol of course this is what happens when you re-open your laptop on a Friday night for some hacking.
Tried again. First, I noticed the bpf_printk statements appeared a bit delayed and inconsistent in timing across attempts; basically, I had to drop out of the shell to have them displayed. Therefore, yes, I would be curious to test-drive the userspace logic.
Do you mind checking again on this example and letting us know if everything works as expected?
kubectl exec -it mynginxapp -- env KEY=123 /bin/bash
root@mynginxapp:/# sleep 10
root@mynginxapp:/# cat test
root@mynginxapp:/# exit
process tree: 1:/usr/lib/systemd/systemd -> 407766:usr/bin/containerd-shim-runc-v2 -> 578560:/usr/bin/runc -> 578560:/usr/bin/runc -> 578560:/usr/bin/runc -> 578566:/usr/bin/runc -> 578569:/usr/bin/env -> 578569:/bin/bash -> 578579:/usr/bin/sleep
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.aexepath[2]":"/usr/bin/runc","proc.aexepath[3]":"/usr/bin/runc","proc.aexepath[4]":"/usr/bin/runc","proc.aexepath[5]":"/usr/bin/containerd-shim-runc-v2","proc.aexepath[6]":"/usr/lib/systemd/systemd","proc.aexepath[7]":null,"proc.apid[2]":578560,"proc.apid[3]":578560,"proc.apid[4]":578560,"proc.apid[5]":407766,"proc.apid[6]":1,"proc.apid[7]":null,"proc.cmdline":"env KEY=123 /bin/bash","proc.exepath":"/usr/bin/env","proc.pexepath":"/usr/bin/runc","proc.pid":578569,"proc.ppid":578566,"proc.sid":271,"proc.tty":34817,"proc.vpgid":271}
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.aexepath[2]":null,"proc.aexepath[3]":null,"proc.aexepath[4]":null,"proc.aexepath[5]":null,"proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":null,"proc.apid[3]":null,"proc.apid[4]":null,"proc.apid[5]":null,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"bash","proc.exepath":"/bin/bash","proc.pexepath":null,"proc.pid":578569,"proc.ppid":578566,"proc.sid":271,"proc.tty":34817,"proc.vpgid":271}
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.aexepath[2]":null,"proc.aexepath[3]":null,"proc.aexepath[4]":null,"proc.aexepath[5]":null,"proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":null,"proc.apid[3]":null,"proc.apid[4]":null,"proc.apid[5]":null,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"sleep 10","proc.exepath":"/usr/bin/sleep","proc.pexepath":"/bin/bash","proc.pid":578579,"proc.ppid":578569,"proc.sid":271,"proc.tty":34817,"proc.vpgid":277}
{"container.id":"82ca7e98db68","container.image.repository":"docker.io/library/nginx","evt.type":"execve","proc.aexepath[2]":null,"proc.aexepath[3]":null,"proc.aexepath[4]":null,"proc.aexepath[5]":null,"proc.aexepath[6]":null,"proc.aexepath[7]":null,"proc.apid[2]":null,"proc.apid[3]":null,"proc.apid[4]":null,"proc.apid[5]":null,"proc.apid[6]":null,"proc.apid[7]":null,"proc.cmdline":"cat test","proc.exepath":"/usr/bin/cat","proc.pexepath":"/bin/bash","proc.pid":578598,"proc.ppid":578569,"proc.sid":271,"proc.tty":34817,"proc.vpgid":278}
runc-578566 [012] d..3. 718172.632598: bpf_trace_printk: reaper_pid 578560 /usr/bin/runc
runc-578560 [008] d..3. 718172.633466: bpf_trace_printk: reaper_pid 407766 /usr/bin/containerd-shim-runc-v2
> Tried again, first I noticed the bpf_printk statements appeared a bit delayed and not consistent in timing across attempts, basically I had to drop out of the shell to have them displayed.
It really depends on where you put them in the code: if you put them in the hot path and there are a lot of calls, it's perfectly fine to see some weird things... could you provide an example?
> Do you mind checking again on this example and let us know if everything works as expected?
Sure, I will take a look ASAP.
Describe the bug

Starting from a thread we cannot correctly traverse its ancestors, for 2 main reasons:

1. The `ptid` of a thread is populated with the `tid` of the caller, so the parent of a thread is its caller... but this is not what the kernel does, and when a thread in the thread group dies we lose the parent information.
2. The thread is removed from the thread table as soon as its `PROCEXIT` event is processed. So when a thread dies we break the process lineage.

How to reproduce it
Start an `nginx` pod and run a `tail` process inside it. Now let's look at the process lineage of the `tail` process:
On libs master:

The ideal solution, obtained with `pstree`:

As you can notice, we lose all the parents after the `[sh]` process; moreover, the `ptid` of `sh` is wrong because, as we said, we set `ptid=caller`.
Possible Solution

To solve this issue we need different steps:

1. Populate the `ptid` of a new thread with its real parent, taking into account corner cases like the `CLONE_PARENT` flag or the `CLONE_NEW_PID_NS` flag.
2. Introduce the concept of a `thread_group` in sinsp, so we can always know how many threads we have in the thread group. This is useful for the reparenting logic (next step) and, more in general, we could exploit it as additional information during the capture.
3. Support the `prctl` syscall. The instrumentation is already there.
4. During the `/proc` scan we are not able to recover the reaper info from the kernel, so we need a way to recover it during the run-time capture.