falcosecurity / libs

libsinsp, libscap, the kernel module driver, and the eBPF driver sources
https://falcosecurity.github.io/libs/
Apache License 2.0

No support for Podman container activity capturing by container.id #1115

Open hashkeks opened 1 year ago

hashkeks commented 1 year ago

Hello,

I come from an issue over at the Sysdig repository where I was advised to open up an issue here: https://github.com/draios/sysdig/issues/385#issuecomment-1510891984

The problem is that Podman container activity still does not seem to be recognized by Sysdig, for example when filtering by container.id=<container id>. As far as I understand, Sysdig and Falco use the same engine to detect such activity, and since I am more familiar with Sysdig, I'll list my Sysdig versions. If there is anything I should also test with Falco, please feel free to tell me.

This problem occurs whether I run a Podman container with crun or runc, and I tested it on two different systems with different software versions (see below for version details).

On Ubuntu 22.04.2 LTS, with the corresponding software versions, no containers are recognized by sysdig -c lscontainers and no activity is captured by sysdig evt.type=execve and container.id=<container-id>. Without the container.id filter, execve activity from inside the container is captured.

On Rocky Linux 8.8, with the corresponding software versions, a Podman container run with runc is recognized by sysdig -c lscontainers as container type Docker and with the right container ID. Container image and name are blank. Unfortunately, it is no longer recognized when run with crun, and again no activity is captured by sysdig evt.type=execve and container.id=<container-id>. Without the container.id filter, execve activity from inside the container is captured.

Ubuntu 22.04.2 LTS:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ sysdig --version
sysdig version 0.27.1

$ podman --version
podman version 3.4.4

$ runc --version
runc version 1.1.4-0ubuntu1~22.04.3
spec: 1.0.2-dev
go: go1.18.1
libseccomp: 2.5.3

$ crun --version
crun version 0.17
commit: 0e9229ae34caaebcb86f1fde18de3acaf18c6d9a
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL

Rocky Linux 8.8:

$ cat /etc/os-release
PRETTY_NAME="Rocky Linux 8.8 (Green Obsidian)"

$ sysdig --version
sysdig version 0.31.5

$ podman --version
podman version 4.4.1

$ runc --version
runc version 1.1.4
spec: 1.0.2-dev
go: go1.19.4
libseccomp: 2.5.2

$ crun --version
crun version 1.8.4
commit: 5a8fa99a5e41facba2eda4af12fa26313918805b
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL

Are Podman containers generally not supported, or am I missing something in my setup? The issue at https://github.com/draios/sysdig/issues/385 - where I originally come from - also does not seem to be fully resolved.

Thank you in advance for any hint towards a solution or a clarification regarding the support for Podman containers :)

incertum commented 1 year ago

The Podman container engine should be supported by libsinsp, see https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/container_engine/docker/podman.cpp. However, I have not tried it myself. I can hopefully check on it in a few weeks.

Meanwhile, could you check on the socket path?

However, my strongest suspicion is that something is wrong with the cgroup retrieval: container.id is derived from the cgroup fetched in the kernel, so it has nothing really to do with the container engine.
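To make the cgroup point concrete, here is a minimal Python sketch of pulling a container ID out of a process's cgroup path. It is illustrative only and is not the matching logic libsinsp uses. It assumes that Podman names the container cgroup libpod-<64 hex chars> (with a .scope suffix under the systemd cgroup manager) and that container.id is the first 12 characters of the full ID. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: Podman names a container's cgroup "libpod-<64 hex chars>",
# followed by ".scope" when the systemd cgroup manager is in use.
_LIBPOD_RE = re.compile(r"libpod-([0-9a-f]{64})(?:\.scope)?(?:/|$)")


def container_id_from_cgroup(cgroup_path: str, short_len: int = 12) -> Optional[str]:
    """Return the truncated container ID found in a cgroup path, or None."""
    match = _LIBPOD_RE.search(cgroup_path)
    if match is None:
        return None
    # Assumption: the short ID is the first 12 characters of the full ID.
    return match.group(1)[:short_len]


def container_id_for_pid(pid: int) -> Optional[str]:
    """Scan /proc/<pid>/cgroup, whose lines look like 'hierarchy-ID:controllers:path'."""
    with open(f"/proc/{pid}/cgroup", encoding="utf-8") as handle:
        for line in handle:
            path = line.rstrip("\n").split(":", 2)[-1]
            found = container_id_from_cgroup(path)
            if found is not None:
                return found
    return None
```

Running container_id_for_pid() against the PID of a process inside the test container shows whether the cgroup path carries an ID at all. If it does, the short ID is the value to compare with the one passed to the container.id filter.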

We have a test binary you can build directly from this repo: https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/examples/test.cpp. It is often easier for testing. Would you mind giving it a try? You can also specify the output fields in this example binary.

incertum commented 1 year ago

In addition to the versions you shared, if you can share the exact setup you used to launch a test podman container, it would be easier for us to try replicating potential issues. I am asking because there can be subtle differences. For instance, when you launch a container directly with ctr, the containerd CLI, things do not work (see https://github.com/falcosecurity/libs/pull/860#issuecomment-1416817658), but if you use crictl and launch containers in simulated sandboxes, it all works locally: both the cgroup resolution (which populates container.id) and the remaining container metadata such as container.image.repository.

incertum commented 1 year ago

Hi @hashkeks, I had a moment to run a podman container and made a few observations:


Part 1: Get container id from cgroups: working with proper settings

sudo systemctl start podman
netstat --listen | grep podman
unix  2      [ ACC ]     STREAM     LISTENING     448773   /run/podman/podman.sock
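The socket check above can also be scripted. The following Python sketch only tests whether a Unix socket file exists at the expected location; it does not prove that the Podman service is answering on it. The root path is the one from the netstat output above. The rootless path is Podman's conventional per-user location and is an assumption here, as are the function names.

```python
import os
import stat
from typing import List, Optional


def candidate_socket_paths(uid: Optional[int] = None) -> List[str]:
    """Return the socket paths worth checking for the given user ID."""
    uid = os.getuid() if uid is None else uid
    return [
        "/run/podman/podman.sock",              # rootful, as in the netstat output
        f"/run/user/{uid}/podman/podman.sock",  # rootless (assumed location)
    ]


def is_unix_socket(path: str) -> bool:
    """True if path exists and is a Unix domain socket file."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, permission problem, or broken path component.
        return False


if __name__ == "__main__":
    for path in candidate_socket_paths():
        state = "socket file present" if is_unix_socket(path) else "not found"
        print(f"{path}: {state}")
```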

Part 2: Get container info based on cgroup: currently not working

This is where we get an error: we do not fetch the podman container image and other metadata from the docker socket.

https://github.com/falcosecurity/libs/blame/031bc455ce03ac410dd03d9587e9b31cfc15ac60/userspace/libsinsp/container_engine/docker/async_source.cpp#L674
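For anyone who wants to see what the socket returns outside of libsinsp, here is a rough Python sketch that sends a Docker-style inspect request over a Unix socket. It is not the libsinsp code path. It assumes that the Podman service answers the Docker-compatible GET /containers/<id>/json endpoint on its socket and that the response uses the Docker field names Id, Name and Config.Image. The class and function names are hypothetical.

```python
import http.client
import json
import socket
from typing import Any, Dict


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def inspect_path(container_id: str) -> str:
    """Docker-compatible inspect endpoint (assumed to be served by Podman)."""
    return f"/containers/{container_id}/json"


def inspect_container(socket_path: str, container_id: str) -> Dict[str, Any]:
    """Fetch the inspect JSON for one container; raise on a non-200 reply."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", inspect_path(container_id))
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"inspect failed: HTTP {response.status}")
        return json.loads(body)
    finally:
        conn.close()


def summarize(info: Dict[str, Any]) -> Dict[str, str]:
    """Pick out the fields that showed up blank in lscontainers."""
    config = info.get("Config") or {}
    return {
        "id": str(info.get("Id", ""))[:12],
        "name": str(info.get("Name", "")).lstrip("/"),
        "image": str(config.get("Image", "")),
    }
```

Calling summarize(inspect_container("/run/podman/podman.sock", "<container-id>")) as a user with access to the socket shows whether the image and name are available from the socket at all, which helps separate a socket-side problem from a parsing problem.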

CC folks who touched docker_async_source::parse() lately: @gnosek @deepskyblue86 @jasondellaluce @FedeDP. Any ideas what might have changed, or whether we need a more thorough refactor to continue supporting podman? For example making requests directly against the podman socket instead?

gnosek commented 1 year ago

@incertum,

For example making requests directly against the podman socket instead?

That's going to be a major effort but we'll probably have to bite the bullet at some point.

incertum commented 1 year ago

Agreed @gnosek. @leogr, perhaps we should re-audit all container engines, check whether the less used ones (like podman) even still work, and then decide which ones we continue to support going forward? Plus check if there are emerging container runtimes we should start supporting ...

gnosek commented 1 year ago

@incertum,

I can assure you, podman is not one of the less used ones ;) We do require extra hoops with the Docker API socket but it's definitely used out there.

(we can probably kill rkt though :))

leogr commented 1 year ago

@incertum I agree we should review that part of the code. But I would still keep broad container engine support if feasible (and just remove deprecated ones like rkt).

@gnosek Totally agree that podman is widely used and that rkt should be removed.

cc'ing @FedeDP @Andreagit97 Since I had a similar discussion with them a couple of weeks ago

incertum commented 1 year ago

Great, let's fix podman then. Does anyone have more insight into whether the code is just slightly broken, or whether a very specific incantation is needed when launching a podman container? In either case we should refactor it.

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

Andreagit97 commented 1 year ago

/remove-lifecycle stale

leogr commented 1 year ago

I'm working to revamp this. I'm assigning it to myself for now /assign

(@Andreagit97 let me know if you want to help with this, if so, assign it to yourself, too pls)

poiana commented 9 months ago

/lifecycle stale

leogr commented 9 months ago

/remove-lifecycle stale

FedeDP commented 8 months ago

/assign

leogr commented 6 months ago

not really related, but useful for podman cross refs: https://github.com/falcosecurity/libs/pull/1851

poiana commented 3 months ago

/lifecycle stale

Andreagit97 commented 3 months ago

/remove-lifecycle stale

poiana commented 3 weeks ago

/lifecycle stale

FedeDP commented 3 weeks ago

/remove-lifecycle stale