Unable to detect events with containerd and kubernetes

rayanebel commented 3 years ago

Describe the bug

Hi everyone,

I'm not sure if this is a bug or a misunderstanding on my side. I have a kubeadm cluster and I switched from Docker to containerd because of the Docker deprecation.

I've installed Falco directly on the worker node (not as a DaemonSet), as the documentation recommends. I've updated the systemd unit file to add the flags needed for containerd. Everything is up and running, but when I create a pod and exec a shell in it, there is no entry in the log telling me that someone has spawned a shell in pod xxxx.

Am I missing something? I can't find much documentation on how to use Falco with containerd.

Here you can find my systemd unit file.

[Unit]
Description=Falco: Container Native Runtime Security
Documentation=https://falco.org/docs/
[Service]
Environment="FALCO_ARGS=--cri=/run/containerd/containerd.sock --disable-cri-async -pk"
Type=simple
User=root
ExecStartPre=/sbin/modprobe falco
ExecStart=/usr/bin/falco --pidfile=/var/run/falco.pid $FALCO_ARGS
ExecStopPost=/sbin/rmmod falco
UMask=0077
TimeoutSec=30
RestartSec=15s
Restart=on-failure
PrivateTmp=true
NoNewPrivileges=yes
ProtectHome=read-only
ProtectSystem=full
ProtectKernelTunables=true
RestrictRealtime=true
RestrictAddressFamilies=~AF_PACKET
[Install]
WantedBy=multi-user.target

When I try another action, for example opening a shell in a pod and creating a file in /etc, I do find entries in the logs, but some data is missing:

Apr 23 11:37:35 cks-worker falco[8815]: 11:37:35.770964137: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch /etc/bbb parent=bash pcmdline=bash file=/etc/bbb program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>)
Apr 23 11:37:35 cks-worker falco[8815]: 11:37:35.770964137: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch /etc/bbb parent=bash pcmdline=bash file=/etc/bbb program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>)

Additional info: I'm using the default rules file.
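
For triaging log lines like the ones above, here is a minimal Python sketch (not part of Falco; `parse_alert_fields` and `looks_like_host_event` are hypothetical helper names) that pulls `key=value` tokens out of a text alert and flags events attributed to the host instead of a container.

```python
import re

# Matches key=value tokens such as container_id=host or image=<NA>.
# Values containing spaces (e.g. command=touch /etc/bbb) are cut at the
# first space, which is enough for the single-token fields checked here.
FIELD_RE = re.compile(r"([A-Za-z_][\w.]*)=(\S+)")


def parse_alert_fields(line: str) -> dict:
    """Return the first occurrence of each key=value token in a text alert."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        fields.setdefault(key, value.rstrip(")"))
    return fields


def looks_like_host_event(line: str) -> bool:
    """True when the alert carries container_id=host (no container resolved)."""
    return parse_alert_fields(line).get("container_id") == "host"


sample = (
    "11:37:35.770964137: Error File below /etc opened for writing "
    "(user=root user_loginuid=-1 command=touch /etc/bbb parent=bash "
    "pcmdline=bash file=/etc/bbb program=touch gparent=<NA> ggparent=<NA> "
    "gggparent=<NA> container_id=host image=<NA>)"
)
print(looks_like_host_event(sample))
```

Piping `journalctl -u falco` through a check like this makes it easy to count how many alerts were raised from inside a pod but reported as `container_id=host`.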

How to reproduce it

Install falco in the worker node
Update the systemd unit file to include configuration related to containerd
Create a pod

apiVersion: v1
kind: Pod
metadata:
  name: falco-test
  namespace: default
spec:
  containers:
  - image: nginx
    name: falco-ltest

Exec a shell in the pod
```
kubectl exec -it falco-test -- bash
```
Check the Falco logs with journalctl -fu falco (no event when a shell is spawned)
Inside the pod, create a file in /etc and check the logs again (events with incomplete data)

Expected behaviour

We should get an event when we open a shell in the pod. This works when Docker is the container runtime.
We should get events with all the metadata (pod name, ...).

Environment

Falco version:

Falco version: 0.28.0
Driver version: 5c0b863ddade7a45568c0ac97d037422c9efb750

System info:
Cloud provider : GCP

OS:

NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Kernel: Linux cks-master 5.4.0-1042-gcp #45~18.04.1-Ubuntu SMP Tue Apr 13 18:51:16 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Installation method: DEB

leogr commented 3 years ago

It seems Falco is not talking with containerd (which is supported by Falco).

Have you tried to run Falco (just for debugging purposes) manually?

For example:

sudo /usr/bin/falco --cri /run/containerd/containerd.sock -pk

Do you still get the same problem?

rayanebel commented 3 years ago

Hi @leogr

Thanks for your reply.

I stopped the falco systemd service and ran Falco manually with the command you provided. When I create a pod and exec a shell in it, I get a log line on stdout, but some metadata is missing (e.g. k8s.ns and k8s.pod).

sudo /usr/bin/falco --cri /run/containerd/containerd.sock -pk

Tue May 11 16:36:25 2021: Falco version 0.28.0 (driver version 5c0b863ddade7a45568c0ac97d037422c9efb750)
Tue May 11 16:36:25 2021: Falco initialized with configuration file /etc/falco/falco.yaml
Tue May 11 16:36:25 2021: Loading rules from file /etc/falco/falco_rules.yaml:
Tue May 11 16:36:25 2021: Loading rules from file /etc/falco/falco_rules.local.yaml:
Tue May 11 16:36:26 2021: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Tue May 11 16:36:26 2021: Starting internal webserver, listening on port 8765

16:39:38.752534563: Notice A shell was spawned in a container with an attached terminal (user=root user_loginuid=-1 k8s.ns=<NA> k8s.pod=<NA> container=e22578a37c49 shell=bash parent=runc cmdline=bash terminal=34816 container_id=e22578a37c49 image=docker.io/library/nginx) k8s.ns=<NA> k8s.pod=<NA> container=e22578a37c49 k8s.ns=<NA> k8s.pod=<NA> container=e22578a37c49

16:39:57.071900639: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch /etc/toto parent=bash pcmdline=bash file=/etc/toto program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=e22578a37c49 image=docker.io/library/nginx) k8s.ns=<NA> k8s.pod=<NA> container=e22578a37c49 k8s.ns=<NA> k8s.pod=<NA> container=e22578a37c49 k8s.ns=<NA> k8s.pod=<NA> container=e22578a37c49

So what is the difference with the systemd service? I think I have provided all the flags related to containerd.

ps fax | grep "falco"
26929 pts/0    S+     0:00  |       |           \_ grep --color=auto falco
23599 pts/1    S+     0:00          |           \_ journalctl -fu falco
26760 ?        Ssl    0:01 /usr/bin/falco --pidfile=/var/run/falco.pid --cri=/run/containerd/containerd.sock --disable-cri-async -pk

Do we need any special configuration in containerd?

leogr commented 3 years ago

Now I see the container ID and the image name, which is what I expected. So if it works when running Falco manually, there is probably some issue in the systemd unit file, perhaps because FALCO_ARGS's value is not quoted.
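
One way to check the quoting hypothesis: systemd documents that `$VAR` used as a separate word on the command line is split at whitespace into zero or more arguments, while `${VAR}` always yields exactly one argument. The Python sketch below is only a simplified model of that rule (the function names are made up for illustration), applied to the FALCO_ARGS value from the unit file earlier in this thread.

```python
def expand_unbraced(value: str) -> list:
    """Model `$VAR` as its own word: the value is split at whitespace."""
    return value.split()


def expand_braced(value: str) -> list:
    """Model `${VAR}`: the exact value becomes a single argument."""
    return [value]


falco_args = "--cri=/run/containerd/containerd.sock --disable-cri-async -pk"

print(expand_unbraced(falco_args))  # what ExecStart=... $FALCO_ARGS would pass
print(expand_braced(falco_args))    # what ExecStart=... ${FALCO_ARGS} would pass
```

Comparing the first result with the arguments shown by `ps` for the running service is a quick sanity check on whether the flags reached Falco as separate arguments.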

Regarding the missing k8s.ns and k8s.pod: those are fetched from the k8s API server, so to get them you also have to configure Falco to communicate with k8s by using the following command-line flags:

-k <url>, --k8s-api <url>
                               Enable Kubernetes support by connecting to the API server specified as argument.
                               E.g. "http://admin:password@127.0.0.1:8080".
                               The API server can also be specified via the environment variable FALCO_K8S_API.
 -K <bt_file> | <cert_file>:<key_file[#password]>[:<ca_cert_file>], --k8s-api-cert <bt_file> | <cert_file>:<key_file[#password]>[:<ca_cert_file>]
                               Use the provided files names to authenticate user and (optionally) verify the K8S API server identity.
                               Each entry must specify full (absolute, or relative to the current directory) path to the respective file.
                               Private key password is optional (needed only if key is password protected).
                               CA certificate is optional. For all files, only PEM file format is supported. 
                               Specifying CA certificate only is obsoleted - when single entry is provided 
                               for this option, it will be interpreted as the name of a file containing bearer token.
                               Note that the format of this command-line option prohibits use of files whose names contain
                               ':' or '#' characters in the file name.

Also, this deployment manifest https://github.com/falcosecurity/evolution/blob/8edd1ad0f001f16c444b3e62f611f21bca49145b/deploy/kubernetes/falco/templates/daemonset.yaml#L45-L52 shows how to configure those values. I hope it is useful as an example.
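
As a rough illustration of how these flags combine, the Python sketch below assembles a Falco command line from the options described above. The API server URL and the token path are assumptions chosen for the example (the token path is the usual service account mount inside a pod and will not exist on a plain host install), and `build_falco_argv` is a hypothetical helper, not something shipped with Falco.

```python
# Assumed in-cluster default; adjust for a host install.
DEFAULT_TOKEN = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_falco_argv(api_url: str,
                     cri_socket: str = "/run/containerd/containerd.sock",
                     token_file: str = DEFAULT_TOKEN) -> list:
    """Assemble a Falco command line with CRI and Kubernetes API flags."""
    return [
        "/usr/bin/falco",
        "--cri", cri_socket,  # container runtime socket
        "-k", api_url,        # Kubernetes API server URL
        "-K", token_file,     # a single entry is read as a bearer token file
        "-pk",                # append Kubernetes fields to the output
    ]


# "https://kubernetes.default" is a placeholder URL for this example.
argv = build_falco_argv("https://kubernetes.default")
print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run` for a one-off debugging session, or copied into FALCO_ARGS in the systemd unit file.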

rayanebel commented 3 years ago

Hi @leogr
I carried out the same steps as above and re-ran the falco command manually on a fresh cluster (GCE + kubeadm), but I have the same bug again. I'm not able to see any events. I don't understand why it was working when we talked last time. I've just installed containerd with the default configuration and installed Falco.

I ran the command manually (systemd service stopped) and, as we can see, every event that I get is labeled container=host.

root@cks-worker:/etc/containerd# sudo /usr/bin/falco --cri /run/containerd/containerd.sock -pk
Mon Jun 14 08:47:23 2021: Falco version 0.28.1 (driver version 5c0b863ddade7a45568c0ac97d037422c9efb750)
Mon Jun 14 08:47:23 2021: Falco initialized with configuration file /etc/falco/falco.yaml
Mon Jun 14 08:47:23 2021: Loading rules from file /etc/falco/falco_rules.yaml:
Mon Jun 14 08:47:24 2021: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mon Jun 14 08:47:24 2021: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mon Jun 14 08:47:24 2021: Starting internal webserver, listening on port 8765

08:48:25.591044983: Warning Shell history had been deleted or renamed (user=root user_loginuid=-1 type=openat command=bash fd.name=/root/.bash_history name=/root/.bash_history path=<NA> oldpath=<NA> k8s.ns=<NA> k8s.pod=<NA> container=host) k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host
08:49:19.167578644: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch toto parent=bash pcmdline=bash file=/etc/toto program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>) k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host

To install containerd I followed this guide: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd

To install kubeadm I followed this guide: https://k8s-school.fr/resources/fr/blog/kubeadm/

rayanebel commented 3 years ago

@leogr Now, just for debugging purposes, I tried installing Falco with Helm and I get the same result. Falco falls back to container=host when I generate events.

09:37:04.037899240: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch test2 parent=bash pcmdline=bash file=/etc/test2 program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>) k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host
09:38:53.075060899: Error File below /etc opened for writing (user=root user_loginuid=-1 command=touch /etc/test parent=bash pcmdline=bash file=/etc/test program=touch gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>) k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host

helm install falco falcosecurity/falco --set ebpf.enabled=true -n falco

Something is going wrong with containerd but I don't understand why. Are there other steps I can take to troubleshoot this problem?

@leogr If needed, I can give you access to the cluster or the VM (it's a sandbox), if that's easier for you.

leogr commented 3 years ago

> helm install falco falcosecurity/falco --set ebpf.enabled=true -n falco
>
> Something is going wrong with containerd but I don't understand why. Are there other steps I can take to troubleshoot this problem?

Quick one:

Is the containerd unix socket located at /run/containerd/containerd.sock on your system?

> @leogr If needed, I can give you access to the cluster or the VM (it's a sandbox), if that's easier for you.

Please contact me via DM on Slack.
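
To answer the socket question quickly on a node, a small Python sketch like the following can be used. The second candidate path is an assumption added for illustration (on many distributions /var/run is a symlink to /run), and `is_unix_socket` is a hypothetical helper name.

```python
import os
import stat


def is_unix_socket(path: str) -> bool:
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path, permission problem, etc.
        return False


for candidate in ("/run/containerd/containerd.sock",
                  "/var/run/containerd/containerd.sock"):
    print(candidate, is_unix_socket(candidate))
```

If neither path reports a socket, the value passed to `--cri` needs to be adjusted to wherever containerd actually listens on that node.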

antonioribezzi commented 3 years ago

Hi, we're encountering the same issue. We're running Anthos 1.8 on VMware, and since we switched to containerd, Falco is no longer able to detect privileged containers:

{"output":"09:51:04.431861217: Notice DEBUG Privileged container started (user=root user_loginuid=-1 command=dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem k8s.ns=default k8s.pod=dind container=6b068625c0e4 image=nexus.intra.stzh.ch:18080/x86_64/stzh/docker/docker:19.03.12-dind-1 container.Privileged=false)","priority":"Notice","rule":"DEBUG Launch Privileged Container","time":"2021-07-23T09:51:04.431861217Z", "output_fields": {"container.id":"6b068625c0e4","container.image.repository":"nexus.intra.stzh.ch:18080/x86_64/stzh/docker/docker","container.image.tag":"19.03.12-dind-1","container.privileged":false,"evt.time":1627033864431861217,"k8s.ns.name":"default","k8s.pod.name":"dind","proc.cmdline":"dockerd --host=unix:///var/run/docker.sock --host=tcp://0.0.0.0:2376 --tlsverify --tlscacert /certs/server/ca.pem --tlscert /certs/server/cert.pem --tlskey /certs/server/key.pem","user.loginuid":-1,"user.name":"root"}}

We're using Calico as CNI, and Falco runs like this: /usr/bin/falco --cri /run/containerd/containerd.sock -K /var/run/secrets/kubernetes.io/serviceaccount/token -k https://100.126.1.1 -pk

We installed Falco using the Helm chart, not directly on the nodes.
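
To compare what Falco reports with what the pod spec declares, a minimal Python sketch can pull the relevant `output_fields` out of a JSON alert. The sample below is a shortened copy of the alert above, and `summarize_alert` is a hypothetical helper name.

```python
import json

# Shortened copy of the JSON alert above; only a few output_fields are kept.
alert_line = json.dumps({
    "priority": "Notice",
    "rule": "DEBUG Launch Privileged Container",
    "output_fields": {
        "container.id": "6b068625c0e4",
        "container.privileged": False,
        "k8s.ns.name": "default",
        "k8s.pod.name": "dind",
    },
})


def summarize_alert(line: str) -> dict:
    """Extract the fields relevant to the privileged-container check."""
    fields = json.loads(line).get("output_fields", {})
    return {
        "container": fields.get("container.id"),
        "pod": fields.get("k8s.pod.name"),
        "privileged": fields.get("container.privileged"),
    }


print(summarize_alert(alert_line))
```

A `privileged` value of `False` for a pod that is declared privileged in its spec points at the metadata lookup rather than at the rule itself.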

holyspectral commented 3 years ago

I noticed this issue too when containerd or crio is used. It looks like the logic in falco-libs doesn't support recent containerd and crio versions. I'm going to work on a fix.

wuestkamp commented 3 years ago

I'm seeing the same issue as @rayanebel, where all metadata is missing. Tested with containerd (1.3.3|1.5.2) and Falco (0.21.0|0.28.0|0.29.1).

program=sed gparent=<NA> ggparent=<NA> gggparent=<NA> container_id=host image=<NA>) k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host k8s.ns=<NA> k8s.pod=<NA> container=host

@holyspectral was your PR about this issue or just about Falco not detecting privileged containers?

Does anyone know if there is a workaround for Falco + containerd to show container metadata?

leogr commented 3 years ago

AFAIK, this issue should be fixed in libs by https://github.com/falcosecurity/libs/pull/79, but the version of libs has not yet been upgraded in Falco, so let's keep it open.

holyspectral commented 3 years ago

It looks like there are multiple issues in this case, so I tried to summarize them below. I hope this helps other people troubleshoot missing metadata in their falco events.

1. container_id=host. If, in addition to missing metadata, you see container_id=host even when the event is triggered from a container, you have probably hit this issue. It can be caused by a non-default cgroup path. Falco parses a process's cgroup path to retrieve its container ID and then uses the ID to query the container runtime (docker, containerd) and kubernetes. There is some discussion and a potential fix for this cgroup path problem in https://github.com/falcosecurity/falco/issues/1568. When you see this, the output of cat /proc/self/cgroup from a container will help people troubleshoot further.
2. Missing container.image.* If you see a valid container ID (like container_id=8ae22495737c) but no metadata about its container image, check that the right socket path is set up. You can check and specify the correct path in the helm chart overrides or on the falco command line. For example:
   ```
   containerd:
     enabled: true
     socket: /xxxx/containerd.sock
   ```
   Providing the grpcurl result from the falco container will also be very helpful. You can find the steps here: https://github.com/falcosecurity/falco/issues/1568#issuecomment-796897481
3. kubernetes API server connection problem. If you can see a valid container ID but not the fields with the k8s prefix, check whether falco is allowed to connect to your kubernetes API server.
4. If you can see all metadata and only have a problem triggering a rule with a container.privileged condition after switching to containerd/crio, this was fixed in https://github.com/falcosecurity/libs/pull/79
5. If you see this situation only when an event is triggered right after a pod start/restart, it seems to be a known issue. Here is one of the threads about it: https://kubernetes.slack.com/archives/CMWH3EH32/p1629895515219900

Edit: Add case 5 as it's a common case too.
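
To make the first case more concrete, here is a minimal Python sketch of the general idea of deriving a container ID from a cgroup path. It is an illustration only, not Falco's actual matcher: it assumes the ID appears as 64 lowercase hex characters in the path, and the cgroup lines are made up for the example.

```python
import re
from typing import Optional

# Assumption: container IDs appear as 64 lowercase hex characters somewhere
# in the cgroup path. This is an illustration, not Falco's real matcher.
CONTAINER_ID_RE = re.compile(r"(?<![0-9a-f])([0-9a-f]{64})(?![0-9a-f])")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the 12-character short ID found in cgroup file content, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form hierarchy-ID:controller-list:cgroup-path.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = CONTAINER_ID_RE.search(parts[2])
        if match:
            return match.group(1)[:12]
    return None


# Hypothetical cgroup lines, made up for this example.
fake_id = "e22578a37c49" + "0" * 52
in_container = f"12:memory:/kubepods/besteffort/pod1234/{fake_id}\n"
on_host = "12:memory:/user.slice/user-1000.slice\n"

print(container_id_from_cgroup(in_container))
print(container_id_from_cgroup(on_host))
```

When a path layout is not recognized by the real parser, the process looks like a host process, which is how container_id=host ends up in the alert. Feeding the content of /proc/self/cgroup from an affected container into a check like this shows what the path actually looks like.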

wuestkamp commented 3 years ago

Thanks for the list @holyspectral! For me it was number 1. After setting SystemdCgroup = true in /etc/containerd/config.toml, as explained in https://github.com/falcosecurity/falco/issues/1568, and restarting containerd and the containers, Falco gets the correct container_id and metadata.
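
For anyone who wants to verify the setting on a node, below is a rough Python sketch that checks whether an uncommented `SystemdCgroup = true` line is present. It is a plain line scan rather than a TOML parser, the section header in the sample is an assumption based on the usual CRI plugin layout, and `systemd_cgroup_enabled` is a hypothetical helper name.

```python
def systemd_cgroup_enabled(config_text: str) -> bool:
    """Rough check: True if an uncommented `SystemdCgroup = true` line exists."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "SystemdCgroup" and value == "true":
            return True
    return False


# Hypothetical excerpt of /etc/containerd/config.toml, for illustration only.
sample_config = """
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
"""

print(systemd_cgroup_enabled(sample_config))
print(systemd_cgroup_enabled("SystemdCgroup = false"))
```

On a real node the function would be called with the content of /etc/containerd/config.toml; remember that containerd and the affected containers have to be restarted for the change to take effect.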

leogr commented 3 years ago

Thank you for the detailed report @holyspectral. Much appreciated :+1:

poiana commented 2 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

leogr commented 2 years ago

cc @FedeDP @jasondellaluce

poiana commented 2 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana commented 2 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.

/close

poiana commented 2 years ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/falco/issues/1630#issuecomment-1044764449):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue with `/reopen`.
>
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Provide feedback via https://github.com/falcosecurity/community.
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

falcosecurity / falco

Unable to detect events with containerd and kubernetes #1630