falcosecurity / falco

Cloud Native Runtime Security
Apache License 2.0

Falco 0.38.0 - k8s specific fields are not populated any more #3243

Open networkhell opened 3 weeks ago

networkhell commented 3 weeks ago

After upgrading to Falco 0.38.0, some k8s-specific fields are no longer populated, e.g. k8s.ns.name and k8s.pod.name.

The environment is k8s 1.28.6 with the following runtime components:

Deploy falco 0.38.0 via Manifest with default config. Trigger any alert that contains k8s specific output fields e.g. spawn a shell in a container.

When a rule is triggered, I expect the relevant fields to be populated from the container runtime. But the k8s.* fields are missing after the upgrade to 0.38.0.

14:37:02.679088469: Notice A shell was spawned in a container with an attached terminal (evt_type=execve user=root user_uid=1000 user_loginuid=-1 process=bash proc_exepath=/usr/bin/bash parent=runc command=bash terminal=34816 exe_flags=0 container_id=ce69f7e51afe container_image=harbor.***/hub.docker.com-proxy/library/python container_image_tag=3.12-slim container_name=k8s_***-service-python_***-oauth-service-5995bb9788-fllrf_management_b8968793-8b38-42fd-b2cf-1681edb9f99e_0 k8s_ns=<NA> k8s_pod_name=<NA>)


incertum commented 2 weeks ago

Hi @networkhell happy to help debugging.

First of all wanted to provide feedback that for me it continues to work very well after upgrading to Falco 0.38.0, so let's try to get to the bottom of what it could be.

Could you share some statistics re how often the k8s fields are missing? For all events even though you have the container fields?

Here are the relevant source code parts:

Maybe even try all k8s fields https://falco.org/docs/reference/rules/supported-fields/#field-class-k8s
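
For anyone who wants to gather the statistics asked for above, here is a minimal sketch that counts how many container events in Falco's text output have empty k8s fields. It assumes the plain-text output format shown in the report (key=value pairs inside parentheses) and the output field names k8s_ns and k8s_pod_name from that rule; the function names are made up for illustration and are not part of Falco.

```python
import re
from collections import Counter

# key=value pairs; a value ends right before the next " key=" or at end of text.
PAIR_RE = re.compile(r"(\w+)=(.*?)(?=\s\w+=|$)")


def parse_fields(line):
    """Return the key=value pairs found inside the parentheses of one alert line."""
    start = line.find("(")
    end = line.rfind(")")
    if start == -1 or end <= start:
        return {}
    return dict(PAIR_RE.findall(line[start + 1:end]))


def missing_k8s_stats(lines, fields=("k8s_ns", "k8s_pod_name")):
    """Count container events and how many of them lack any of the given k8s fields."""
    stats = Counter()
    for line in lines:
        parsed = parse_fields(line)
        if "container_id" not in parsed:
            continue  # not a container event, skip it
        stats["container_events"] += 1
        # A field counts as missing when it is absent or rendered as <NA>.
        if any(parsed.get(name, "<NA>") == "<NA>" for name in fields):
            stats["missing_k8s"] += 1
    return stats
```

Feeding it the Falco log (for example `missing_k8s_stats(open("falco.log"))`, where the file name is only a placeholder) gives the two counters needed to answer "for all events or only some".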

networkhell commented 2 weeks ago

Thanks for getting back to me @incertum!

As soon as I am running falco 0.38.0 I never get these k8s.* fields populated. Always N/A... while the container fields are always populated. There is one thing I noticed while testing with crictl: I have to use unix:///var/run/cri-dockerd.sock as cri socket. Containerd socket does not work.

Crictl Output (issued on k8s node)

crictl -r unix:///var/run/cri-dockerd.sock inspect da940514d5f22
{
  "status": {
    "id": "da940514d5f2240780141367da424cbbc48d10bf35563bb0c2097b4b0fc9ddfd",
    "metadata": {
      "attempt": 0,
      "name": "falcoctl-artifact-follow"
    },
    "state": "CONTAINER_RUNNING",
    "createdAt": "2024-06-12T10:06:52.862952477+02:00",
    "startedAt": "2024-06-12T10:06:52.997678056+02:00",
    "finishedAt": "0001-01-01T00:00:00Z",
    "exitCode": 0,
    "image": {
      "annotations": {},
      "image": "harbor.***/hub.docker.com-proxy/falcosecurity/falcoctl:0.8.0",
      "userSpecifiedImage": ""
    },
    "imageRef": "docker-pullable://harbor.***/hub.docker.com-proxy/falcosecurity/falcoctl@sha256:6ec71dea6a5962a27b9ca4746809574bb9d7de50643c7434d7c81058aaecde3a",
    "reason": "",
    "message": "",
    "labels": {
      "io.kubernetes.container.name": "falcoctl-artifact-follow",
      "io.kubernetes.pod.name": "falco-wc8xk",
      "io.kubernetes.pod.namespace": "falco",
      "io.kubernetes.pod.uid": "18ce76bd-29ee-4421-83c3-ff56e7de7cf8"
    },
    "annotations": {
      "io.kubernetes.container.hash": "52596eac",
      "io.kubernetes.container.restartCount": "0",
      "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
      "io.kubernetes.container.terminationMessagePolicy": "File",
      "io.kubernetes.pod.terminationGracePeriod": "30"
    },
    "mounts": [
      {
        "containerPath": "/plugins",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~empty-dir/plugins-install-dir",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/rulesfiles",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~empty-dir/rulesfiles-install-dir",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/etc/falcoctl",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~configmap/falcoctl-config-volume",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": true,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/var/run/secrets/kubernetes.io/serviceaccount",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/volumes/kubernetes.io~projected/kube-api-access-6fsfh",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": true,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/etc/hosts",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/etc-hosts",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      },
      {
        "containerPath": "/dev/termination-log",
        "gidMappings": [],
        "hostPath": "/var/lib/kubelet/pods/18ce76bd-29ee-4421-83c3-ff56e7de7cf8/containers/falcoctl-artifact-follow/0f6b0a2c",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false,
        "uidMappings": []
      }
    ],
    "logPath": "/var/log/pods/falco_falco-wc8xk_18ce76bd-29ee-4421-83c3-ff56e7de7cf8/falcoctl-artifact-follow/0.log",
    "resources": null
  },
  "info": {
    "sandboxID": "c3c8b3f1706bd7a038f786406ad45ec1bd92b8f391ccb61ca4a3dcafe39189a3",
    "pid": 3573572
  }
}

crictl output with containerd socket

crictl -r unix:///run/containerd/containerd.sock ps
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD

So I tried the following without success: running the falco pods with the args below and cri-dockerd.sock mounted at /host/var/run/cri-dockerd.sock

          args:
            - /usr/bin/falco
            - --cri
            - /var/run/cri-dockerd.sock
            - -pk
          env:
            - name: HOST_ROOT
              value: /host

As soon as I roll back to 0.37.1 and default args:

          args:
            - /usr/bin/falco
            - --cri
            - /run/containerd/containerd.sock
            - --cri
            - /run/crio/crio.sock
            - -pk

the fields are populated again.

So is there any way I can debug this in detail within the falco container? Or maybe some work needs to be done to officially support cri-dockerd as a CRI interface?

incertum commented 2 weeks ago

Thanks for providing the crictl output. The labels are there.
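
To check the labels on other containers without reading the whole dump, a minimal sketch along these lines pulls them out of the `crictl inspect` JSON. The mapping from Falco field names to labels below is an assumption made for illustration, based only on the label names visible in the output above; it is not taken from the Falco source.

```python
import json

# Assumed mapping (illustrative only): Falco field name -> container label.
K8S_LABELS = {
    "k8s.pod.name": "io.kubernetes.pod.name",
    "k8s.ns.name": "io.kubernetes.pod.namespace",
}


def k8s_fields_from_inspect(inspect_json):
    """Map the k8s labels of one `crictl inspect` JSON document to field names.

    A field is None when its label is missing from status.labels.
    """
    status = json.loads(inspect_json).get("status", {})
    labels = status.get("labels", {})
    return {field: labels.get(label) for field, label in K8S_LABELS.items()}
```

The input would be the captured output of `crictl -r unix:///var/run/cri-dockerd.sock inspect <container id>`; any None in the result means the runtime itself is not exposing that label.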

> There is one thing I noticed while testing with crictl: I have to use unix:///var/run/cri-dockerd.sock as cri socket. Containerd socket does not work.

The regression you mention seems puzzling. Plus, you also have 2 runtimes running, right? We definitely touched the container engines during the last releases. Maybe the regression is something very subtle with respect to the container engine type and/or the fact that you run these 2? Maybe it now enters the docker container engine logic and not the CRI logic, even though you pass the --cri socket. IDK yet.

Btw we never tested it with /var/run/cri-dockerd.sock; we only run tests and unit tests with containerd and crio, the 2 runtimes we primarily support for Kubernetes. Internally, Falco treats docker as a different container engine, and that engine does not contain any k8s logic.

Not sure we can fix this for the immediate next Falco 0.38.1 patch release, because we are always very careful when touching the container engine as it can easily break. CC @falcosecurity/falco-maintainers

Edit: In addition, the new k8smeta plugin is an alternative way to get k8s enrichment, just FYI.

> So is there any way I can debug this in detail within the falco container?

Likely need to compile the source and sprinkle more debug statements here and there, but you can try running with the lib logger in debug mode for sure.

Your /etc/crictl.yaml shows what? [When I toggle runtimes for local tests I always edit that file]

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock

incertum commented 2 weeks ago

Installed cri-dockerd real quick and used crictl to spin up a pod, and it's confirmed it does categorize it as docker ...

$ sudo crictl run container-config.json pod-config.json

$ sudo crictl pods
POD ID              CREATED             STATE               NAME                NAMESPACE           ATTEMPT             RUNTIME
d49a88ccc29a9       32 seconds ago      Ready               nginx-sandbox       default             1                   (default)

It also can't be decoupled from the docker service.

06-12 22:16:16.362950 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No sandbox label found, not copying liveness/readiness probes
06-12 22:16:16.362968 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No liveness/readiness probes found
06-12 22:16:16.363033 docker_async (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): parse returning true
06-12 22:16:16.363147 docker_async (d0e07ff4adba), secondary (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): Secondary fetch successful
06-12 22:16:16.363204 docker_async (d0e07ff4adba): parse returning true
06-12 22:16:16.363306 docker_async (d0e07ff4adba): Source callback result=1
06-12 22:16:16.363585 notify_new_container (d0e07ff4adba): created CONTAINER_JSON event, queuing to inspector

The lines above are libs logger debug output, so you should be able to get similar logs when enabling the libs logger.

Maybe the fact that it worked before was a lucky accident and now since we cleaned up the code a bit, it doesn't work anymore. Let me check now what would need to be done to support this scenario.

incertum commented 2 weeks ago

@networkhell I opened a WIP PR. The issue was that the cgroups layout for docker was not supported by our internal CRI container engine. However, right now we would do lookups against both the docker and cri-dockerd sockets ...

We need a design discussion among the maintainers to see how we can best support cri-dockerd, because the above can cause a few issues.

CC @gnosek @therealbobo

06-12 22:54:53.338290 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No sandbox label found, not copying liveness/readiness probes
06-12 22:54:53.338298 docker (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): No liveness/readiness probes found
06-12 22:54:53.338327 docker_async (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): parse returning true
06-12 22:54:53.338376 docker_async (d0e07ff4adba), secondary (d49a88ccc29a9ba723aa81caa3cc6aa6b9232a09b456e2faf3ca5fd597ab4c46): Secondary fetch successful
06-12 22:54:53.338403 docker_async (d0e07ff4adba): parse returning true
06-12 22:54:53.338449 docker_async (d0e07ff4adba): Source callback result=1
06-12 22:54:53.338507 Mesos container [d0e07ff4adba],thread [1514227], has likely malformed mesos task id [], ignoring
06-12 22:54:53.338514 cri (d0e07ff4adba): Performing lookup
06-12 22:54:53.338525 cri_async (d0e07ff4adba): Starting synchronous lookup
06-12 22:54:53.340012 cri (d0e07ff4adba): ContainerStatusResponse status error message: ()
06-12 22:54:53.340051 cri (d0e07ff4adba): parse_cri_image: image_ref=docker-pullable://busybox@sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7, digest_start=26
06-12 22:54:53.340068 cri (d0e07ff4adba): parse_cri_image: have_digest=1 image_name=docker-pullable://busybox
06-12 22:54:53.340080 cri (d0e07ff4adba): parse_cri_image: tag=, pulling tag from busybox:latest
06-12 22:54:53.340106 cri (d0e07ff4adba): parse_cri_image: repo=docker-pullable://busybox tag=latest image=docker-pullable://busybox:latest digest=sha256:9ae97d36d26566ff84e8893c64a6dc4fe8ca6d1144bf5b87b2b85a32def253c7
06-12 22:54:53.340128 (cgroup-limits) mem cgroup for container [d0e07ff4adba]: /proc/1/root/sys/fs/cgroup//system.slice/docker-d0e07ff4adbabb8cec6f75d022c711c9cb7487d085c5c277760bd8172d8366d6.scope
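
To make the cgroup layout point concrete: with cri-dockerd the container sits in a docker-style systemd scope, as the last log line shows, so the container ID has to be recovered from a `docker-<id>.scope` path component. The following is a minimal sketch of that extraction only; the regular expression and function name are assumptions for illustration and are not the matcher used in libs.

```python
import re

# Docker with the systemd cgroup driver: .../docker-<64 hex chars>.scope
DOCKER_SCOPE_RE = re.compile(r"docker-([0-9a-f]{64})\.scope")


def container_id_from_cgroup(path, short=True):
    """Return the container ID embedded in a docker systemd scope path, or None."""
    match = DOCKER_SCOPE_RE.search(path)
    if match is None:
        return None
    full_id = match.group(1)
    # The logs above use the 12-character short form of the ID.
    return full_id[:12] if short else full_id
```

Applied to the mem cgroup path in the log above, this yields the same short ID (d0e07ff4adba) that the cri lookup lines report.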

networkhell commented 2 weeks ago

@incertum thank you for your efforts regarding this issue 🙂

Let me provide some information that may help you with your design discussion:

Kubernetes deprecated docker as a container runtime as of version 1.20 and removed the dockershim in version 1.24 in favor of CRI-compliant runtimes.

Cri-dockerd is maintained outside of Kubernetes by docker / mirantis as a cri interface for the docker runtime. So it is still possible to use Kubernetes with docker as runtime. 

I guess most cloud providers already dropped docker as runtime but like me there are a lot of users that run k8s on prem and need to stick with docker, at least for a while, for various reasons.

So I would fully understand if you decide to not add support for this “deprecated” runtime when used alongside Kubernetes. But on the other hand maybe a lot of users rely on the combination of docker and Kubernetes. 

You also mentioned before that I seem to be running two runtimes (cri-dockerd and containerd). The reason is that docker uses containerd for some operations, so it is a docker dependency. But with its default configuration the CRI interface of containerd is disabled, so it can neither be queried with crictl nor be used as the CRI interface for the kubelet.

I hope this helps a little. Please let me know if you need any further information.

incertum commented 2 weeks ago

Thanks for the additional info. I believe we should support cri docker, because we also support docker and who knows maybe it becomes more relevant in the future.

Just need to check and find a way to make sure that we do not look up the same container from 2 sockets, that's all. Not like it involves lots of code changes or so.

@leogr thoughts? However it definitely wouldn't be part of the next patch release.

leogr commented 2 weeks ago

> Thanks for the additional info. I believe we should support cri docker, because we also support docker and who knows maybe it becomes more relevant in the future.

I have no idea if this may become relevant. We should dig into it.

> @leogr thoughts? However it definitely wouldn't be part of the next patch release.

Totally agree. Let's follow up in https://github.com/falcosecurity/libs/pull/1907 and target it for libs 0.18 (Falco 0.39).

incertum commented 1 week ago

@leogr I believe exposing container engines configs in falco.yaml can not only help here, but also make the configuration more versatile. For example, I never liked that all container engines are enabled and the end user has no control at all. Plus, for some deployment scenarios it would be better not to need CLI flags (e.g. cri, disable-cri-async) and instead have the option to configure everything in falco.yaml, similar to other settings.

Basically, if we have explicit enabled tags for each container engine we can easily support cri-dockerd while ensuring we do not look up the same container from 2 sockets.

We have a few options:

  1. Follow the plugins configs approach
enable_container_engines: ["docker", "cri", ...]
 - name: docker
 - name: cri
   cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
   disable-cri-async: false
  2. Similar to 1., but an explicit enabled tag per engine.
 - name: docker
   enabled: true
 - name: cri
   enabled: true
   cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
   disable-cri-async: false
  3. ... ?

Defaults will of course remain the same.
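
To illustrate why a per-engine enabled flag is enough to avoid looking up the same container from 2 sockets, here is a rough sketch. Everything in it (Engine, Resolver, the lookup callables) is hypothetical and only models the idea; it is not how libs implements its container engines.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Engine:
    """Hypothetical container engine: a name, an enabled flag, a lookup function."""
    name: str
    enabled: bool
    lookup: Callable[[str], Optional[dict]]


@dataclass
class Resolver:
    """Resolves a container through enabled engines only, at most once per ID."""
    engines: list
    cache: dict = field(default_factory=dict)
    lookups: list = field(default_factory=list)  # (engine, id) pairs, kept for inspection

    def resolve(self, container_id):
        if container_id in self.cache:
            return self.cache[container_id]
        for engine in self.engines:
            if not engine.enabled:
                continue  # a disabled engine never touches its socket
            self.lookups.append((engine.name, container_id))
            info = engine.lookup(container_id)
            if info is not None:
                self.cache[container_id] = info
                return info
        return None
```

With docker disabled and cri pointed at the cri-dockerd socket, a container is only ever resolved through the CRI path, which is the path that carries the k8s labels.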

leogr commented 1 week ago

> @leogr I believe exposing container engines configs in falco.yaml can not only help here, but also make the configuration more versatile.

Totally agree :+1:

> Basically, if we have explicit enabled tags for each container engine we can easily support cri-dockerd while ensuring we do not look up the same container from 2 sockets.
>
> We have a few options:
>
>   1. Follow the plugins configs approach
> enable_container_engines: ["docker", "cri", ...]
>  - name: docker
>  - name: cri
>    cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
>    disable-cri-async: false
>   2. Similar to 1., but an explicit enabled tag per engine.
>  - name: docker
>    enabled: true
>  - name: cri
>    enabled: true
>    cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
>    disable-cri-async: false

I prefer 2 over 1. Anyway, we will still have the issue that it's not easy to use lists with -o from the command line (cc @LucaGuerra).

Option 3 might be:

      container_engines:
        docker:
          enabled: true
        cri:
          enabled: true
          cri: ["/run/containerd/containerd.sock", "/run/crio/crio.sock", "/run/k3s/containerd/containerd.sock"]
          disable-cri-async: false

That being said, I believe it's time to open a dedicated issue for this :)

incertum commented 1 week ago

I may like option 3, it seems shorter. Yes, let me open a dedicated issue.

incertum commented 1 week ago

/milestone 0.39.0

incertum commented 3 days ago


Once (1) https://github.com/falcosecurity/libs/pull/1907 and (2) https://github.com/falcosecurity/falco/pull/3266 are merged, you could test the master falco container with the following config. It is important to disable docker.

    enabled: false
    enabled: true
    cri: ["/run/cri-dockerd.sock"]
    disable-cri-async: false
    enabled: false
    enabled: false
    enabled: false
    enabled: false
    enabled: false