falcosecurity / falco

Cloud Native Runtime Security
https://falco.org
Apache License 2.0

[PROPOSAL] Inject Kubernetes Control Plane users into Falco syscalls logs for kubectl exec activities #2895

Open osagga opened 10 months ago

osagga commented 10 months ago

Disclaimer: We will use the term “k8s user” to refer to the Kubernetes control plane “user” (human or serviceaccount) who makes API calls. This user is different from traditional Linux users and should not be confused with the uids Falco logs today (e.g. user.uid, user.name, or user.loginuid; see the definitions on the Supported Fields for Conditions and Outputs page).

Motivation

When a user runs an interactive kubectl exec session like the following within a Kubernetes cluster:

$ kubectl exec -it nginx2-78848c9dcb-4ptf8 -- /bin/bash
root@nginx2-78848c9dcb-4ptf8:/# sleep 123123
^C
root@nginx2-78848c9dcb-4ptf8:/# exit

The built-in k8s audit controller will log an audit event with the command captured as follows:

{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "RequestResponse",
    "auditID": "7b36f2ea-2e64-4b4a-a946-f387c75b7d72",
    "stage": "ResponseStarted",
    "requestURI": "/api/v1/namespaces/default/pods/nginx2-78848c9dcb-4ptf8/exec?command=%2Fbin%2Fbash&container=nginx&stdin=true&stdout=true&tty=true",
    "verb": "create",
    "user": {
        "username": "kubernetes-admin",
        "uid": "[user-id]",
        "groups": [
            "A",
            "B"
        ]
    },
    <...> # removed other parts not relevant
    "requestReceivedTimestamp": "2023-10-30T18:57:12.826789Z",
    "stageTimestamp": "2023-10-30T18:57:12.899734Z",
    "annotations": {
        "authorization.k8s.io/decision": "allow",
        "authorization.k8s.io/reason": ""
    }
}

As the exec session above shows, the user ran the sleep command on the k8s pod, but only /bin/bash is captured in the audit log entry. This creates a gap in the k8s auditing solution: any other commands run in the interactive session won't be logged by the Kubernetes audit controller.

Note that this auditing gap is called out and is out of scope by design for the Kubernetes auditing controller; see the k8s maintainers' discussion here: https://github.com/kubernetes/kubernetes/pull/29443#discussion_r72612301

This is where tools like Falco come in, since they monitor syscalls. However, looking at the new interactive shell spawned from the container entrypoint, in Falco we won't see the Kubernetes control plane user (which can be a person or a serviceaccount), even though the k8s audit log event does contain the user who made the API call (see the user.username or user.uid fields in the Kubernetes audit logs). This is mainly because the Linux kernel knows nothing about what happens at layers above it, such as the Kubernetes control plane.

Knowing the user who made the k8s API calls is crucial for incident response, and it would be valuable to have the k8s user as part of each Falco log even though the Linux kernel does not know about it. This would enable us to do the following:

  1. Disable specific user credentials in response to an agent alert
    1. e.g. if there's an agent alert about access to a secrets file on a pod, having the correct k8s user allows admins to quickly disable the credentials and stop the attack
  2. Look up all commands executed by a specific k8s user, allowing for more accurate and complete forensic analysis in case of suspected compromised user credentials
    1. e.g. if a specific user or machine is suspected of having installed malicious software, the ability to query all commands they executed is critical for understanding the possible damage done

Feature

One solution that can address the gap above is to provide the k8s user to the security agent running on the respective k8s cluster node. If the agent can correctly and securely fetch the k8s user behind the exec command at the time the new interactive shell is created from the container entrypoint, it can keep track of that user and automatically attribute all sub-processes the user runs in the interactive session to the same k8s user.

[1] The proposed solution involves modifying the k8s cluster; without such a modification there is nothing we can do.

We could take advantage of the mutating webhook functionality of k8s to mutate the PodExecOptions request payload, modifying the command parameter to include the k8s user extracted from the userInfo object that the API server includes in the AdmissionReview request.

Yes, we would tamper with the container entrypoint command that will be run (drawbacks and alternatives are discussed later).

For example:

Normally, the PodExecOptions object looks like the following:

        "object": {
            "kind": "PodExecOptions",
            "apiVersion": "v1",
            "stdin": true,
            "stdout": true,
            "tty": true,
            "container": "my-pod",
            "command": [
                "/bin/bash" # this is user controlled through `kubectl`
            ]
        }

After adding the mutation logic, the mutated object from the API server will look like the following:

        "object": {
            "kind": "PodExecOptions",
            "apiVersion": "v1",
            "stdin": true,
            "stdout": true,
            "tty": true,
            "container": "my-pod",
            "command": [
                "/bin/sh",
                "-c",
                "export SEC_USER_ID=12345;",
                "/bin/bash" # user provided command appended at end of list
            ]
        }

Note that the mutation above assumes /bin/sh will always be available in the target container, which is not always the case. There might be a way to remove this dependency by instead using the pattern /usr/bin/env SEC_USER_ID=12345 /bin/bash, which relies on the env binary that is typically available on all POSIX systems.
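
To make the flow concrete, below is a minimal sketch (in Go) of what such a mutating webhook handler could look like, using the /usr/bin/env pattern above. The SEC_USER_ID marker name, the /mutate-exec path, and the certificate paths are illustrative assumptions, not a settled design; a real deployment would also need a MutatingWebhookConfiguration targeting the pods/exec subresource, plus proper logging and error handling.

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "net/http"

        admissionv1 "k8s.io/api/admission/v1"
        corev1 "k8s.io/api/core/v1"
    )

    // handleExec mutates the PodExecOptions carried in a pods/exec
    // AdmissionReview, prepending an env-injection wrapper that carries
    // the authenticated control plane user's uid.
    func handleExec(w http.ResponseWriter, r *http.Request) {
        body, err := io.ReadAll(r.Body)
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }

        var review admissionv1.AdmissionReview
        if err := json.Unmarshal(body, &review); err != nil || review.Request == nil {
            http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
            return
        }
        req := review.Request

        var opts corev1.PodExecOptions
        if err := json.Unmarshal(req.Object.Raw, &opts); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }

        // Prepend `/usr/bin/env SEC_USER_ID=<uid>` to the user-provided command.
        // SEC_USER_ID is a hypothetical marker name; req.UserInfo comes from the
        // API server's authenticated userInfo for this request.
        mutated := append(
            []string{"/usr/bin/env", fmt.Sprintf("SEC_USER_ID=%s", req.UserInfo.UID)},
            opts.Command...,
        )

        patch, _ := json.Marshal([]map[string]interface{}{
            {"op": "replace", "path": "/command", "value": mutated},
        })
        patchType := admissionv1.PatchTypeJSONPatch

        review.Response = &admissionv1.AdmissionResponse{
            UID:       req.UID,
            Allowed:   true,
            Patch:     patch,
            PatchType: &patchType,
        }
        resp, _ := json.Marshal(review)
        w.Header().Set("Content-Type", "application/json")
        w.Write(resp)
    }

    func main() {
        http.HandleFunc("/mutate-exec", handleExec)
        // Admission webhooks must be served over TLS; cert paths are placeholders.
        http.ListenAndServeTLS(":8443", "/tls/tls.crt", "/tls/tls.key", nil)
    }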

[2] In Falco's event processing loop, we can parse the command line (proc.args) of execve* system calls to find special tokens that provide the Kubernetes user (e.g., SEC_USER_ID). Falco will then cache the k8s user using the regular threadtable caching mechanisms. This will allow all subsequent processes in the kubectl exec session to find the k8s user by traversing the parent process lineage, and as a result, include this information in Falco's output logs.
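
As a conceptual illustration of [2] (this is not Falco's actual libs implementation, just a sketch of the lookup logic under the hypothetical SEC_USER_ID marker from above), the attribution amounts to walking the parent process lineage in the threadtable until the injected marker is found:

    package main

    import "fmt"

    // Process is a simplified stand-in for an entry in Falco's threadtable.
    type Process struct {
        Name   string
        Env    map[string]string
        Parent *Process
    }

    // k8sUser walks up the process lineage and returns the first SEC_USER_ID
    // marker found, mirroring how every descendant of the injected env wrapper
    // could be attributed to the same k8s control plane user.
    func k8sUser(p *Process) (string, bool) {
        for cur := p; cur != nil; cur = cur.Parent {
            if uid, ok := cur.Env["SEC_USER_ID"]; ok {
                return uid, true
            }
        }
        return "", false
    }

    func main() {
        // kubectl exec spawned: env wrapper -> bash -> sleep. The sleep process
        // has an empty env here to show that lineage traversal still finds the
        // marker even if a descendant clears its environment.
        wrapper := &Process{Name: "env", Env: map[string]string{"SEC_USER_ID": "12345"}}
        bash := &Process{Name: "bash", Env: map[string]string{"SEC_USER_ID": "12345"}, Parent: wrapper}
        sleep := &Process{Name: "sleep", Env: map[string]string{}, Parent: bash}

        if uid, ok := k8sUser(sleep); ok {
            fmt.Printf("attributing sleep to k8s user uid=%s\n", uid)
        }
    }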

incertum commented 10 months ago

Thanks a bunch @osagga ❤️ ! Very supportive, having such capabilities would be a tremendous win! Excited to hear what the community thinks!

CC @Andreagit97 @alacuku @LucaGuerra @Issif

incertum commented 10 months ago

To clarify a few details, I am adding some more bullet points:

Other better ideas are of course very welcome!

incertum commented 10 months ago

@osagga we just had the core maintainers meeting and @gnosek also prefers an approach that manipulates env somehow. I also need to dig into this more after KubeCon. I'll then share an assessment of the Falco libs changes that are needed and that have broader benefits beyond this use case.

sgaist commented 9 months ago

Hi,

If I have followed things correctly, I think that helping with this feature request (https://github.com/kubernetes/kubernetes/issues/113381) would benefit this proposal, as it would provide the data needed here without having to alter anything cluster-side.

Am I correct?

incertum commented 9 months ago

@sgaist thank you for chiming in :heart: .

As far as I understand, the Kubernetes issue you are referencing seeks to add ObjectMetadata to resources so that, for example, the pod uid is accessible.

For our use case, the control plane user data is already available:

"user": {
        "username": "kubernetes-admin",
        "uid": "[user-id]",
        "groups": [
            "A",
            "B"
        ]
    },

By the way @osagga, someone else also requested modifications to how we can access proc.env in Falco and opened a PR. With those changes, user.username and user.uid values injected into env variables (named however you want) as part of the command should be accessible to Falco, as we can now search for those env values in the parent process lineage.

@sgaist this is a great find, as addressing this Kubernetes issue will enhance Falco's k8saudit plugin. Currently, the exec Kubernetes audit log data includes the namespace and pod name, but the absence of the pod UID can cause collisions when you try to correlate it with Falco syscall logs. Therefore, supporting this issue will benefit Falco and improve security monitoring for everyone, even beyond Falco.


Regardless of those updates, we would still need to use the mutating webhook to modify the exec command so that the control plane user data is "injected into the Linux kernel". This means the data becomes available in the fields we parse and log when a new process gets spawned.

Open to more new ideas :upside_down_face: .

incertum commented 9 months ago

@sgaist we were discussing a bit more, and someone brought up the distroless container edge case where there is really no shell or env binary, e.g. kubectl exec <pod-name> -- ping 8.8.8.8.

With that in mind, @osagga was wondering whether the Kubernetes issue you referenced could be extended to cover env in ObjectMeta. I'll let Omar or others elaborate more.

poiana commented 6 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

incertum commented 6 months ago

/remove-lifecycle stale

No updates yet unfortunately.

poiana commented 3 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

LucaGuerra commented 3 months ago

/remove-lifecycle stale

poiana commented 1 week ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale