emissary-ingress / emissary

open source Kubernetes-native API gateway for microservices built on the Envoy Proxy
https://www.getambassador.io
Apache License 2.0
4.32k stars, 684 forks

A potential risk in emissary that could lead to takeover of the cluster #5654

Closed HouqiyuA closed 1 week ago

HouqiyuA commented 2 months ago

Dear Team Members:

Greetings! Our team is very interested in your project, and we recently identified a potential RBAC security risk while performing a security assessment of it. We would like to report it and provide the relevant details so that you can fix it. I could not find a private email address for reporting security issues, which is why I raised the issue here. If this is inappropriate, I hope you can forgive me.

Details:

In this Kubernetes project, there is a ClusterRole that has been granted the high-risk list secrets permission. This permission allows the role to list confidential information across the cluster. An attacker who obtains the token of the ServiceAccount bound to this ClusterRole could impersonate it and list secrets across the entire cluster. By combining this with the permissions of other roles, the attacker could escalate privileges and take over the entire cluster.
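To make the claim concrete, here is a minimal sketch of how one might check whether a role manifest grants read access to secrets. The function name `secret_read_verbs` and the example role are hypothetical and are not taken from the Emissary release RBAC; the sketch assumes the manifest has already been parsed into a Python dict.

```python
from typing import Any

# Verbs that expose secret contents or names to the holder of the role.
SENSITIVE_VERBS = {"get", "list", "watch"}


def secret_read_verbs(role: dict[str, Any]) -> set[str]:
    """Return the read verbs a Role/ClusterRole manifest grants on core-group secrets."""
    granted: set[str] = set()
    for rule in role.get("rules") or []:
        groups = set(rule.get("apiGroups") or [])
        resources = set(rule.get("resources") or [])
        verbs = set(rule.get("verbs") or [])
        # "" is the core API group, where secrets live; "*" matches every group.
        if not ({"", "*"} & groups):
            continue
        if not ({"secrets", "*"} & resources):
            continue
        if "*" in verbs:
            granted |= SENSITIVE_VERBS
        else:
            granted |= SENSITIVE_VERBS & verbs
    return granted


# Hypothetical ClusterRole, for illustration only.
example_role = {
    "kind": "ClusterRole",
    "metadata": {"name": "example-gateway"},
    "rules": [
        {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["create"]},
    ],
}

if __name__ == "__main__":
    print(sorted(secret_read_verbs(example_role)))
```

If the manifest is a ClusterRole bound with a ClusterRoleBinding, any verbs this returns apply to every namespace in the cluster.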

We constructed the following attack vectors.

First, you need to get a token for the ServiceAccount that has this high-risk permission. If you are already inside a Pod that runs under this ServiceAccount, you can read the token directly: cat /var/run/secrets/kubernetes.io/serviceaccount/token. If you are on a node rather than in a Pod, you can run the following command to get the token: kubectl describe secret .

Use the obtained token to authenticate with the API server. By including the token in the request, you are recognized as the ServiceAccount and gain all privileges associated with it. As a result, this ServiceAccount identity can be used to list all secrets in the cluster.
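The token-based authentication described above is the standard way any in-cluster client talks to the API server. The sketch below only constructs the request and does not send it. The helper names are hypothetical, and the API server address `https://kubernetes.default.svc` is the conventional in-cluster default, assumed here rather than taken from the report.

```python
from pathlib import Path
import urllib.request

# Path where Kubernetes mounts the ServiceAccount token inside a Pod.
TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def read_mounted_token(path: Path = TOKEN_PATH) -> str:
    """Read the ServiceAccount token mounted into the Pod."""
    return path.read_text().strip()


def build_list_secrets_request(api_server: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists secrets in all namespaces."""
    url = f"{api_server.rstrip('/')}/api/v1/secrets"
    headers = {
        # The bearer token is what identifies the caller as the ServiceAccount.
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    # Placeholder token so the sketch runs outside a cluster.
    request = build_list_secrets_request("https://kubernetes.default.svc", "example-token")
    print(request.full_url)
```

Whether the API server answers such a request depends entirely on the RBAC bound to the ServiceAccount, which is why the scope of the list secrets grant matters.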

We give two ways to combine a ServiceAccount token with other privileges to take over the cluster:

Method 1: Elevation of Privilege by Utilizing ServiceAccount Token Bound to ClusterAdmin

Directly use a token bound to the ClusterAdmin role, which has the authority to control the entire cluster. By authenticating with this token, you can gain full control of the cluster.

Method 2: Create Privileged Containers with a ServiceAccount Token That Has the create pods Permission

You can use this ServiceAccount token to create a privileged container that mounts the root directory and is scheduled onto the master node by tolerating its taints, so that you can access and leak the master node's kubeconfig file. In this way you can take over the entire cluster.

For the above attack chain we have developed exploit code and uploaded it to GitHub: https://github.com/HouqiyuA/k8s-rbac-poc

Suggested mitigations:

Carefully evaluate the permissions required for each user or service account to ensure that it follows the principle of least privilege and to avoid over-authorization.

If list secrets is a required permission, consider using more granular RBAC rules. A namespaced Role with a RoleBinding can be used to grant the list secrets permission instead of a ClusterRole, which restricts the permission to specific namespaces or resources rather than the entire cluster.

Isolate different applications into different namespaces and use namespace-level RBAC rules to restrict access. This reduces the risk of privilege leakage across namespaces.
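As an illustration of the namespace-scoped alternative suggested above, the following sketch builds a Role and RoleBinding that grant secret read access in a single namespace. The function name, the default object name `secret-reader`, and the example namespaces and ServiceAccount are all hypothetical. The manifests are emitted as JSON, which kubectl accepts in the same way as YAML.

```python
import json
from typing import Any


def namespaced_secret_reader(
    namespace: str,
    service_account: str,
    sa_namespace: str,
    name: str = "secret-reader",  # hypothetical object name
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build a Role and RoleBinding that allow reading secrets in one namespace only."""
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get", "list", "watch"]}
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        # roleRef points at the namespaced Role above, not at a ClusterRole.
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": name},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": sa_namespace}
        ],
    }
    return role, binding


if __name__ == "__main__":
    # Hypothetical namespaces and ServiceAccount name.
    role, binding = namespaced_secret_reader("team-a", "example-gateway", "gateway-system")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```

The trade-off is operational: one such pair has to be applied for every namespace from which secrets are referenced.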

We look forward to hearing from you and discussing this risk in more detail. Thank you very much for your time and attention.

Best wishes.

HouqiyuA

AliceProxy commented 1 week ago

Thank you for your interest and report, but this seems incorrect.

Yes, if the pod itself becomes compromised you could take advantage of the pod's ServiceAccount to list secrets or do anything else granted to the pod. The same is true of other API Gateway projects with the same permissions. We do not give Emissary RBAC to create anything other than secrets, configmaps, and Ambassador custom resources. It is not explained how your proposed attack, which requires creating new pods, would be possible without the RBAC necessary to do so.

Ideally, we plan to move to distroless as a base image in the future instead of Alpine, to remove shell access within the pods and mitigate concerns such as having the list secrets permission. It's always good to tighten security anywhere we can and to restrict permissions to only what is necessary, but the list secrets permission is unfortunately necessary for the API Gateway to function properly and support user config. We cannot reasonably ask users to apply new Roles/RoleBindings for every new namespace that they want to deploy Emissary configuration into so that they can reference secrets for use with TLS, Certificate Revocation Lists, etc. It is a common RBAC role used by other API Gateways. If individual users are uncomfortable with it, they are free to modify the release RBAC (which many people do) to restrict access to only the ways in which they are using the application, such as limiting it to a certain namespace. Creating additional installation options in the Helm chart to restrict the RBAC and application to a single namespace might also be a nice addition for those who are more security-sensitive about the list secrets permission.

If you can explain how an attack like this is possible without the RBAC necessary to create new pods, then please re-open the issue.

kbr-trackunit commented 1 week ago

@AliceProxy I think you should have a look at this post from Traefik: https://github.com/traefik/traefik/issues/7097 There are some descriptions on why this is a security issue and far from best practice to use the list secrets. They are able to only do it using a namespace filter and in that way limit impact.

At the end of the post there are also comments suggesting ways to fix this and harden the security considerably. (The Traefik developers also state it will not be on their roadmap in the near future, but just because they don't prioritize security doesn't mean that you don't have to.)

AliceProxy commented 1 week ago

@kbr-trackunit

I think you should have a look at this post from Traefik: https://github.com/traefik/traefik/issues/7097 There are some descriptions on why this is a security issue and far from best practice to use the list secrets. They are able to only do it using a namespace filter and in that way limit impact.

I definitely agree that the list secrets RBAC is far from ideal, but as mentioned in that conversation, there is currently no way via RBAC to restrict them based on something like labels that would be dynamic and flexible enough to let users create Emissary Mappings that reference secrets freely.

It seems like in that discussion there is a mix between users talking about the RBAC and users talking about what the Traefik code actually watches (regardless of the permissions it is given). The Emissary watch system packages (k8s/accumulator) are extremely janky and outdated. For the most part, they just try to watch everything of a given type and then figure out what is needed later (which is its own can of worms, sensitive information aside). They need to be replaced with modern controller practices, but even with modern tools and a complete rewrite of the Emissary controller system, I don't see how we can get away from needing, at minimum, the get secrets RBAC. That is a little better than list because it limits discoverability. Still, with most people using simple, human-readable names, if the pods were truly compromised it would be trivial for an attacker to automate get requests and check whether a secret exists using a combination of brute force and dictionaries.

There are some things, as I mentioned in my last comment and as raised in the issue you linked, that we could do here to improve the security posture.

Install/Upgrade time configuration:

RBAC doesn't seem to offer many options between the low risk and maximum risk when dealing with secret permissions. I would have no issues with either (or both) of the above as opt-in features for those who like the approach (or any other suggestions that don't require simply breaking Emissary's configuration system). We could mitigate some confusion by having Helm also pass that same context to the Emissary pods themselves and log a message notifying users about the currently allowed secret names/namespaces whenever we can't resolve one that is referenced. Some changes to the watch system shy of a total rewrite might also be necessary to properly support the above, but I'm not entirely sure at the moment. Currently, it doesn't support status updates in a reliable or performant way, so we would need to transition to something like controller-runtime to properly update a status subresource on Emissary configuration objects and tell users what went wrong.
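The logging idea above could look roughly like the following sketch. It assumes the pods receive an allow-list of namespaces, and optionally of secret names, from the install-time configuration. The function `resolve_secret_ref`, the logger name, and the message wording are hypothetical and are not part of Emissary.

```python
import logging
from typing import Optional

logger = logging.getLogger("secret-allowlist")  # hypothetical logger name


def resolve_secret_ref(
    name: str,
    namespace: str,
    allowed_namespaces: set[str],
    allowed_names: Optional[set[str]] = None,
) -> bool:
    """Return True if the referenced secret falls inside the configured allow-list.

    When the reference is rejected, log the currently allowed names and
    namespaces so users can see why their configuration did not resolve.
    """
    namespace_ok = namespace in allowed_namespaces
    # None means no restriction on names, only on namespaces.
    name_ok = allowed_names is None or name in allowed_names
    if namespace_ok and name_ok:
        return True
    logger.warning(
        "cannot resolve secret %s/%s: allowed namespaces=%s, allowed names=%s",
        namespace,
        name,
        sorted(allowed_namespaces),
        "any" if allowed_names is None else sorted(allowed_names),
    )
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Hypothetical secret reference outside the allowed namespace.
    resolve_secret_ref("tls-cert", "team-b", {"team-a"})
```

A log line is only a partial answer. As the comment notes, surfacing the same information on a status subresource would need a more capable controller framework.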

I closed this issue since I don't believe that takeover of the cluster via the outlined approach is possible and that was the focus of this issue, but if you would like to hone in specifically on the list secrets RBAC and continue this discussion, I suggest we create a new issue with that as the focus and copy/reference the above conversation so far rather than continuing the discussion here.