cloudfoundry / cf-crd-explorations


Explore: efficiently listing a resource when creds are restricted to a subset of namespaces #71

Closed kieron-dev closed 3 years ago

kieron-dev commented 3 years ago

Problem

The cf-shim executes k8s API requests to manipulate CF CRs using user credentials supplied by the CF CLI. If the user only has list permissions in a subset of namespaces, a shim API request to list all resources that the user has access to becomes tricky to implement efficiently.

The cf-shim has no idea of the user's permissions, so the first attempt might be a cluster-wide listing of the resource. If that succeeds, all is fine. However, the user is unlikely to have cluster-wide list permission on the resource.

The k8s API then only allows listing resources by namespace. A naive approach would be to iterate through the cluster namespaces (which is problematic in itself as our user probably will not have permission to list namespaces), listing the resources in each namespace. We would expect permission errors in any namespace the user has no privileges to list the resource in, but we could aggregate the successes, and this would provide the correct result.
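For concreteness, here is a sketch of that naive aggregation, using Pods as a stand-in resource and assuming we somehow already know the candidate namespaces (the user may not be able to list them):

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listAcrossNamespaces lists the resource namespace by namespace with the
// user's own client, skipping namespaces where the user has no list
// permission and aggregating the rest.
func listAcrossNamespaces(ctx context.Context, c kubernetes.Interface, namespaces []string) ([]corev1.Pod, error) {
	var all []corev1.Pod
	for _, ns := range namespaces {
		pods, err := c.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
		if apierrors.IsForbidden(err) {
			continue // expected where the user cannot list the resource
		}
		if err != nil {
			return nil, err
		}
		all = append(all, pods.Items...)
	}
	return all, nil
}
```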

However, such an approach is inefficient. It is O(N) in the number of namespaces and involves a k8s API call for each namespace. In a large cluster it could be a long operation, and it puts unnecessary strain on the API server. Several users with namespaced permissions listing resources at the same time could overload the API server.

What is an efficient approach that avoids listing resources in namespaces where the user has no permission on the resource? Bear in mind the desired authorisation approach, where authorisation is delegated completely to k8s and the shim knows nothing about users, roles or role bindings.

Areas to consider

Can we determine the username?

There is no k8s command to give you the current user or their group membership. We will definitely have a credential passed to the shim. This might be a JWT, in which case the username will be in one of the JWT claims; however, which claim that is depends on the k8s configuration, which may also prefix it with a namespace, and the shim knows neither. It might be a client certificate, in which case the username is the CN of the certificate. It might be basic auth, in which case we have the username directly. Or it might be an opaque token, which can only be translated into a username by k8s making the appropriate API call to the underlying identity provider, which the shim knows nothing about.

So getting the username from the token is problematic. Maybe we can obtain it from the user info sent to an admission controller? Would a custom resource dedicated to username divination be feasible? Its admission webhook could always reject requests, but the rejection message could contain the user details. This would add at least two additional HTTP calls to the process. Is that acceptable?

What other approaches will give us a username?

Can we list namespaces a user has permissions in using k8s RBAC configuration?

Once we have a username, can we find out which namespaces we should restrict our listing to? Will this violate the principle that the shim knows nothing about roles and role bindings?

Things to consider:

Can we utilise the orgs / spaces model (if it exists)?

There is talk of representing orgs and spaces as CRs. Part of that design might involve listing usernames in the org and space CRs, maybe with role details. That would be used by controllers to automate the management of the role bindings for the users.

Can we get the details we need from that structure? Or how must orgs and spaces be structured to support this?

What sort of caching can we employ to further reduce overhead for these operations?

Can we cache token <-> username? Tokens could last anywhere from a few minutes to a few hours, so we don't necessarily have to look up usernames repeatedly.

Can we cache user namespaces? When should we invalidate the cache?

Are apps worth caching?

Any other solutions

Maybe the k8s API can do this efficiently somehow?

gcapizzi commented 3 years ago

Is the option of just not supporting listing across orgs/spaces still on the table?

kieron-dev commented 3 years ago

I suppose that is the fallback if we can't find a workable solution otherwise.

I worry about supporting org-level roles like org auditor. Presumably the whole point of such roles is to list things across namespaces.

gcapizzi commented 3 years ago

Other question: what happens if you try to do something like this:

client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})

But you don't have permission on all namespaces? I guess it's just going to fail?

kieron-dev commented 3 years ago

Yep - that will fail without cluster perms.
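Roughly, the shim's first attempt could detect that failure and fall back to per-namespace listing. A sketch (names are illustrative):

```go
import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// tryClusterWideList attempts the cluster-wide list first. The boolean result
// reports whether the user actually has cluster-wide list permission, so the
// caller knows when to fall back to per-namespace listing.
func tryClusterWideList(ctx context.Context, c kubernetes.Interface) ([]corev1.Pod, bool, error) {
	pods, err := c.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if apierrors.IsForbidden(err) {
		return nil, false, nil // no cluster-wide list permission
	}
	if err != nil {
		return nil, false, err
	}
	return pods.Items, true, nil
}
```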

kieron-dev commented 3 years ago

There is another requirement for doing this efficiently now. See https://github.com/cloudfoundry/cf-k8s-api/pull/20.

The shim needs to build up a GUID -> Namespace/Name cache, preferably using the client's credentials. It's not going to be efficient to fill the cache if we need to trawl through 1000 permission denied calls before hitting the correct namespace.

Then again, maybe the cache could be populated with a service account that does have global permissions.
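For reference, a minimal sketch of the cache shape being discussed (names are illustrative; how it gets populated, with the user's credentials or a privileged service account, is exactly the open question):

```go
import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// guidCache maps an app GUID to the namespace/name of the backing resource,
// so that a request like GET /apps/:guid can go straight to the right
// namespace instead of probing every namespace.
type guidCache struct {
	mu      sync.RWMutex
	entries map[string]types.NamespacedName
}

func (c *guidCache) Lookup(guid string) (types.NamespacedName, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	nn, ok := c.entries[guid]
	return nn, ok
}

func (c *guidCache) Store(guid string, nn types.NamespacedName) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.entries == nil {
		c.entries = map[string]types.NamespacedName{}
	}
	c.entries[guid] = nn
}
```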

kieron-dev commented 3 years ago

Can we identify the user?

The worst case is that the shim receives an opaque token. This will be the case for GKE with default gcloud authentication. So let's try that.

kieron-dev commented 3 years ago

Built-in functionality

There is an issue here requesting a whoami API endpoint. This comment suggests RBAC debug or audit logs might be the way forward.

RBAC Debug

I assume this means kubectl auth can-i ... with debug level turned up.

In the past, kubectl auth can-i get pods/foo -v10 would give a reason for allowing or disallowing the request including the user name. But that is no longer the case. So unless there is another form of RBAC debugging, we must drop that approach.

Audit Logs

Audit logs are set up as part of the cluster configuration. They might log to a file, or call an audit webhook. Our standard problem with managed clusters is that we can't control this. We also can't guarantee the audit level - whether we get audit events for user access or not - or how we can access those logs. So I think this is another lost cause.

kieron-dev commented 3 years ago

Kubectl whoami plugin

There is a kubectl plugin in existence that attempts to solve this problem: kubectl-whoami. It does not extract the CN from a client certificate, instead returning kubecfg:certauth:admin. However, it does successfully decode an opaque gcloud token, as well as service account tokens.

It works by creating a k8s authentication TokenReview resource containing the token. k8s will call its token review webhook with this token and, if successful, return the user information in the status. If the user does not have permission to create the TokenReview, the plugin extracts the username from the resulting error message instead.

This means we can get a username from passed credentials as follows:

| Type | Method | Comment |
| --- | --- | --- |
| Basic auth | base64 decode header and split | |
| Client cert | Get CN from cert | |
| Any token | Create a TokenReview resource and read the status | Involves an API call (or two for opaque tokens) |

I've also just verified that the token review method works with OIDC authentication. The set-up had preferred-username as the custom username field, with oidc: as the custom prefix. The whoami plugin picked up the username correctly both when TokenReview creation was denied, i.e. extracting the username from the error message, and when I gave the user the cluster-admin role, so that it came from the TokenReview status.

So reproducing this technique in the shim code looks like an efficient way of getting a username from the supplied credentials using our current k8s client.
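A rough sketch of that technique with client-go, assuming a clientset built from credentials that are allowed to create TokenReviews, and omitting the plugin's fallback of parsing the username out of a Forbidden error message:

```go
import (
	"context"
	"fmt"

	authv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// whoAmI asks the API server to review the supplied token and returns the
// username reported in the review status.
func whoAmI(ctx context.Context, c kubernetes.Interface, token string) (string, error) {
	review := &authv1.TokenReview{Spec: authv1.TokenReviewSpec{Token: token}}
	result, err := c.AuthenticationV1().TokenReviews().Create(ctx, review, metav1.CreateOptions{})
	if err != nil {
		return "", err
	}
	if !result.Status.Authenticated {
		return "", fmt.Errorf("token not authenticated: %s", result.Status.Error)
	}
	return result.Status.User.Username, nil
}
```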

kieron-dev commented 3 years ago

Regarding getting usernames, we believe that relying on extracting them from error messages is fragile, therefore a sensible approach would be to:

kieron-dev commented 3 years ago

Regarding the guid -> namespace cache mentioned above required for getting an app by guid, for example:

kieron-dev commented 3 years ago

In the orgs/spaces explore, we have tended towards the principle that even global admin users should not have cluster-wide permissions on any resource. This protects resources in non-CF namespaces. For example, we might need to grant name-restricted access to secrets. But then an admin would have that permission across all namespaces and could potentially engineer a secret name clash to gain access to a secret in a non-CF namespace.

The consequence of this is that a global admin user has namespaced RoleBindings in all the CF namespaces in the cluster. This might be thousands. We are trying to determine how to make GET /apps efficient when called by a user with permissions on a subset of CF namespaces by restricting which namespaces we list in. However, for an admin user, there will be no restriction. GET /apps will have to return results from all CF namespaces.

For a user with permissions in a small number of namespaces, performing a list call in each namespace using the user's credentials would be a reasonably efficient way of retrieving the app list. But for the admin user, this is the pathological case where the number of requests grows linearly with the number of spaces in the foundation.

It certainly feels like abandoning list requests without a space constraint is the simplest answer. We really need a decision on that! But let's assume this is not possible: how do we make that admin GET /apps efficient?

danail-branekov commented 3 years ago

I think that a ClusterRoleBinding to allow listing apps for global roles could work well. Apps are CF custom resources, so granting global roles cluster-wide list permission on the CF app resource only should not be a security concern. However, the problem remains for space-scoped roles that are only allowed to list apps in certain namespaces.
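For illustration, such a ClusterRole might look like the following, expressed with the rbacv1 Go types; the API group and resource names are assumptions about the CF app CRD:

```go
import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// A cluster role restricted to the CF app custom resource, so a cluster-wide
// binding for global roles grants list access to apps and nothing else.
var cfAppsLister = &rbacv1.ClusterRole{
	ObjectMeta: metav1.ObjectMeta{Name: "cf-apps-lister"},
	Rules: []rbacv1.PolicyRule{{
		APIGroups: []string{"apps.cloudfoundry.org"}, // assumed CF CRD group
		Resources: []string{"apps"},                  // assumed plural resource name
		Verbs:     []string{"get", "list", "watch"},
	}},
}
```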

kieron-dev commented 3 years ago

That does solve the admin user problem. And we can keep to namespaced role bindings for non-CF resources, like secrets and pods, where there is no API requirement to list across namespaces.

kieron-dev commented 3 years ago

An efficient query to list apps in a subset of namespaces would ideally use an in filter, like the one available with label selectors:

kubectl get pods -l 'environment in (production, qa)'

Although metadata.namespace is always available as a field selector, field selectors do not support the set operators like in, only =, == and !=. Even using an indexed cache from controller-runtime would not allow that.

But, as we see, this is possible using labels. So, although it seems redundant, we might consider including a label with the space (or namespace) name on every CF resource. That would enable a client with cluster-wide list permissions to efficiently list resources in just the namespaces where the user has list permissions.
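As a sketch, with controller-runtime and a hypothetical cloudfoundry.org/space-name label stamped on every CF resource, a client with cluster-wide list permission could fetch several namespaces' resources in a single call:

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listInNamespaces performs one cluster-wide list, filtered by a label that
// records each resource's space/namespace, so only the allowed namespaces
// come back. The label key is a placeholder.
func listInNamespaces(ctx context.Context, c client.Client, list client.ObjectList, allowed []string) error {
	req, err := labels.NewRequirement("cloudfoundry.org/space-name", selection.In, allowed)
	if err != nil {
		return err
	}
	sel := labels.NewSelector().Add(*req)
	return c.List(ctx, list, client.MatchingLabelsSelector{Selector: sel})
}
```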

The question is how to reliably determine those list permissions, and how far we go breaking the k8s RBAC encapsulation.

kieron-dev commented 3 years ago

Blocking on https://github.com/cloudfoundry/cf-crd-explorations/issues/75

kieron-dev commented 3 years ago

Following comments in the proposal about caching, we created an example using a controller-runtime cache.

We created 100 namespaces with 10 apps in each namespace. This takes ages to create, by the way.

Then we iterated over each namespace, listing the apps in each namespace.
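A minimal version of that experiment, using controller-runtime's cluster package and Pods as a stand-in for the app resource (the actual exploration listed the CF app CRs; see the repo linked below):

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/cluster"
)

func main() {
	// A controller-runtime Cluster wires up an informer-backed cache; reads
	// through GetClient() are served from that cache once it has synced.
	cl, err := cluster.New(config.GetConfigOrDie())
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	go func() {
		if err := cl.Start(ctx); err != nil { // start the shared cache
			panic(err)
		}
	}()

	for i := 1; i <= 100; i++ {
		start := time.Now()
		list := &corev1.PodList{}
		if err := cl.GetClient().List(ctx, list, client.InNamespace(fmt.Sprintf("ns-%d", i))); err != nil {
			panic(err)
		}
		// The first List waits for the initial cache sync; later ones are
		// answered from memory.
		fmt.Printf("ns-%d: %d items in %s\n", i, len(list.Items), time.Since(start))
	}
}
```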

The first access took about 100ms, presumably while the cache was initially populated, and subsequent accesses took just tens of microseconds each.

That was when using a user with cluster-wide permissions on viewing and listing the resource.

As soon as we switched to a user without cluster-wide permissions on the resource, building the cache failed.

All we could do was create two controller-runtime clusters and clients: one with cluster-wide permissions and one without. We engineered the second client to share the cache from the first. Unfortunately, this breaks RBAC: the user without permissions on a namespace can still see its resources via the cache. RBAC is enforced when populating the cache, rather than when accessing it.

So it looks like a cache populated by a privileged user is not a solution. And a cache without cluster permissions just doesn't work.

See https://github.com/eirini-playground/auth-explore/commit/912770d01b5a0cb972e3819f5ebb9ef44314bdd8 for the final two-client example.