giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273

Potential Pinniped use cases and architecture overview #3453

Closed. gawertm closed this issue 3 months ago.

gawertm commented 4 months ago

After the first Backstage use case with Pinniped, we like the product and want to evaluate whether there are other customer use cases it can help us solve. For those use cases we should start designing a very high-level architecture to see how Pinniped would fit into our product.

anvddriesch commented 4 months ago

I'm preparing a write-up for next week. Random notes: [screenshot attached: Screenshot from 2024-05-23 13-43-51]

anvddriesch commented 4 months ago

Pinniped use cases and architecture overview

Pinniped is a project that provides a way to authenticate to a Kubernetes cluster using OIDC, delegating authentication to an external identity provider. The main reason to use Pinniped over (or in conjunction with) our current Dex setup is that a single Pinniped supervisor can be used to authenticate to multiple clusters, and no secrets or other separate configuration is needed on the cluster side.

use cases

1. default oidc access to all giantswarm workload clusters

Currently customers are free to set up their own OIDC provider and configure their workload clusters to use it, either via our auth bundle (including Dex) or something else. However, it's not the default, and most Giant Swarm clusters do not have OIDC enabled since this is a lot of work on the customer's part. This means that access to most of our workload clusters happens via client certificates, which are less secure than OIDC. Client certificate creation also requires access to the management cluster, which is not always possible. OIDC allows for more fine-grained access control via groups as well as instant revocation of access. Setting up a Pinniped supervisor per customer or per MC would allow us to provide OIDC access to all workload clusters by default, just by installing the Pinniped concierge on them, which can easily be done via default apps. No additional configuration is needed on the cluster side (see the sketch below for the per-cluster piece).
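As a rough illustration (not a tested configuration), the only per-cluster object the concierge would need is an authenticator pointing at the supervisor's issuer. The issuer URL, audience and CA value below are invented placeholders:

```yaml
# Hypothetical sketch: a concierge-side authenticator trusting a per-customer supervisor.
# Issuer URL, audience and CA bundle are placeholders, not real values.
apiVersion: authentication.concierge.pinniped.dev/v1alpha1
kind: JWTAuthenticator
metadata:
  name: supervisor-jwt-authenticator
spec:
  issuer: https://pinniped-supervisor.customer.example/issuer   # the federation domain's issuer
  audience: my-workload-cluster-audience                        # unique per workload cluster
  tls:
    certificateAuthorityData: "<base64-encoded CA bundle for the supervisor endpoint>"
```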

2. multi-cluster access for backstage

We have successfully set up Backstage to use Pinniped for authentication to more than one cluster. When authenticating through Pinniped, the user can see resources from all connected clusters they have access to. This is a great way to provide a single pane of glass for all clusters while ensuring fine-grained access control. Right now, in order to achieve this, users need to log into every cluster they want to see resources from separately. Furthermore, this has to be configured manually, since there is no way to automatically configure Backstage to use OIDC for every cluster, nor is it guaranteed that every cluster has OIDC enabled. From team Honeybadger:

The main use case that we have in mind for now is "tracking resources related to a component's deployments that exist on WCs". Currently we can only display information from an App CR (or HelmRelease), but we would like to be able to display the entire tree of resources that come from the App CR (or HelmRelease), for example: Deployment, ReplicaSet, Pods, Secrets, ServiceAccounts, RoleBindings etc. We could also go deeper into resource details, like spec, metadata, and events. Having access to WCs would enable us to implement this in our custom plugins. There are also community plugins that can display information about a component's related resources, the default Kubernetes plugin being an example: it can elevate the visibility of errors where identified, and provide drill-down into the deployments, pods, and other objects for a service. In its current state it's not able to display much based on MC access alone; access to WC resources would enable it to do more. Another use case could be to provide more information about WCs, for example how a WC is structured in terms of Namespaces etc.

3. shareable kubeconfigs for customer teams without infra access

In order to access a cluster, a user needs a kubeconfig file with the correct credentials. Currently, whether these are client certificates or OIDC tokens, the kubeconfig file contains secrets that are specific to the cluster and the user and ideally should not be shared. Pinniped makes it possible to create kubeconfig files that do not contain any secrets. Instead, they use an exec plugin, so the user is prompted to authenticate via OIDC when they run kubectl commands with that kubeconfig. This means an admin can create and distribute kubeconfigs for other team members without worrying about leaking credentials (see the sketch below).
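For illustration, this is roughly the shape of the user entry that `pinniped get kubeconfig` generates. The exact flags and URLs below are placeholders rather than verified output, but the key point is that nothing in it is secret:

```yaml
# Sketch of the user entry in a Pinniped-generated kubeconfig.
# There are no embedded client certificates or tokens, only the exec plugin invocation;
# issuer and client-id values below are placeholders.
users:
- name: pinniped-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: pinniped
      args:
      - login
      - oidc
      - --issuer=https://pinniped-supervisor.customer.example/issuer
      - --client-id=pinniped-cli
      - --scopes=openid,offline_access,pinniped:request-audience
      # any concierge-related flags (cluster endpoint, CA data, audience) are public values, not secrets
```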

4. CI/CD access for tests and automation (on GS side)

Since our current setup requires opening a browser window and having a real user log in, we don't run automated tests for authentication on the Giant Swarm side. Pinniped has some options for CI/CD usage, so we may be able to achieve better test coverage; one possible option is sketched below.
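One option we would still need to verify: the Pinniped CLI documents a non-interactive, password-based flow for some upstream IDP types, driven by environment variables instead of a browser. Very roughly, and with flag and variable names taken from the upstream docs rather than tested by us:

```yaml
# Hedged sketch of a non-interactive kubeconfig exec entry for CI, assuming the upstream
# IDP supports a password-based flow; flags and names should be verified against the
# Pinniped documentation before relying on this.
users:
- name: ci-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: pinniped
      args:
      - login
      - oidc
      - --issuer=https://pinniped-supervisor.customer.example/issuer
      - --upstream-identity-provider-flow=cli_password   # no browser window involved
      # The CLI would then read PINNIPED_USERNAME / PINNIPED_PASSWORD from the
      # CI job's environment instead of prompting interactively.
```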

pros of using pinniped over current setup

cons of using pinniped over current setup

architecture

Pinniped consists of two main components: the concierge and the supervisor. From the official documentation:

The Pinniped Supervisor is an OIDC server which allows users to authenticate with external identity providers (IDP), and then issues its own federation ID tokens to be passed on to clusters based on the user information from the IDP.

The Pinniped Concierge is a credential exchange API which takes as input a credential from an identity source (e.g., Pinniped Supervisor, proprietary IDP), authenticates the user via that credential, and returns another credential which is understood by the host Kubernetes cluster or by an impersonation proxy which acts on behalf of the user.

[diagram: Pinniped architecture with concierge and supervisor (pinniped_architecture_concierge_supervisor)]

Federation domains are another critical component of the architecture. These are custom resources used to define the external identity providers that the supervisor can authenticate against, as well as the OIDC issuer that the concierge will use to authenticate against the supervisor. Federation domains need to be in the same namespace as the supervisor. Concierges can then be configured with the address of the issuer to become part of the federation domain (roughly as sketched below).
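As a rough sketch of what this could look like on the supervisor side, with all resource names, URLs and secret names being invented placeholders:

```yaml
# Hypothetical supervisor-side configuration: one upstream identity provider plus one
# federation domain. All names, URLs and secrets are placeholders.
apiVersion: idp.supervisor.pinniped.dev/v1alpha1
kind: OIDCIdentityProvider
metadata:
  name: customer-idp
  namespace: pinniped-supervisor
spec:
  issuer: https://idp.customer.example            # external identity provider
  authorizationConfig:
    additionalScopes: [email, profile, groups]
  claims:
    username: email
    groups: groups
  client:
    secretName: customer-idp-client-credentials   # OIDC client id/secret used by the supervisor
---
apiVersion: config.supervisor.pinniped.dev/v1alpha1
kind: FederationDomain
metadata:
  name: customer-federation-domain
  namespace: pinniped-supervisor                  # must live in the supervisor's namespace
spec:
  issuer: https://pinniped-supervisor.customer.example/issuer
  tls:
    secretName: supervisor-tls-cert
```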

scenarios

Since the objective is to give OIDC access to workload clusters, we will want to install the concierge on all workload clusters. This can be done via the default apps (a rough sketch follows below).
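A very rough, untested sketch of what that installation could look like through the App platform; the app name, catalog, version and cluster/organization names below are invented placeholders:

```yaml
# Hypothetical App CR installing the concierge on a workload cluster via the App platform.
# App name, catalog, version and secret names are placeholders.
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  name: demo01-pinniped-concierge
  namespace: org-acme              # placeholder organization namespace
spec:
  catalog: giantswarm
  name: pinniped-concierge         # hypothetical app name
  namespace: pinniped-concierge
  version: 0.1.0                   # placeholder version
  kubeConfig:
    inCluster: false
    secret:
      name: demo01-kubeconfig      # kubeconfig secret of the target workload cluster
      namespace: org-acme
```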

However, the supervisor(s) can theoretically be run anywhere. Furthermore, more than one federation domain can be created for each supervisor. We also need to make a conscious decision on whether management clusters should have a concierge or not. (If not, they will not be accessible via pinniped)

1. external supervisor for each customer

[diagram: setup 1, external supervisor per customer]

In this scenario, each customer has their own supervisor running outside of the management cluster. Management clusters and workload clusters have concierges installed, and a federation domain is created for each MC (more are possible but not shown here). This is the recommended setup, and it is secure since the supervisor does not run on the same cluster as any concierge. Also, MC access and WC access work the same way and can be managed identically through the concierge. However, in this scenario we need to decide in which cluster to install the supervisor, how to manage it, and whether customers should have access to it, as well as whether supervisors for different customers would share a cluster.

2. supervisor and concierge on mc

[diagram: setup 2, supervisor and concierge on the MC]

In this scenario, the supervisor is running on the management cluster and federation domains are created for each organization. This is less secure since the supervisor is running on the same cluster as the concierge. On the other hand, in our current setup the management cluster is already the central point of authentication so it might make sense to run the supervisor there. This would also make it easier to manage the supervisor since it can be managed in the same way as all the other management cluster components. (configuration via mc bootstrap and customer config repos, monitoring via prometheus, etc.)

3. supervisor on mc, concierge on wc

[diagram: setup 3, supervisor on the MC, concierge on the WCs]

In this scenario, the supervisor runs on the management cluster and the concierge runs on the workload clusters. This is more secure than the previous scenario, since the supervisor does not run on the same cluster as a concierge, and it is easier to manage than the first scenario, since the supervisor runs on the management cluster. The problem with this setup is that we cannot access the management cluster via Pinniped, which makes access more complicated and inconsistent.

4. central supervisor

[diagram: setup 4, central supervisor for all customers]

In this scenario, there is a single supervisor running in a central cluster, similarly to the current Teleport setup, and each customer has their own federation domain. On one hand this is quite secure, since the supervisor is separate from the concierges and no customer has access to it. On the other hand, it is less secure in that the supervisor is a single point of failure and a single point of attack, and it would mean that different customers' credentials are stored in the same place.

gawertm commented 4 months ago

@gawertm please have a look

gawertm commented 3 months ago

sorry for the delay in reviewing. I really like the overview!

what are the security implications if supervisor and concierge are running on the same cluster, e.g. MC?

Also, as I understand it, we can define multiple federation domains, meaning multiple IdPs like our GitHub and our Azure as well as the customer ones? So that we can log in as well as the customers, but each with different IdPs? Why would we still keep Dex in the game then?

anvddriesch commented 3 months ago

what are the security implications if supervisor and concierge are running on the same cluster, e.g. MC?

We should look at this in more detail, but all the issuer and identity provider configuration lives in the supervisor namespace, so people with MC access via Pinniped would be able to access all of that. It's not really a big deal compared to the fact that people with MC access can already just create client certificates for all workload clusters. We could have a separate issuer/federation domain for the MC to make the separation bigger.

we can define multiple federation domains, meaning multiple IdPs like our GitHub and our Azure as well as the customer ones?

We can do that. In that case we could have an authenticator for each of the federation domains' issuers on the workload cluster (I haven't tested that, but it should work; see the sketch below). Otherwise we can also add several IdP connectors to the same federation domain (as we do for Dex) and use the same issuer.
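Roughly, the multi-domain variant would look like this on one supervisor, with each domain getting its own JWTAuthenticator on the workload clusters. Issuer URLs and names are placeholders; the only hard requirement I'm aware of is that each federation domain needs a unique issuer, which can be a different path on the same host:

```yaml
# Hedged sketch: two federation domains on one supervisor, distinguished by issuer path.
# Names and URLs are placeholders.
apiVersion: config.supervisor.pinniped.dev/v1alpha1
kind: FederationDomain
metadata:
  name: giantswarm
  namespace: pinniped-supervisor
spec:
  issuer: https://pinniped-supervisor.customer.example/giantswarm
---
apiVersion: config.supervisor.pinniped.dev/v1alpha1
kind: FederationDomain
metadata:
  name: customer
  namespace: pinniped-supervisor
spec:
  issuer: https://pinniped-supervisor.customer.example/customer
```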

Why would we still keep Dex in the game then?

Potentially it could ease migration pains, because we can continue to use exactly the same configuration and simply add Dex as a connector to Pinniped without breaking anything, while enabling customers to log into other clusters using their MC Dex. Further down the line we could get rid of it if we want.

Another reason would be that Dex can be used to connect to non-OIDC auth methods behind the scenes, which some of our customers might want to use. Only OIDC providers can be added directly to Pinniped. So if, say, someone wants to use LDAP, they could add Dex as an OIDC provider to Pinniped and then add their LDAP connector to Dex (roughly as sketched below).
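A sketch of that chaining, with Dex registered as an upstream OIDC provider in the supervisor; all names, URLs and secrets are placeholders:

```yaml
# Hypothetical: Dex (carrying its own LDAP connector) registered as an upstream OIDC
# identity provider in the Pinniped supervisor. Names, URLs and secrets are placeholders.
apiVersion: idp.supervisor.pinniped.dev/v1alpha1
kind: OIDCIdentityProvider
metadata:
  name: dex
  namespace: pinniped-supervisor
spec:
  issuer: https://dex.management-cluster.example   # existing Dex instance
  authorizationConfig:
    additionalScopes: [email, profile, groups, offline_access]
  claims:
    username: email
    groups: groups
  client:
    secretName: dex-client-credentials             # static client registered in Dex
```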

gawertm commented 3 months ago

ok, understood, great! I will try to create some follow-up tickets and then close this one here