proposal: option for api server not to touch credentials (secrets) at all #2214

Open krancour opened 3 days ago

krancour commented 3 days ago

We're already very careful about this...

Rather than give the API server cluster-scoped r/w on Secrets, we have given that permission to the management controller instead. As Projects are created/deleted, the management controller dynamically expands/contracts the permissions of the API server by creating/deleting RoleBindings that add/remove r/w access to Secrets in the Project namespace only.
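
For illustration, a per-Project grant of that kind looks roughly like the RoleBinding below (resource and ServiceAccount names are placeholders, not necessarily the exact ones Kargo uses):

```yaml
# Created by the management controller when a Project namespace appears,
# and deleted again when the Project is removed.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kargo-api-server-secrets
  namespace: kargo-demo                # the Project namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kargo-api-server-secrets       # grants r/w on Secrets only
subjects:
  - kind: ServiceAccount
    name: kargo-api
    namespace: kargo                   # where the API server runs
```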

This is inherently more secure because, unlike the API server, the management controller receives no inbound traffic and therefore presents a smaller attack surface.

But for users who elect to GitOps their credentials (along with everything else), the API server could stand to lose its ability to manage credentials entirely, meaning the API server would, in such a case, not require permissions of any kind on any Secrets at all. 🥳

For mature organizations, this trades functionality that they wouldn't be using anyway for improved security footing.
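
For concreteness, "GitOps-ing credentials" here means the Secret Kargo consumes is declared in Git and synced into the Project namespace by Argo CD (or similar) rather than created through Kargo's API or UI. A minimal sketch, with placeholder names and values, using the credential labeling convention from the docs as I understand it:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: git-repo-creds
  namespace: kargo-demo                # the Project namespace
  labels:
    kargo.akuity.io/cred-type: git     # marks this Secret as Git credentials for Kargo
type: Opaque
stringData:
  repoURL: https://github.com/example/repo.git
  username: my-bot-account
  password: <personal access token>
```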

On the back-end, this is super easy.

@rbreeze ideally, I'd like to figure out a way to hide the credential management tabs entirely if an admin opts into disabling that functionality. I was thinking we could add a field to the response from the protected configuration endpoint (not the public one) that would tell the front-end whether the API server can or cannot manage credentials. wdyt?

https://github.com/akuity/kargo/blob/93940f957ec5eda563f8b163a3fc3d78474e3120/api/service/v1alpha1/service.proto#L21

https://github.com/akuity/kargo/blob/93940f957ec5eda563f8b163a3fc3d78474e3120/api/service/v1alpha1/service.proto#L136-L138

Edit: All of the above could equally be said for the API server's ability to manage SAs/Roles/RoleBindings in Project namespaces.

Brightside56 commented 3 days ago

@krancour we can continue our discussion here, I think. From the perspective of an infrastructure engineer, I see the threat not in the API server, which can have access to a VCS token that can "touch" all production branches; I see the threat in the kargo-controller, which has access to the VCS token in the case of sharding.

I wouldn't want the kargo-controller for the "staging" or "region-xx" shard to be able to "touch" all production branches (which would give an intruder the ability to deploy anything anywhere from any cluster that runs a kargo-controller).

https://github.com/akuity/kargo/blob/main/charts/kargo/templates/controller/cluster-roles.yaml#L33 - Also, the controller has RBAC permissions to manipulate freights/promotions/stages/warehouses.
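
For reference, this is roughly the shape of the rule in question (a simplified sketch, not the exact chart contents):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kargo-controller
rules:
  # Broad verbs over Kargo's own resources, cluster-wide.
  - apiGroups: ["kargo.akuity.io"]
    resources: ["freights", "promotions", "stages", "warehouses"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```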

Because of that, every Kubernetes cluster with a deployed controller is part of the root of trust, and compromising any one of them allows an infrastructure/organization takeover.

I can still use the central kargo-controller (aka the default shard controller), but then I won't be able to "push" changes to spoke ArgoCD clusters from this default shard or "pull" ArgoCD app statuses to ensure correctness. This is sad, but it looks like an acceptable solution from a security point of view.

It could probably be better if:

krancour commented 2 days ago

Also, the controller has RBAC permissions to manipulate freights/promotions/stages/warehouses.

There may be a few more permissions than are required here. I am quite happy to review them.

As to the hub-and-spoke architecture that you have proposed as an alternative to what we do now, we considered and rejected that early on because of the massive vulnerabilities it introduces.

Because of that, every Kubernetes cluster with a deployed controller is part of the root of trust, and compromising any one of them allows an infrastructure/organization takeover.

The hub-and-spoke architecture offers a much more direct path to that degree of compromise. Compromise of a Kargo control plane that could mutate Argo CD Apps in remote clusters (whether that is via Kubernetes API or Argo CD API is irrelevant) would compromise all the clusters to which it was connected. Taking Kargo out of the picture entirely, this is the exact same reason that hub-and-spoke topology is discouraged even for Argo CD itself.

By contrast, the biggest danger in the architecture we currently use is the concern you raised:

I wouldn't want the kargo-controller for the "staging" or "region-xx" shard to be able to "touch" all production branches (which would give an intruder the ability to deploy anything anywhere from any cluster that runs a kargo-controller).

This is a perfectly valid concern, but can be mitigated with branch protections.

Additionally, you have the option to use workload identity for managing controller access to repositories, which is a mature approach that further mitigates any concerns you have over controllers consuming credentials that reside in the control plane.
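
For example, on EKS the controller's ServiceAccount can be annotated for IRSA so that ECR access uses a short-lived, IAM-scoped identity rather than a static credential stored in the control plane (account ID and role name below are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kargo-controller
  namespace: kargo
  annotations:
    # IRSA: the controller assumes this IAM role when pulling from ECR,
    # so no long-lived registry credentials need to live in the control plane.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/kargo-ecr-readonly
```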

Brightside56 commented 2 days ago

This is a perfectly valid concern, but can be mitigated with branch protections.

But how? As far as I know, the kargo-controller can only consume a single set of credentials from the Kargo control plane cluster. Or is the kargo-controller able to consume them from the cluster where it's installed?

Additionally, you have the option to use workload identity for managing controller access to repositories, which is a mature approach that further mitigates any concerns you have over controllers consuming credentials that reside in the control plane.

From what I see, it works only with container registries like ACR/GAR/ECR, not with GitHub/GitLab.

There may be a few more permissions than are required here. I am quite happy to review them.

Aha, I will try to cut those permissions down to what the controller needs and prepare a PR then.

krancour commented 2 days ago

But how? As far as I know, the kargo-controller can only consume a single set of credentials from the Kargo control plane cluster. Or is the kargo-controller able to consume them from the cluster where it's installed?

This is true. I hadn't actually been thinking of branch protections that allow only specific principals to push to a branch. I'd been thinking more of protections like those requiring PRs. So you can shut down direct pushes to prod entirely and promotion into prod would depend on a PR that requires approval.

Consuming credentials directly from the controller's own cluster is an interesting idea, but I see two main impediments to it:

  1. On the implementation side, Project namespaces only exist in the control plane's cluster, so a controller consuming creds directly from its own cluster would have to follow some kind of alternative tenancy rules. This seems a rather large complication.

  2. On the UX side, you lose the ability to manage the creds all in one place. Today, you can manage all of a Project's creds via CLI, UI, or GitOps, all of which are managing creds that live in the control plane. If controllers were to gain the option to obtain creds from their own cluster, this would necessitate managing creds separately for each. This seems like another unwelcome complication.

But you've put an idea in my head. As a middle ground, we could easily (optionally) shard credentials so that they still live in the control plane and are centrally managed, but it would create the opportunity to have different clusters use different creds for the same repos.

Note this is imperfect, however. It would constrain what credentials are used by a normally functioning controller, but it wouldn't block a malicious actor who has obtained a controller's kubeconfig for the control plane from obtaining any credential they wish. Even if imperfect, it's still a useful option, so I will propose sharded creds in a separate issue.
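
Sketching the idea (the kargo.akuity.io/cred-type label is Kargo's existing credential convention; the shard label is hypothetical, since this feature doesn't exist yet):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-repo-creds-region-xx
  namespace: kargo-demo                  # Project namespace in the control plane
  labels:
    kargo.akuity.io/cred-type: git
    # Hypothetical label: would indicate that only the "region-xx" shard's
    # controller should be handed this credential.
    kargo.akuity.io/shard: region-xx
type: Opaque
stringData:
  repoURL: https://github.com/example/prod-config.git
  username: kargo-region-xx
  password: <token scoped to what this shard is allowed to touch>
```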

From what I see, it works only with container registries like ACR/GAR/ECR, not with GitHub/GitLab.

🤦‍♂️ My bad. This is what happens to my brain late on a Friday. Disregard.

Brightside56 commented 1 day ago

This is true. I hadn't actually been thinking of branch protections that allow only specific principals to push to a branch. I'd been thinking more of protections like those requiring PRs. So you can shut down direct pushes to prod entirely and promotion into prod would depend on a PR that requires approval.

Thank you, this is very helpful advice in my situation. But if we're talking about a scenario without such approval (from a user-experience point of view I would prefer, probably in the future, to have approval inside Kargo), kargo-controllers can consume the VCS token from the control plane, and this token has permission to push to all production branches.

Someone with access to the staging environment also has the ability to access the VCS token from the control plane and push anything to any production environment.

IMO it's less secure than hub-and-spoke. Yes, hub-and-spoke implies certain risks, but those risks are predictable, isolated within domain and perimeter boundaries, and manageable, which usually means: "if your production instance of xxx yyy is not compromised, you're fine".

But you've put an idea in my head. As a middle ground, we could easily (optionally) shard credentials so that they still live in the control plane and are centrally managed, but it would create the opportunity to have different clusters use different creds for the same repos.

Note this is imperfect, however. It would constrain what credentials are used by a normally functioning controller, but it wouldn't block a malicious actor who has obtained a controller's kubeconfig for the control plane from obtaining any credential they wish. Even if imperfect, it's still a useful option, so I will propose sharded creds in a separate issue.

It may be quite hard/tiresome, but it is possible to isolate access to such sharded secrets using RBAC rules with granular get/CEL restrictions, or using policy controllers.
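
For instance, a Role could grant get only on specific named Secrets (names are placeholders; note that list/watch can't be restricted by resourceNames, so the consumer would have to fetch Secrets by name):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-region-xx-creds
  namespace: kargo-demo                            # Project namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["prod-repo-creds-region-xx"]   # only this shard's credential
    verbs: ["get"]                                 # list/watch can't be pinned to resourceNames
```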

Consuming credentials directly from the controller's own cluster is an interesting idea, but I see two main impediments to it

  1. On the implementation side, Project namespaces only exist in the control plane's cluster, so a controller consuming creds directly from its own cluster would have to follow some kind of alternative tenancy rules. This seems a rather large complication.
  2. On the UX side, you lose the ability to manage the creds all in one place. Today, you can manage all of a Project's creds via CLI, UI, or GitOps, all of which are managing creds that live in the control plane. If controllers were to gain the option to obtain creds from their own cluster, this would necessitate managing creds separately for each. This seems like another unwelcome complication.

Those creds could be managed as kargo-controller creds (under alternative tenancy rules).

Those creds could be managed as ArgoCD creds (borrowing, again, yes); you just need to give them permission to push to the specific branches that correspond to that ArgoCD tenant. You may say it's a terrible decision because it goes against the principle of least privilege (neither ArgoCD nor the kargo-controller needs those permissions), but borrowing may be acceptable: if an intruder has unauthorized access to ArgoCD's VCS secrets, he/she most probably already has high privileges in the ArgoCD cluster and access to the target cluster API server(s), so access to these secrets isn't the biggest problem. These secrets most probably will not increase the blast radius.
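
For illustration, this is the kind of Argo CD repository credential that could be "borrowed" in this way (URL and values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-config-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository   # Argo CD's repo-credential convention
type: Opaque
stringData:
  type: git
  url: https://github.com/example/prod-config.git
  username: tenant-bot
  password: <token limited to this tenant's branches>
```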

Such an installation of Kargo (with secrets in the controller's own cluster) seems very strong from a high-level zero-trust perspective, because:

  1. Compromising the control plane gives the intruder nothing, because there is no access to the VCS; the intruder may only mutate the targetRevision of ArgoCD apps, not the source repository.
  2. Compromising a shard cluster that runs a kargo-controller also should not give the intruder any opportunity for lateral movement (assuming the kargo-controller isn't able to access/mutate Kargo resources it shouldn't).

krancour commented 3 minutes ago

We are not switching to a hub-and-spoke architecture.