Azure / azure-service-operator

Azure Service Operator allows you to create Azure resources using kubectl
https://azure.github.io/azure-service-operator/
MIT License

Support for Namespace scoped RoleAssignments that mitigate privilege escalation risks #3645

Open Roman-Galeev opened 11 months ago

Roman-Galeev commented 11 months ago

Currently it's possible to grant access to an arbitrary resource in the subscription by setting owner.armId and the Contributor role:

spec:
  owner:
    armId: /subscriptions/00000000-0000-0000-0000-000000000000
  roleDefinitionReference:
    armId: /subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c

I believe the RoleAssignment owner should be scoped to the namespace only, by disallowing the armId field in owner. We would also need a ClusterRoleAssignment CRD which can refer to owners in different namespaces and allows the armId field.
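For contrast with the armId form above, here is a minimal sketch of a RoleAssignment whose owner is a Kubernetes reference to an ASO resource in the same namespace, which is the only form the proposal would leave available to namespaced users. The API version, the namespace `team-a`, the resource names, and the all-zero principalId are illustrative assumptions; check them against the CRDs installed in your cluster.

```yaml
# Sketch only: names, namespace, and API version are assumptions.
apiVersion: authorization.azure.com/v1api20220401
kind: RoleAssignment
metadata:
  name: app-storage-contributor   # hypothetical name
  namespace: team-a               # hypothetical namespace
spec:
  # Owner given as a Kubernetes reference (group/kind/name) instead of armId,
  # so it can only resolve to an ASO resource in this namespace.
  owner:
    group: storage.azure.com
    kind: StorageAccount
    name: teamastorage            # hypothetical StorageAccount in team-a
  # Placeholder object ID of the identity receiving the role.
  principalId: 00000000-0000-0000-0000-000000000000
  roleDefinitionReference:
    # Contributor role definition, same ID as in the example above.
    armId: /subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c
```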

matthchr commented 10 months ago

You're right that this can be an escalation of privilege, but I don't think that forbidding the armId field prevents it. ASO supports resource import: a user can create a resource that already exists in Azure and ASO will import it, at which point the user could point a RoleAssignment at it and escalate privilege that way.

ASO sits at the intersection of Kubernetes and Azure and as such developing a secure solution on it I think involves use of both Kubernetes RBAC and Azure RBAC. Before I go any further I think we do a bad job describing how to do this in the ASO documentation. I've filed #3714 which tracks a number of documentation improvements we should make, and this is one of them.

One "solution" to this problem is to avoid installing the ASO RoleAssignment CRD, and ensure that the identities which ASO is using don't have the ability to perform role assignment. Obviously this solves the problem but at the cost of being unable to do role assignments. If you don't have the need or desire to expose RoleAssignments in your cluster then this is a pretty easy and safe option.
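As a rough sketch of the first half of that option, assuming ASO is installed through its Helm chart, the `crdPattern` value can list only the resource groups you need and leave out `authorization.azure.com`, the group that contains RoleAssignment. The groups listed here are illustrative assumptions, not a recommendation.

```yaml
# Helm values fragment (sketch). authorization.azure.com is deliberately
# absent, so the RoleAssignment CRD is not among those requested.
crdPattern: "resources.azure.com/*;storage.azure.com/*;managedidentity.azure.com/*"
```

This most likely does not remove a RoleAssignment CRD that is already present in the cluster, so verify what is installed. The second half, making sure the ASO identities cannot perform role assignments, still has to be done on the Azure side.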

Other solutions all basically boil down to:

  1. Ensure that you follow the principle of least privilege when assigning ASO identities.
  2. Ensure that access to resources in your Kubernetes cluster (at least in sensitive namespaces) is tightly controlled.

Ensure that you follow the principle of least privilege when assigning ASO identities

Doing this generally looks something like:

  1. Use credential scope to set up namespaced credentials in your namespaces.
  2. Avoid using the global ASO credential.
  3. Rely on Azure RBAC and give the appropriate permissions (and permission scope) to the namespaced credentials on the Azure side. This means that if users in namespace a are supposed to have RoleAssignment permissions to resources in resource group a, then the identity for namespace a should have Contributor/User Access Administrator (UAA) permissions only on resource group a and not the whole subscription.
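The steps above can be sketched as a per-namespace credential secret. This assumes the ASO convention of a secret named `aso-credential` in the namespace and a service principal with a client secret; the namespace name `a` follows the example above and all IDs are placeholders. The Azure-side role assignment that limits this identity to resource group a is done separately and is not shown.

```yaml
# Sketch of a namespaced ASO credential; all values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: aso-credential   # name ASO looks for when resolving namespace-scoped credentials
  namespace: a
stringData:
  AZURE_SUBSCRIPTION_ID: "00000000-0000-0000-0000-000000000000"
  AZURE_TENANT_ID: "00000000-0000-0000-0000-000000000000"
  # Identity that has Contributor/UAA on resource group a only.
  AZURE_CLIENT_ID: "00000000-0000-0000-0000-000000000000"
  AZURE_CLIENT_SECRET: "<client-secret>"
```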

The above model works best if a given namespace in ASO maps onto a fixed set of scopes in Azure. For example 1 NS -> 1 subscription, or 1 NS -> 1 RG. Even 1 NS -> N RGs is fine as long as the set of RGs is fixed. That way the permissions can be set up once when the credential is provisioned for the namespace and don't need to be updated again to add new scopes, etc.

Ensure that access to resources in your Kubernetes cluster (at least in sensitive namespaces) is tightly controlled.

This is accomplished on AKS through AAD (now Entra) integration and disabling local users along with optionally defining JIT/Conditional access policies.

A possible setup

A possible setup might be a dev namespace set up as a development environment pointing at a development subscription, and a prod namespace set up as a production environment pointing at a production subscription.

dev namespace has Azure credentials which are contributor/UAA on the dev subscription, same for prod for the prod sub. Developers in dev are members of an Azure AD group with roles that give access to CRUD ASO CRDs and other Kubernetes resources (Pods, etc) in the dev namespace, but not the production namespace.
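A minimal sketch of the Kubernetes RBAC half of this setup, assuming an AKS cluster with Entra (Azure AD) integration where a group subject is identified by its object ID. The role name, the API groups listed, and the group ID are illustrative assumptions.

```yaml
# Sketch: lets the developers group manage ASO resources and Pods in dev only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-aso-editor        # hypothetical name
  namespace: dev
rules:
  # ASO resource groups the developers may manage (adjust to the CRDs you install).
  - apiGroups: ["resources.azure.com", "storage.azure.com", "authorization.azure.com"]
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  # Ordinary workload resources in the core API group.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-aso-editor-binding
  namespace: dev
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    # Placeholder object ID of the developers' Azure AD group.
    name: "00000000-0000-0000-0000-000000000000"
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: dev-aso-editor
```

Because both objects are namespaced and no equivalent binding exists in prod, members of this group get nothing in the prod namespace.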

This means that developers can do basically whatever they want in the dev namespace, including assigning roles to themselves at the Azure level in the dev subscription. This isn't really a privilege escalation though, because you've already granted them those permissions (through ASO + Kubernetes RBAC). In all likelihood their Azure AD identities already have Contributor on the dev sub anyway, so that they can use other tools such as Terraform there.

prod namespace also has an Azure AD group with roles that give access to CRUD ASO CRDs and other Kubernetes resources, but that group is by default empty. Users use JIT/Conditional access policies to escalate into that group. This means that by default, nobody can do anything in the prod namespace to either the Kubernetes resources (Pods, etc) or the Azure resources via ASO or the portal.

When users do JIT they then would have the ability to privilege escalate, but only while they're escalated which can be more easily audited. Often users will use something like Argo or Flux here instead of having direct JITs be the flow for running updates, which also means that the proposed changes need to first meet the merge bar (pass through code review, etc) to make it into the repo before Argo/Flux will deploy them to prod. The conditional access/JIT becomes a break-glass used rarely.

Then it's still possible to sneak a privilege escalation through (it's hard to totally forbid this, whether you're using ASO or not), but quite a bit harder, needing some manual approval either of an AAD conditional access JIT or a PR to a repo to modify the yaml and trigger a deployment which likely requires approvers and is easily auditable, while keeping the dev environment looser and easier to use/play with.

Note that the above is just one way to lay things out. The same ideas can be applied to a dev and prod RG within a single sub and also to other more complex topologies. test or int can be added in the middle with a more locked down set of rules than dev but less locked down than prod (or maybe test and prod have very similar lockdowns to force the same procedures across both).

Let me know if this helps at all or if you have more questions. This topic is complex and I am not a security expert so there's likely more that can be done than the above as well.

Roman-Galeev commented 10 months ago

Thanks, @matthchr!

Here is what I would expect as a user of ASO, e.g., a person, who has access to a namespace in Kubernetes cluster equipped with ASO:

  1. I'd like to create Azure resources by committing custom resource definitions to namespace
  2. I'd like to access these resources with a service account within the same namespace

Currently, granting access to resources created with ASO is possible with a few CRDs, namely UserManagedIdentity, FederatedIdentity, and RoleAssignment, and then annotating the ServiceAccount somehow. However, RoleAssignment allows granting access to any resource that ASO itself has access to, which breaks these assumptions: the only safe way would be to have someone else with elevated privileges do that (e.g., by skipping installation of the RoleAssignment controller, and/or disabling write access to the namespace in favor of GitOps with peer review).

Regarding auto-importing resources, I believe it would be extremely helpful to have an option of disabling it globally, so let's assume we have that. Then we just need to check that a RoleAssignment CRD in the namespace grants access only to ASO resources deployed in that namespace, which is exactly what would be expected.

And, regarding the security model, I believe we should have Cluster-wide resources and namespaced resources, e.g., ClusterRoleAssignment should allow specifying ArmID, and for that matter, reference to any ASO CRD deployed, while namespaced RoleAssignment should not.

Another nice to have security feature would be an annotation to namespace to restrict deployments to given Azure ResourceGroup[s].

matthchr commented 8 months ago

Regarding auto-importing resources, I believe it would be extremely helpful to have an option of disabling it globally, so let's assume we have that.

To be clear, we don't really "auto import" arbitrary resources. Instead, users who have the ability to create say, a StorageAccount, can import an existing resource into ASO by creating a StorageAccount with the same name in the same subscription. If they want ASO to manage that storage account (== update it in Azure) they just apply it and ASO will adopt it and start managing it. If they want to just import it they can specify the reconcile-policy annotation.
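A minimal sketch of the import-only case, assuming a storage account named `existingaccount` already exists in resource group `dev-rg`. The API version, names, location, and SKU are illustrative assumptions; the annotation is the reconcile-policy annotation mentioned above.

```yaml
# Sketch: reference an existing storage account without letting ASO change it.
apiVersion: storage.azure.com/v1api20230101
kind: StorageAccount
metadata:
  name: existingaccount       # must match the existing account's name in Azure
  namespace: dev
  annotations:
    # skip: ASO does not create, update, or delete the resource in Azure.
    serviceoperator.azure.com/reconcile-policy: skip
spec:
  location: westus2
  kind: StorageV2
  sku:
    name: Standard_LRS
  owner:
    name: dev-rg              # hypothetical ResourceGroup resource in this namespace
```

Removing the annotation (or setting it to `manage`) is what turns this into adoption, at which point ASO would start reconciling the account toward this spec.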

This can obviously be prevented today by:

Another nice to have security feature would be an annotation to namespace to restrict deployments to given Azure ResourceGroup[s].

I think a better way to accomplish this is to use an Azure Identity (UMI or ServicePrincipal, but prefer UMI) that only has permissions to create resources in a particular resource group. The advantage of doing it that way over having it be ASO enforcement is that even if somebody gets access to that credential and goes directly to azure CLI or portal they still are locked to the specified RG, rather than now having access to the whole sub if they somehow break the credential out of the k8s context.

This also solves (at least some of) the ClusterRoleAssignment versus namespaced RoleAssignment question. Users that only have permission to that k8s namespace are using the namespace's Azure credential, which is Contributor on their RG but no others. So even if they have RoleAssignment permissions, they can't actually grant themselves permissions on any other RG, because the Azure identity ASO is running as in that namespace doesn't have the ability to do so either.

Let me focus on the ask you've got though: am I correct that what you're asking for w/ the namespace scoped RoleAssignment that doesn't support armId is to be able to grant cluster users the ability to assign permissions to things they create only? Bonus if it's things they create only in a particular RG?

Roman-Galeev commented 8 months ago

am I correct that what you're asking for w/ the namespace scoped RoleAssignment that doesn't support armId is to be able to grant cluster users the ability to assign permissions to things they create only? Bonus if it's things they create only in a particular RG?

Yeah, pretty much. Namespace-scoped RoleAssignment should assign roles only to the ASO CRDs in the same namespace, and should not have armId escape hatch, which should go to ClusterRoleAssignment CRD. And, to your point, ideally we should be able to restrict all ASO CRDs in the namespace to given Azure resource group(s), e.g., by labeling/annotating the namespace.

To be fair, I've implemented these restrictions with cluster-wide Kyverno mutating policies, but this feels rather unnatural.
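The actual policies are not shown in this thread. As a rough illustration of the idea only, here is a validating (not mutating) Kyverno ClusterPolicy that rejects any RoleAssignment whose owner sets armId. The policy and rule names are hypothetical, and newer Kyverno releases may prefer a rule-level failure action over the spec-level field used here.

```yaml
# Sketch: block the armId escape hatch on ASO RoleAssignments cluster-wide.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-roleassignment-owner-armid   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: deny-owner-armid
      match:
        any:
          - resources:
              kinds:
                # Any served version of the ASO RoleAssignment kind.
                - authorization.azure.com/*/RoleAssignment
      validate:
        message: "RoleAssignment owner must reference an ASO resource by name, not by armId."
        pattern:
          spec:
            # If owner is present, the armId key must not be set.
            =(owner):
              X(armId): "null"
```

A policy like this does not by itself close the resource import path discussed earlier in the thread, so it is a partial mitigation at best.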

Roman-Galeev commented 8 months ago

This can obviously be prevented today by

I'd say that it is not obvious at all that the moment a cluster admin enables the RoleAssignment CRD it opens a way of taking ownership of any Azure resource ASO has access to, and the documentation recommends giving ASO admin access to the subscription. Personally, I was really surprised, and to me it's a critical security issue.

matthchr commented 8 months ago

I'd say that it is not obvious at all that the moment a cluster admin enables the RoleAssignment CRD it opens a way of taking ownership of any Azure resource ASO has access to, and the documentation recommends giving ASO admin access to the subscription. Personally, I was really surprised, and to me it's a critical security issue.

Yes sorry, agreed it is not obvious at the moment because it's not documented - so clearly not obvious. What I was trying to say is: "there is a way to accomplish this today".

I'll be updating the documentation for this release to give significantly clearer, prescriptive guidance. Will also leave this issue open and discuss with colleagues about this ask:

am I correct that what you're asking for w/ the namespace scoped RoleAssignment that doesn't support armId is to be able to grant cluster users the ability to assign permissions to things they create only? Bonus if it's things they create only in a particular RG?

theunrepentantgeek commented 7 months ago

We've documented the currently supported approaches to mitigate this escalation risk.

matthchr commented 7 months ago

I've retitled this issue to try to more accurately reflect the user ask. Hopefully the documentation we have now at least makes folks aware of this risk and we can investigate doing a namespace scoped role assignment in a future release. Along w/ namespace scope we may need to prevent or limit resource imports as well if we were to go that route.

For now, as the document @theunrepentantgeek linked suggests, using Azure policies + JIT (and/or Kyverno like you're doing) are mitigations.

Roman-Galeev commented 7 months ago

Thanks! Meanwhile, are there any plans to make it scoped to namespaces?

matthchr commented 6 months ago

Thanks! Meanwhile, are there any plans to make it scoped to namespaces?

We're considering the idea and what additional requirements (blocking imports?) would have to come along with it to make it actually secure.

There's no concrete plan yet but we're open to the idea and thinking about what it might look like.

In terms of actual timeline, I don't really have one, but I wouldn't expect it in 2.8 or 2.9 as we've got other things on those lists already. Now that we've documented some security best practices to mitigate the risk of privilege escalation with RoleAssignment (and to control access in general), we're at least in a better spot than we were when you raised this issue, from an awareness perspective.

matthchr commented 6 months ago

And, to your point, ideally we should be able to restrict all ASO CRDs in the namespace to given Azure resource group(s), e.g., by labeling/annotating the namespace.

I was re-reading this and realized I hadn't really explicitly replied to this.

We're open to doing the NamespacedRoleAssignment, but I don't think we're going to do the namespace label/annotation restriction. The way to do that is to make a namespace scoped identity, and limit that identity to a particular RG or set of RGs w/ RoleAssignment.

That same approach works if you want to limit to a particular subscription, or to particular types of resources (even when ASO has installed more CRDs at the cluster level) and so is significantly more flexible than trying to do limits via namespace annotations or similar.

Does that solution not work for your use-case, and if not can you expand on why? Does it have to do with the difficulty of dynamically producing identities and assigning them to dynamically created namespaces? Or is the issue primarily that doing RoleAssignments via ASO requires high privilege, which is problematic, and if you had NamespacedRoleAssignment you could then create these per-namespace identities and give them permissions more easily?

matthchr commented 3 weeks ago

We think that this is a reasonable ask, but haven't had the time to design + implement it yet.