Azure / azure-service-operator

Azure Service Operator allows you to create Azure resources using kubectl
https://azure.github.io/azure-service-operator/
MIT License
733 stars 193 forks source link

Feature: Allow injection of Workload Identity to all containers of a pod #3764

Open karljaxon opened 7 months ago

karljaxon commented 7 months ago

We have deployments which use ASO to create identities, and also generate pods which use these identities as workload identities. We can use inject the ID into the main container of the pod using the FAQ article mentioned below. However, we also need to connect to Azure resources (service bus etc.) from sidecars. In this case we are using dapr, and we do not have much control of dapr sidecar injection.

Adding the workload id annotation to a service account works well provided we know the id up-front. It does not work so well with ASO-generated configmaps. I'm currently using Flux to do the deployment in two stages (ASO creating the configmap triggers a redeployment) - but this is clunky. We would like a way to create identities using ASO alongside k8s deployments and ensure all containers have the same ID.

My initial thought was to have ASO add service accounts as an additional operatorSpec output - similar to secrets and cms. But on k8s slack @matthchr suggested adding a new ASO-specific annotation which mimics the behaviour when the 'azure.workload.identity/use' annotation is in place. I'm sure there are plenty of ways in this could work!

FAQ article: https://azure.github.io/azure-service-operator/guide/frequently-asked-questions/#when-using-workload-identity-how-can-i-easily-inject-the-aso-created-user-managed-identity-details-onto-the-service-account

matthchr commented 7 months ago

We'll definitely talk about doing this. Another possibility is that we convince the workload identity developers that this is a pattern they should support generally, as there are other operators such as Crossplane, Radius, Terraform, etc that can produce identities dynamically at deployment time.

Ideally WI would work with all of those. I don't know that all of those put the identity details into a configmap... but we could look around and see what they do.

matthchr commented 7 months ago

@aramase not sure what your thoughts on this are.

karljaxon commented 7 months ago

I know that the tf operator has a general pattern of publishing outputs to either configmaps or secrets, but can't comment on the rest. My concern with doing this through any form of webhook (whether the webhook is part of WI or ASO) is with timing. When I install a chart I am supplying info for the infra and pods at the same time. The configmap will appear later once some infra has been created, but the pods will immediately try to appear and fire the hook. AFAIK hooks can't retry - we can maybe set a timeout, but apart from that they either fail or are ignored. Is it going to work ok with the idea of reading info from a configmap/secret that will be produced at a later point in time?

I'm assuming if we created a service account as an output it'd be easier. I could take a crack at a PR for making a service account, but I suspect you might be able to think of a better approach that also brings along other infra-creating operators... Let me know thoughts

matthchr commented 7 months ago

The configmap will appear later once some infra has been created, but the pods will immediately try to appear and fire the hook. AFAIK hooks can't retry - we can maybe set a timeout, but apart from that they either fail or are ignored. Is it going to work ok with the idea of reading info from a configmap/secret that will be produced at a later point in time?

My thinking was that the webhook would look for an annotation like this on the SA:

azure.workload.identity/clientid-from-configmap: <configmapname>
azure.workload.identity/clientid-from-configmap-key: <keyname>

and then inject something like this:

          - name: AZURE_CLIENT_ID
            valueFrom:
              configMapKeyRef:
                key: <keyname>
                name: <configmapname>

So the fact the configmap doesn't exist yet would be dealt with via the pod infrastructure, not by the webhook actually reading the contents of the configmap.

matthchr commented 7 months ago

and then as you say it works for any operator that supports exporting stuff as a configmap, which as you say many do (Terraform and ASO at least, I'd need to check Crossplane but I'd bet it does too)

karljaxon commented 7 months ago

Got it - yes. The addition of AZURE_CLIENT_ID was what we originally did manually, and it worked fine - but only for one container. If the webhook can inject similar entries for all containers in the pod this would work a treat

matthchr commented 6 months ago

We should consider having a meeting with the workload identity team to advocate for this - ideally with some evidence that other operators also could use this (Radius, etc?)

karljaxon commented 6 months ago

This sounds good - and yes, feel free to add me to a call. Please bear in mind I'm in the UK, on GMT :) I can do a bit of background research on other operators, but I wouldn't really have time to try them out... The best I can do is see if I can see how they propagate infra dependencies from one resource to another

matthchr commented 6 months ago

I can do a bit of background research on other operators, but I wouldn't really have time to try them out... The best I can do is see if I can see how they propagate infra dependencies from one resource to another

We also can do this investigation - but if you're aware of any off the top of your head let us know. My current list to investigate is:

matthchr commented 3 weeks ago

I've been meaning to advocate for this to the workload identity team but haven't gotten around to it. Leaving this open to track that.

karljaxon commented 3 weeks ago

Thanks @matthchr - this would be good to get in. We're still deploying using 'two bites of the cherry' - so our charts first deploy infra and then upgrade to install the pods once the identity exists.

I think this would also help with issues we've encountered where the identity we want to use is updated (replacing identities, or changes to the chart meaning ASO generates a new identity). Since the webhook only fires on the CREATE operation you can end up with a mismatch between the service account's annotation and the value of AZURE_CLIENT_ID.

It sounds like this is the best that the workload identity webhook can do with its current design (I just searched and saw this stack overflow: https://stackoverflow.com/questions/77323629/admission-webhook-pod-update) - env vars can't be mutated on UPDATE operations.

If your suggested design is implemented then AZURE_CLIENT_ID would be populated from a cm reference and so will dynamically shift if the configmap is updated.

Actually, thinking about it some more I still wouldn't be able to update the identity without somehow triggering a redeploy. Just reading this: https://kubernetes.io/docs/tutorials/configuration/updating-configuration-via-a-configmap/#:~:text=This%20is%20because%20environment%20variables,run%20with%20the%20updated%20information. and I also realised that changing the env var without replacing the token would not be very healthy!