VioletHynes closed this issue 1 year ago.
> There could be something I'm missing, but this is a fresh workload-identity-webhook install on a fairly fresh (created last week) AKS cluster, so I kind of would expect this to 'just work' since this is meant to be the new way to do things. If there is something I'm missing, do let me know!
If you're enabling the add-on with --enable-workload-identity, you don't need to install the webhook from this repo as well. The add-on is a managed version of this project: when you run --enable-workload-identity, AKS deploys the webhook in the kube-system namespace.
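To see what the add-on actually deployed, a minimal shell sketch like the following can list the webhook pods and the registered mutating webhooks. The function names are hypothetical, and the "wi-webhook" name pattern is an assumption based on the upstream manifests; adjust it if the pods on your cluster are named differently.

```shell
#!/usr/bin/env bash
# Sketch: find workload identity webhook pods in a namespace.
# Assumption: pod names contain "wi-webhook"
# (the upstream deployment is named azure-wi-webhook-controller-manager).
list_wi_webhook_pods() {
  local ns="${1:-kube-system}"   # the add-on deploys into kube-system
  kubectl get pods --namespace "$ns" --no-headers | grep -i 'wi-webhook'
}

# Mutating webhook configurations are cluster-scoped, so a webhook registered
# by a separate Helm install would show up here alongside the add-on's.
list_mutating_webhooks() {
  kubectl get mutatingwebhookconfigurations -o name
}
```

For example, list_wi_webhook_pods checks the add-on install, list_wi_webhook_pods azure-workload-identity-system checks a Helm install, and kubectl logs --namespace kube-system on a returned pod name shows that webhook's logs.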
Ah. The WIF troubleshooting documentation suggested I debug in the azure-workload-identity-system namespace, which wasn't populated at all (https://azure.github.io/azure-workload-identity/docs/troubleshooting.html). The other documentation (e.g. https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) doesn't really mention troubleshooting steps, or that it installs these resources, or where.
I'm not entirely convinced that the one that was installed by default is working right now either, but now that I know where it is, I can at the very least look at the logs a bit and understand why.
Would you suggest that I uninstall the helm chart, and should the one installed by --enable-workload-identity be good enough?
> Ah. The WIF troubleshooting documentation suggested I debug in the azure-workload-identity-system namespace, which wasn't populated at all: https://azure.github.io/azure-workload-identity/docs/troubleshooting.html - the other documentation (e.g. https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) doesn't really mention troubleshooting steps or that it installs these resources or where.
The troubleshooting docs in this repo are specific to the Helm chart installation. Thanks for pointing out the missing section in the AKS docs; there is room for improvement here.
> Would you suggest that I uninstall the helm chart, and should the one installed by --enable-workload-identity be good enough?
Yes, you can uninstall the helm chart. Only a single instance of the webhook is required.
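Removing the Helm release could look like the sketch below. It assumes the release name workload-identity-webhook and the namespace azure-workload-identity-system from the upstream Helm installation instructions; helm list --all-namespaces shows the names actually used on your cluster. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: uninstall the Helm-managed webhook so only the add-on instance remains.
# Assumed defaults: release "workload-identity-webhook" in
# namespace "azure-workload-identity-system" (override via arguments).
uninstall_wi_helm_release() {
  local release="${1:-workload-identity-webhook}"
  local ns="${2:-azure-workload-identity-system}"
  helm uninstall "$release" --namespace "$ns"
}

# Usage:
#   helm list --all-namespaces      # confirm the release name first
#   uninstall_wi_helm_release
#   kubectl get pods --namespace kube-system | grep -i wi-webhook   # add-on pods should still be running
```

helm uninstall does not delete a namespace that was created with --create-namespace; remove it separately with kubectl delete namespace if it is no longer needed.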
It has been quite confusing to troubleshoot WIF, and those were the only troubleshooting docs I could find (they're the top Google result for "workload identity federation troubleshooting azure"). It might help to note in the troubleshooting docs where to find troubleshooting guidance for WIF when it isn't installed with the Helm chart, or to add some information that isn't specific to the Helm chart installation. Nothing in the docs, as far as I can see, indicates that they don't apply to AKS.
The only other doc I've seen (included in the error message I get when the request to http://169.254.169.254/metadata/identity/oauth2/token fails) is this one, which doesn't mention WIF at all: https://aka.ms/azsdk/go/identity/troubleshoot#managed-id
Thanks for the help, though! I greatly appreciate the information. Feel free to close this. I'm still surprised the resource errored the way it did on a fresh install, but I'm no longer blocked by it.
> It has been quite confusing to troubleshoot WIF and those were the only troubleshooting docs I could find (e.g. they're the top google result for "workload identity federation troubleshooting azure") - it might be helpful to also note in the troubleshooting docs where to find the troubleshooting docs for WIF that doesn't use the helm chart, or some information that's not specific to the helm chart information.
There should be a separate troubleshooting guide in the AKS docs. @miwithro @karataliu could you track this?
> The only other docs I've seen (as part of the error message I get when the request to http://169.254.169.254/metadata/identity/oauth2/token fails) is this one, which doesn't mention WIF at all: https://aka.ms/azsdk/go/identity/troubleshoot#managed-id
This means the workload is using an old version of the SDK, which still relies on IMDS to get a managed identity token. The minimum SDK versions required for workload identity are listed here: https://azure.github.io/azure-workload-identity/docs/topics/language-specific-examples/azure-identity-sdk.html
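Before upgrading the SDK, it can help to confirm that the webhook actually mutated the pod: a mutated pod carries the AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE and AZURE_AUTHORITY_HOST environment variables. If they are present and the app still calls 169.254.169.254, the SDK version is the likely cause; if they are missing, the webhook did not inject anything. A minimal sketch; the function name and the pod/namespace arguments are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: report which webhook-injected variables are missing from a running pod.
# Returns 0 if all four are present, 1 if any is missing, 2 if kubectl exec fails.
check_wi_env() {
  local pod="$1" ns="${2:-default}"
  local env_out var missing=0
  env_out="$(kubectl exec --namespace "$ns" "$pod" -- env)" || return 2
  for var in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    if ! printf '%s\n' "$env_out" | grep -q "^${var}="; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_wi_env my-pod my-namespace. This relies on an env binary inside the container; for images without one (e.g. distroless), kubectl describe pod shows the same injected variables in the container's environment section.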
> I'm still surprised the resource errored in the way it did on a fresh install
The service account permission error could be caused by running multiple instances of the webhook. Enabling only the add-on with --enable-workload-identity should not produce any errors.
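To clarify, there are two ways you can enable workload identity on AKS:
1. Enable the AKS add-on with --enable-workload-identity, which deploys the webhook in the kube-system namespace.
2. Install the webhook from this repo (the open-source version), which deploys it in the azure-workload-identity-system namespace.
Using both together will result in a conflict.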
The cause of the issue here is a cluster-scoped (non-namespaced) resource, a ClusterRoleBinding. After you install the AKS version, it points to the service account in the kube-system namespace. When you then install the open-source version, it temporarily changes the binding to point to the azure-workload-identity-system namespace, but the AKS integration keeps refreshing it back to kube-system. As a result, the pods in the azure-workload-identity-system namespace lose their permissions and report errors.
The suggestion here is to choose only one of the solutions (AKS integration or open source).
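One way to spot this conflict is to check which namespace the ClusterRoleBinding currently grants. A minimal sketch, assuming the binding is named azure-wi-webhook-manager-rolebinding (the name used in the upstream manifests; verify with kubectl get clusterrolebindings | grep wi-webhook) and that it has a single subject. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the namespace(s) of the subjects in the webhook's ClusterRoleBinding.
# Assumption: binding name azure-wi-webhook-manager-rolebinding (override via $1).
wi_binding_subject_ns() {
  local binding="${1:-azure-wi-webhook-manager-rolebinding}"
  kubectl get clusterrolebinding "$binding" -o jsonpath='{.subjects[*].namespace}'
}

# Sketch: compare the bound namespace with the namespace the webhook pods run in.
# Assumes a single subject; returns 1 on mismatch, 2 if the lookup fails.
check_wi_binding() {
  local expected_ns="$1" actual_ns
  actual_ns="$(wi_binding_subject_ns)" || return 2
  if [ "$actual_ns" != "$expected_ns" ]; then
    echo "binding grants '${actual_ns}', pods run in '${expected_ns}'"
    return 1
  fi
  echo "binding matches ${expected_ns}"
}
```

With only the add-on installed, check_wi_binding kube-system should succeed; with only the Helm install, check_wi_binding azure-workload-identity-system should. If the Helm check keeps failing while the add-on is also enabled, that matches the reconciliation behavior described above.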
Describe the bug
Hi there! I'm trying to set up a WIF enabled cluster. Here are all of the steps I've done so far:
az feature show --namespace "Microsoft.ContainerService" --name "EnableWorkloadIdentityPreview" and az provider register --namespace Microsoft.ContainerService
(once it's registered) az aks update -g wif-test_group -n wif-test --enable-workload-identity
az aks update -g wif-test_group -n wif-test --enable-oidc-issuer
az identity federated-credential create to it

The above completed without issue (though the documentation was fairly scattered). The only step I think I'm missing is installing the webhook.
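Before installing anything else, the two cluster-level settings from these steps can be checked with az aks show. A minimal sketch; the function name is hypothetical, and the wif-test_group / wif-test values come from the commands above. The printed issuer URL is also the value the az identity federated-credential create step expects for --issuer.

```shell
#!/usr/bin/env bash
# Sketch: confirm workload identity and the OIDC issuer are enabled on the cluster.
# Returns 0 only if both are set.
check_aks_wi_prereqs() {
  local rg="$1" cluster="$2" wi issuer
  wi="$(az aks show --resource-group "$rg" --name "$cluster" \
        --query 'securityProfile.workloadIdentity.enabled' --output tsv)"
  issuer="$(az aks show --resource-group "$rg" --name "$cluster" \
        --query 'oidcIssuerProfile.issuerUrl' --output tsv)"
  echo "workload identity enabled: ${wi:-<not set>}"
  echo "OIDC issuer URL: ${issuer:-<not set>}"
  # Booleans may print as True or true, so compare case-insensitively.
  wi="$(printf '%s' "$wi" | tr '[:upper:]' '[:lower:]')"
  [ "$wi" = "true" ] && [ -n "$issuer" ]
}

# Usage: check_aks_wi_prereqs wif-test_group wif-test
```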
When I install the webhook using either of the approaches described at https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html, the two pods in the azure-workload-identity-system namespace fail with that error and don't seem to inject anything into my annotated pods. This has been a clean install each time, and I've made sure to clean up between attempts.
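For reference, the webhook only mutates pods that opt in: the service account needs the azure.workload.identity/client-id annotation, and current upstream docs put the azure.workload.identity/use: "true" label on the pod (earlier releases placed it on the service account, so check the docs for your version). A minimal sketch that applies a test service account and pod; the function name, the pod name wi-test and the azure-cli image are illustrative choices, and the federated credential's subject must be system:serviceaccount:<namespace>:<service-account>.

```shell
#!/usr/bin/env bash
# Sketch: apply a service account and test pod set up for workload identity.
# Arguments: namespace, service account name, managed identity client ID.
# The pod name "wi-test" and the azure-cli image are illustrative choices.
apply_wi_test_workload() {
  local ns="$1" sa="$2" client_id="$3"
  kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
  annotations:
    azure.workload.identity/client-id: "${client_id}"
---
apiVersion: v1
kind: Pod
metadata:
  name: wi-test
  namespace: ${ns}
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: ${sa}
  containers:
    - name: wi-test
      image: mcr.microsoft.com/azure-cli
      command: ["sleep", "3600"]
EOF
}

# Usage: apply_wi_test_workload default wi-sa "$CLIENT_ID"
```

Because mutation happens at admission time, the label has to be present when the pod is created; labeling an already-running pod does not inject anything.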
I found a similar issue here: #777 but reinstallation doesn't fix it for me. I've tried many times to reinstall and always get this issue.
There could be something I'm missing, but this is a fresh workload-identity-webhook install on a fairly fresh (created last week) AKS cluster, so I kind of would expect this to 'just work' since this is meant to be the new way to do things. If there is something I'm missing, do let me know!
Steps To Reproduce
Install WIF admissions webhook using either of the approaches outlined here: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html
I'm using the latest Helm chart (I've run helm repo update) on an AKS cluster I created last week.
Expected behavior
I shouldn't get errors when installing WIF into an AKS environment.
Logs
Environment
AKS
Kubernetes version (kubectl version): Client Version: v1.26.3, Kustomize Version: v4.5.7, Server Version: v1.24.10
OS (cat /etc/os-release): Mac
Kernel (uname -a): Darwin
Additional context
I'm looking to get my environment working so I can test a change to Vault Agent that adds support for WIF authentication.