Azure / karpenter-provider-azure

AKS Karpenter Provider

Nodes created by Karpenter are unable to pull images from a private Azure Container Registry (ACR), resulting in a 401 Unauthorized error #411

Open: ATymus opened this issue 2 weeks ago

ATymus commented 2 weeks ago

Version

Karpenter Version: v0.5.0

Kubernetes Version: v1.29.4

Expected Behavior

Nodes created by Karpenter should be able to pull images from the private ACR using the configured managed identity, just as the regular nodes do.

Actual Behavior

Nodes created by Karpenter and the regular Kubernetes nodes have the same managed identity configured, and this managed identity has been granted both the AcrPull and AcrPush roles on the ACR. Pods on the regular nodes pull images from the private ACR successfully, but pods on nodes created by Karpenter fail with 401 Unauthorized.

[Screenshot of the error, 2024-06-19 12:57:59]
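To double-check that both kinds of nodes really carry the same user-assigned identity, a minimal sketch like the one below prints the identities attached to a regular node pool's scale set and to the VM behind a Karpenter node. It assumes that Karpenter-created nodes are standalone VMs in the cluster's node resource group and that the VM name matches the Kubernetes node name (e.g. aks-general-purpose-zfxqd); the function name `compare_node_identities` and the scale-set name you pass in are hypothetical placeholders.

```shell
# Hypothetical helper: print the user-assigned identities on a regular node pool's
# VMSS and on the VM backing a Karpenter node so the two can be compared by eye.
# Assumption: Karpenter nodes are plain VMs in the node resource group whose
# VM name equals the Kubernetes node name.
compare_node_identities() {
  rg="$1"       # resource group of the AKS cluster, e.g. rg-dev
  cluster="$2"  # AKS cluster name, e.g. aks-dev
  vmss="$3"     # scale set of a regular node pool (placeholder)
  vm="$4"       # VM backing a Karpenter node (assumed to match the node name)

  # The node VMs live in the cluster's node resource group, not in $rg itself.
  node_rg=$(az aks show -g "$rg" -n "$cluster" --query nodeResourceGroup -o tsv) || return 1

  echo "VMSS $vmss:"
  az vmss identity show -g "$node_rg" -n "$vmss" --query userAssignedIdentities -o json
  echo "VM $vm:"
  az vm identity show -g "$node_rg" -n "$vm" --query userAssignedIdentities -o json
}
```

If the resource IDs printed for the VM differ from those on the scale set, or the VM shows none at all, the "same managed identity" assumption does not hold for that node.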

Steps to Reproduce the Problem

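Attach the ACR to the cluster (cluster aks-dev in resource group rg-dev):

az aks update -n aks-dev -g rg-dev --attach-acr myregistry

Then deploy a pod that uses myregistry.azurecr.io/test-image:latest and is scheduled onto a node provisioned by Karpenter.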
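`--attach-acr` grants the AcrPull role to the cluster's kubelet identity (`identityProfile.kubeletidentity`), so one way to confirm the attachment took effect is to list that identity's role assignments on the registry and then run the CLI's built-in ACR check. The following is a sketch under that assumption; `verify_acr_attachment` is a hypothetical name. Note that `az aks check-acr` runs its test from a pod the CLI schedules in the cluster, so it is not guaranteed to exercise a Karpenter-created node.

```shell
# Sketch: confirm that --attach-acr granted AcrPull to the kubelet identity,
# then let the Azure CLI test pulling from the registry.
verify_acr_attachment() {
  rg="$1"        # e.g. rg-dev
  cluster="$2"   # e.g. aks-dev
  registry="$3"  # registry name without the domain, e.g. myregistry

  # Object ID of the kubelet identity that --attach-acr assigns AcrPull to.
  kubelet_oid=$(az aks show -g "$rg" -n "$cluster" \
    --query identityProfile.kubeletidentity.objectId -o tsv) || return 1
  # Full resource ID of the registry, used as the role-assignment scope.
  acr_id=$(az acr show -n "$registry" --query id -o tsv) || return 1

  # Expect "AcrPull" to appear in this list.
  az role assignment list --assignee "$kubelet_oid" --scope "$acr_id" \
    --query "[].roleDefinitionName" -o tsv
  # Built-in connectivity/authorization check from inside the cluster.
  az aks check-acr -g "$rg" -n "$cluster" --acr "$registry.azurecr.io"
}
```

Usage for this issue would be `verify_acr_attachment rg-dev aks-dev myregistry`.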

Resource Specs and Logs

Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ---                    ----               -------
  Warning  FailedScheduling        39m                    default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   Scheduled               37m                    default-scheduler  Successfully assigned default/test-779d54dfd-djk7d to aks-general-purpose-zfxqd
  Normal   Nominated               39m                    karpenter          Pod should schedule on: nodeclaim/general-purpose-zfxqd
  Warning  FailedCreatePodSandBox  37m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1d4cfc55733b95293627a57ffb7a20de269debdf1b2afc2116aeb103042afeb4": plugin type="cilium-cni" failed (add): failed to invoke delegated plugin ADD for IPAM: http request failed: Post "http://localhost:10090/network/requestipconfigs": dial tcp 127.0.0.1:10090: connect: connection refused; failed to request IP address from CNS
  Normal   SandboxChanged          36m (x5 over 37m)      kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling                 36m (x3 over 36m)      kubelet            Pulling image "myregistry.azurecr.io/test-image:latest"
  Warning  Failed                  36m (x3 over 36m)      kubelet            Failed to pull image "myregistry.azurecr.io/test-image:latest": failed to pull and unpack image "myregistry.azurecr.io/test-image:latest": failed to resolve reference "myregistry.azurecr.io/test-image:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://myregistry.azurecr.io/oauth2/token?scope=repository%3Atest-image%3Apull&service=myregistry.azurecr.io: 401 Unauthorized
  Warning  Failed                  36m (x3 over 36m)      kubelet            Error: ErrImagePull
  Warning  Failed                  35m (x5 over 36m)      kubelet            Error: ImagePullBackOff
  Normal   BackOff                 2m29s (x152 over 36m)  kubelet            Back-off pulling image "myregistry.azurecr.io/test-image:latest"
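The 401 event reads "failed to fetch anonymous token". In containerd's registry client that wording is produced when the token request is made without any registry credentials, whereas a request that did carry credentials reports "failed to fetch oauth token" on failure. Assuming that reading, the Karpenter node's kubelet appears not to be supplying ACR credentials at all, rather than supplying credentials that ACR rejects. A small hypothetical helper (`classify_pull_error`) can flag which case a given pod's events show:

```shell
# Hypothetical helper: read pod events (e.g. `kubectl describe pod` output) on stdin
# and report which kind of registry token request failed, based on containerd's
# error wording.
classify_pull_error() {
  events=$(cat)
  case "$events" in
    *"failed to fetch anonymous token"*)
      echo "anonymous: the node sent no registry credentials" ;;
    *"failed to fetch oauth token"*)
      echo "oauth: credentials were sent but the token request failed" ;;
    *)
      echo "unknown" ;;
  esac
}
```

For example: `kubectl describe pod test-779d54dfd-djk7d -n default | classify_pull_error`. Running it against a pod on a regular node and one on a Karpenter node makes the difference between the two node types easy to compare.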


danielhamelberg commented 9 hours ago

@ATymus I recommend enabling the debug log level in Karpenter, redeploying, and sharing more of the Resource Specs and Logs:

kubectl describe pod <pod-name> -n <namespace>
kubectl describe node <node-name>
az aks show --resource-group <resource-group> --name <aks-cluster> --query "identity"
az role assignment list --assignee <managed-identity-id> --scope <acr-id>

Also double-check the secret (if any) that the pod uses to access the ACR.
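The commands suggested above can be gathered into a single file to attach to the issue. This is a minimal sketch; the function name `collect_acr_diagnostics` and the default output file name are hypothetical, and every argument is a placeholder the caller fills in.

```shell
# Sketch: run the diagnostics requested in the comment and write them to one file.
# All arguments are caller-supplied placeholders; the output file name is hypothetical.
collect_acr_diagnostics() {
  pod="$1"; ns="$2"; node="$3"; rg="$4"; cluster="$5"
  identity_id="$6"; acr_id="$7"
  out="${8:-acr-diagnostics.txt}"

  {
    echo "== kubectl describe pod $pod -n $ns"
    kubectl describe pod "$pod" -n "$ns"
    echo "== kubectl describe node $node"
    kubectl describe node "$node"
    echo "== cluster identity"
    az aks show --resource-group "$rg" --name "$cluster" --query "identity"
    echo "== role assignments on the ACR"
    az role assignment list --assignee "$identity_id" --scope "$acr_id"
  } > "$out" 2>&1
  # Print the path so the caller knows where the report went.
  echo "$out"
}
```

Note that `--query "identity"` returns the cluster (control-plane) identity; image pulls are authorized with the kubelet identity found under `identityProfile.kubeletidentity`, so passing that identity's object ID as the assignee may give a more relevant role-assignment listing.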