Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

AGIC authentication failed #4549

Open asakth22 opened 1 day ago

asakth22 commented 1 day ago

The AGIC pod in our AKS cluster is failing to authenticate. The cluster has 'workload identity' enabled, and AGIC was working fine until this error started appearing.

Logs from the pod:

I0918 05:42:08.294493       1 auth.go:58] Creating authorizer using Default Azure Credentials
I0918 05:42:08.294547       1 client.go:133] Getting Application Gateway configuration.
I0918 05:42:08.294611       1 httpserver.go:57] Starting API Server on :8123
E0918 05:42:08.302858       1 authorizer.go:63] Error getting Azure token: DefaultAzureCredential authentication failed
GET http://169.254.169.254/metadata/identity/oauth2/token
--------------------------------------------------------------------------------
RESPONSE 400 Bad Request
--------------------------------------------------------------------------------
{
  "error": "invalid_request",
  "error_description": "Identity not found"
}
--------------------------------------------------------------------------------
E0918 05:42:08.409498       1 client.go:184] configuration error (bad request) or unauthorized error while performing a GET using the authorizer
E0918 05:42:08.409522       1 client.go:185] stopping GET retries
F0918 05:42:08.409604       1 main.go:175] Failed getting Application Gateway: 
Code="ErrorApplicationGatewayUnexpectedStatusCode" Message="Unexpected status code '401' while performing a 
GET on Application Gateway." InnerError="network.ApplicationGatewaysClient#Get: Failure responding to request: 
StatusCode=401 -- Original Error: autorest/azure: Service returned an error. Status=401 
Code="AuthenticationFailedMissingToken" Message="Authentication failed. The 'Authorization' header is missing the 
access token.""

I found a similar issue described in https://github.com/Azure/application-gateway-kubernetes-ingress/issues/1533, and as a workaround, running the command below fixes it.

az aks update -g MyResourceGroup -n MyManagedCluster --enable-workload-identity

However, workload identity was already enabled when the cluster was created.
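One possible reading of the log above: DefaultAzureCredential fell through to the IMDS endpoint (169.254.169.254), which can happen when the workload identity environment variables were not injected into the pod. The sketch below is a rough check, not a confirmed diagnosis. The helper name `missing_wi_env` is hypothetical, the four variable names are the ones the Azure Workload Identity webhook normally injects, and the pod name and namespace in the usage comment are placeholders.

```shell
# Environment variables the Azure Workload Identity webhook normally injects
# into pods labelled azure.workload.identity/use: "true".
WI_ENV_VARS="AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST"

# Hypothetical helper: reads env var names (one per line) on stdin and
# prints each expected name that is absent from the input.
missing_wi_env() {
  present="$(cat)"
  for name in $WI_ENV_VARS; do
    printf '%s\n' "$present" | grep -qx "$name" || printf '%s\n' "$name"
  done
}

# Usage against a live pod (pod name and namespace are placeholders):
# kubectl get pod <agic-pod> -n <namespace> \
#   -o jsonpath='{range .spec.containers[*].env[*]}{.name}{"\n"}{end}' | missing_wi_env
```

If the helper prints any names, the pod was likely created without the webhook mutating it, which would be consistent with re-running `--enable-workload-identity` fixing things once the pod is recreated.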

microsoft-github-policy-service[bot] commented 1 day ago

@sabbour, @JackStromberg would you be able to assist?

JackStromberg commented 1 day ago

See this comment here: https://github.com/Azure/application-gateway-kubernetes-ingress/issues/1533#issuecomment-1798858813

Do you see the OIDC issuer endpoint exposed on the cluster? Do you see the federated credential on the created identity?
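For anyone following along, here is a minimal sketch of how those two checks could be done. The helper names `expected_subject` and `fic_matches` are hypothetical, and the identity name, resource groups, namespace, and service account in the usage comments are placeholders to replace with your own values. It assumes the federated credential's issuer must equal the cluster's OIDC issuer URL and its subject must have the form `system:serviceaccount:<namespace>:<service-account>`.

```shell
# Prints the subject a federated credential needs for a given service account.
# Args: namespace service_account
expected_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Succeeds when a credential's issuer and subject match the cluster issuer
# and the service account. Comparison is exact, including any trailing slash.
# Args: cred_issuer cred_subject cluster_issuer namespace service_account
fic_matches() {
  [ "$1" = "$3" ] && [ "$2" = "$(expected_subject "$4" "$5")" ]
}

# Usage with the Azure CLI (all names are placeholders):
# cluster_issuer="$(az aks show -g MyResourceGroup -n MyManagedCluster \
#   --query oidcIssuerProfile.issuerUrl -o tsv)"
# az identity federated-credential list --identity-name <agic-identity> \
#   -g <identity-resource-group> --query "[].{issuer:issuer, subject:subject}" -o table
# fic_matches "<issuer>" "<subject>" "$cluster_issuer" <namespace> <service-account> \
#   && echo "federated credential matches"
```

An empty issuer URL from `az aks show` would suggest the OIDC issuer is not exposed; an empty credential list would suggest the federated credential is missing from the identity.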

Also, we have AGC available now, which simplifies some of this. Would you be willing to share why you are using AGIC rather than AGC? https://learn.microsoft.com/azure/application-gateway/for-containers/overview

Cheers!

asakth22 commented 8 hours ago

Both 'workload_identity_enabled' and 'oidc_issuer_enabled' are enabled on the cluster at deployment time. Everything was working fine before, and we have only now run into this issue.

helm ls -n kube-system
NAME                            NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                                                                   APP VERSION
aks-managed-workload-identity   kube-system 35          2024-09-19 07:22:26.78077406 +0000 UTC  deployed    workload-identity-addon-0.1.0-a26bc86f33b244dae3051771b5d79cc32333d28b