ShortMVitesse opened 7 months ago
What does AIMS stand for? It looks like you are using managed identity — could you share the YAML you are using?
I'm sure you know this better than me, but just in case: AIMS = Azure Instance Metadata Service, the miniature web service hosted on the 169.254.169.254 address in the error: http://169.254.169.254/metadata/identity/oauth2/token
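For reference, that token endpoint can be exercised by hand to see whether the metadata service is reachable at all (a sketch; the client ID and store name are placeholders, and the `resource` value mirrors the one in the error below):

```shell
# IMDS requires the Metadata header; client_id selects the user-assigned identity.
# A quick timeout distinguishes "unroutable" from "slow".
curl -s --max-time 5 -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=<your-client-id>&resource=https%3A%2F%2F<your-store>.azconfig.io"
```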
```yaml
apiVersion: azconfig.io/v1
kind: AzureAppConfigurationProvider
metadata:
  name: appconfig-portal-insights
  namespace: vnext
spec:
  endpoint: https://blah-blah-appconf.azconfig.io
  target:
    configMapName: configmap-portal-insights
    configMapData:
      type: json
      key: mysettings.json
  auth:
    managedIdentityClientId: f0ad9467-1234-4fa0-9235-3798532c828b
  configuration:
    selectors:
      - keyFilter: '*'
        labelFilter: service-portal-insights
```
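If the provider were reconciling successfully, the generated ConfigMap from the spec above could be inspected with (names taken from the manifest):

```shell
# Check the provider resource's status conditions
kubectl get AzureAppConfigurationProvider appconfig-portal-insights -n vnext -o yaml
# Inspect the target ConfigMap it should create
kubectl get configmap configmap-portal-insights -n vnext -o yaml
```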
I'm not able to reproduce the issue on my side. Have you enabled pod-managed identity or workload identity on your cluster?
we're not ready to enable workload identity yet, so we have to get this working with the managed identity for now.
Do you mind re-installing it with higher log verbosity, to see if that helps root-cause the issue?
```shell
helm install azureappconfiguration.kubernetesprovider \
  oci://mcr.microsoft.com/azure-app-configuration/helmchart/kubernetes-provider \
  --namespace azappconfig-system \
  --create-namespace \
  --set logVerbosity=3
```
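Once it is installed, the controller logs can be tailed with something like the following (a sketch; the pod name comes from the first command, since I'm not assuming the deployment name):

```shell
# List the provider pods in the install namespace
kubectl get pods -n azappconfig-system
# Follow logs from the controller pod found above
kubectl logs -n azappconfig-system <pod-name> -f
```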
Would love to. Can you help me find the reference for a Helm chart to deploy it, as that's how we do it? Not sure where to pass that log level in a Helm chart.
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: azure-app-configuration
  namespace: azappconfig-system
spec:
  releaseName: azure-app-configuration
  chart:
    spec:
      chart: kubernetes-provider
      sourceRef:
        name: azure-app-configuration
        kind: HelmRepository
        namespace: flux-system
  interval: 10m
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
  logverbosity: 3
```
Could you try:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: azure-app-configuration
  namespace: azappconfig-system
spec:
  releaseName: azure-app-configuration
  chart:
    spec:
      chart: kubernetes-provider
      sourceRef:
        name: azure-app-configuration
        kind: HelmRepository
        namespace: flux-system
  interval: 10m
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
  values:
    logVerbosity: 3
```
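After Flux reconciles, whether the value actually reached the release can be confirmed with (a sketch; release name and namespace from the manifest above):

```shell
# Force a reconcile, then show the user-supplied values Helm recorded
flux reconcile helmrelease azure-app-configuration -n azappconfig-system
helm get values azure-app-configuration -n azappconfig-system
```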
I have done, but I don't see any increased logging. Still just this coming from the pod:
```
E0314 16:30:36.149902 1 appconfigurationprovider_controller.go:264] Fail to create the target ConfigMap or Secret of AzureAppConfigurationProvider 'appconfig-portal-insights' in 'vnext' namespace: ManagedIdentityCredential: ManagedIdentityCredential: Get "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=6bb900ae-cf97-43ca-bd26-acbc44a9fe51&resource=https%3A%2F%2Ftst-uks-appconf.azconfig.io": context deadline exceeded
```
Is there somewhere else I'd find more interesting logging?
Is this the only workload that has this issue? Do you have other workloads on this node that also use the ManagedIdentity credential, and if so, are they working well?
In the meantime, are you able to ping 169.254.169.254 from the node to see if it is reachable?
Yeah, we have a fair few that use MI on the same clusters; all work fine.
I don't know that you'd expect to be able to ping 169.254.169.254, as it's the AIMS host? Easy to check though.
I actually can't shell onto the App Configuration pod, so I can't test it from there.
Could you try this to debug, to see if you can get through?

1. Run an Azure CLI container:
```shell
kubectl run azurecli --image=mcr.microsoft.com/azure-cli --restart=Never -- /bin/sh -c "sleep 3600"
```
2. Jump into the azurecli pod:
```shell
kubectl exec -ti azurecli -c azurecli -- /bin/bash
```
3. Log in with the UAI:
```shell
az login --identity --username <ClientId of your UAI> --allow-no-subscriptions
```
4. Get an access token:
```shell
az account get-access-token
```
Fails at step 3 (az login) with:

```
MSI endpoint is not responding. Please make sure MSI is configured correctly.
Error detail: MSI: Failed to acquire tokens after 12 times
```

Which is at least consistent with the earlier error about 169.254.169.254 timing out. I'm starting to think this issue lies in routing on the node.
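Since the timeout points at node-level routing, one way to test IMDS from the node itself is a node debug pod (a sketch; `<node-name>` is a placeholder found via the first command). Note that IMDS only speaks HTTP and won't answer ICMP ping, so curl is the more direct probe:

```shell
# Find which node the provider pod is scheduled on
kubectl get pods -n azappconfig-system -o wide
# Start an interactive debug pod in that node's host namespaces
kubectl debug node/<node-name> -it --image=mcr.microsoft.com/azure-cli
# Inside the debug pod: probe the metadata service directly.
# The Metadata header is required; --max-time surfaces routing black holes quickly.
curl -s --max-time 5 -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
```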
You could try restarting your cluster or the VMSS node pool, or reach out to AKS support for help.
We've deployed this provider but are struggling to get it working.
We get the error quoted earlier in the pod logs, which reads like an issue with the AIMS instance on the node. Has anyone seen this before?