ShortMVitesse opened 7 months ago
What does AIMS stand for? It looks like you are using managed identity — could you share the YAML you are using?
I'm sure you know this better than me, but just in case: AIMS = Azure Instance Metadata Service, the miniature web service hosted on the 169.254.169.254 address in the error: http://169.254.169.254/metadata/identity/oauth2/token
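For reference, that token endpoint can be exercised by hand to see whether the metadata service is reachable at all (a sketch; the client ID and store name are placeholders, and the `resource` value mirrors the one in the error below):

```shell
# IMDS requires the Metadata header; client_id selects the user-assigned identity.
# A quick timeout distinguishes "unroutable" from "slow".
curl -s --max-time 5 -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=<your-client-id>&resource=https%3A%2F%2F<your-store>.azconfig.io"
```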
```yaml
apiVersion: azconfig.io/v1
kind: AzureAppConfigurationProvider
metadata:
  name: appconfig-portal-insights
  namespace: vnext
spec:
  endpoint: https://blah-blah-appconf.azconfig.io
  target:
    configMapName: configmap-portal-insights
    configMapData:
      type: json
      key: mysettings.json
  auth:
    managedIdentityClientId: f0ad9467-1234-4fa0-9235-3798532c828b
  configuration:
    selectors:
      - keyFilter: '*'
        labelFilter: service-portal-insights
```
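If the provider were reconciling successfully, the generated ConfigMap from the spec above could be inspected with (names taken from the manifest):

```shell
# Check the provider resource's status conditions
kubectl get AzureAppConfigurationProvider appconfig-portal-insights -n vnext -o yaml
# Inspect the target ConfigMap it should create
kubectl get configmap configmap-portal-insights -n vnext -o yaml
```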
I'm not able to reproduce the issue on my side. Have you enabled pod-managed identity or workload identity on your cluster?
we're not ready to enable workload identity yet, so we have to get this working with the managed identity for now.
Do you mind re-installing it with higher log verbosity, to see if that helps root-cause the issue?
```shell
helm install azureappconfiguration.kubernetesprovider \
  oci://mcr.microsoft.com/azure-app-configuration/helmchart/kubernetes-provider \
  --namespace azappconfig-system \
  --create-namespace \
  --set logVerbosity=3
```
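Once it is installed, the controller logs can be tailed with something like the following (a sketch; the pod name comes from the first command, since I'm not assuming the deployment name):

```shell
# List the provider pods in the install namespace
kubectl get pods -n azappconfig-system
# Follow logs from the controller pod found above
kubectl logs -n azappconfig-system <pod-name> -f
```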
Would love to. Can you help me find the reference for a Helm chart to deploy it, as that's how we do it? Not sure where to pass that log level in a Helm chart.
```yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: azure-app-configuration
  namespace: azappconfig-system
spec:
  releaseName: azure-app-configuration
  chart:
    spec:
      chart: kubernetes-provider
      sourceRef:
        name: azure-app-configuration
        kind: HelmRepository
        namespace: flux-system
  interval: 10m
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
  logverbosity: 3
```
Could you try:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: azure-app-configuration
  namespace: azappconfig-system
spec:
  releaseName: azure-app-configuration
  chart:
    spec:
      chart: kubernetes-provider
      sourceRef:
        name: azure-app-configuration
        kind: HelmRepository
        namespace: flux-system
  interval: 10m
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
  values:
    logVerbosity: 3
```
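After Flux reconciles, whether the value actually reached the release can be confirmed with (a sketch; release name and namespace from the manifest above):

```shell
# Force a reconcile, then show the user-supplied values Helm recorded
flux reconcile helmrelease azure-app-configuration -n azappconfig-system
helm get values azure-app-configuration -n azappconfig-system
```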
I have done, but I don't see any increased logging. Still just this coming from the pod:
```
E0314 16:30:36.149902 1 appconfigurationprovider_controller.go:264] Fail to create the target ConfigMap or Secret of AzureAppConfigurationProvider 'appconfig-portal-insights' in 'vnext' namespace: ManagedIdentityCredential: ManagedIdentityCredential: Get "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=6bb900ae-cf97-43ca-bd26-acbc44a9fe51&resource=https%3A%2F%2Ftst-uks-appconf.azconfig.io": context deadline exceeded
```
Is there somewhere else I'd find more interesting logging?
Is this the only workload that has this issue? Do you have other workloads on this node that also use the ManagedIdentity credential, and if so, are they working well?
In the meantime, are you able to ping 169.254.169.254 from the node to see if it is reachable?
Yeah, we have a fair few that use MI on the same clusters; all work fine.
I don't know that you'd expect to be able to ping 169.254.169.254, as it's the AIMS host? Easy to check though.
I actually can't shell onto the App Configuration pod, so I can't test it from there.
Could you try this to debug, to see if you can get through?

1. Run an Azure CLI container:
```shell
kubectl run azurecli --image=mcr.microsoft.com/azure-cli --restart=Never -- /bin/sh -c "sleep 3600"
```
2. Jump into the azurecli pod:
```shell
kubectl exec -ti azurecli -c azurecli -- /bin/bash
```
3. Log in with the UAI:
```shell
az login --identity --username <ClientId of your UAI> --allow-no-subscriptions
```
4. Get an access token:
```shell
az account get-access-token
```
Fails at step 3 (az login) with:

```
MSI endpoint is not responding. Please make sure MSI is configured correctly.
Error detail: MSI: Failed to acquire tokens after 12 times
```

Which is at least consistent with the earlier error about 169.254.169.254 timing out. I'm starting to think this issue lies in routing on the node.
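Since the timeout points at node-level routing, one way to test IMDS from the node itself is a node debug pod (a sketch; `<node-name>` is a placeholder found via the first command). Note that IMDS only speaks HTTP and won't answer ICMP ping, so curl is the more direct probe:

```shell
# Find which node the provider pod is scheduled on
kubectl get pods -n azappconfig-system -o wide
# Start an interactive debug pod in that node's host namespaces
kubectl debug node/<node-name> -it --image=mcr.microsoft.com/azure-cli
# Inside the debug pod: probe the metadata service directly.
# The Metadata header is required; --max-time surfaces routing black holes quickly.
curl -s --max-time 5 -H "Metadata: true" \
  "http://169.254.169.254/metadata/instance?api-version=2021-02-01"
```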
You could try restarting your cluster or the VMSS node pool, or reach out to AKS support for help.
We've deployed this provider but are struggling to get it working.
We get the error quoted earlier in the pod logs, which reads like an issue with the AIMS instance on the node. Has anyone seen this before?