Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[BUG] Dapr extension can't install latest version of Dapr on Arm-based node #3992

Open vlardn opened 1 year ago

vlardn commented 1 year ago

The Dapr extension for AKS can't install the latest Dapr on an Arm-based node (Standard_D2pds_v5): the "mdm" container in the Dapr monitoring pods fails with: exec /start_metricsextension.sh: exec format error

It looks like the 'mdm' container image (linuxgeneva-microsoft.azurecr.io/genevamdm:2.2023.928.2134-0de476-20230928t2244 in this case) was not built for the linux/arm64 platform.

Steps to Reproduce:

az aks create --resource-group $RG \
    --name $NAME \
    --kubernetes-version 1.27.3 \
    --node-vm-size Standard_D2pds_v5
az k8s-extension create --resource-group $RG \
    --cluster-type managedClusters  \
    --cluster-name $NAME \
    --extension-type Microsoft.Dapr \
    --name dapr \
    --auto-upgrade-minor-version false \
    --version 1.12.0
~$ kubectl get pod -n dapr-system
NAME                                       READY   STATUS             RESTARTS        AGE
pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb   3/4     CrashLoopBackOff   5 (29s ago)    3m27s
pod/dapr-monitoring-qsphs                      2/3     CrashLoopBackOff   5 (24s ago)    3m27s
pod/dapr-operator-86df5d7f89-p9hxd             1/1     Running            1 (3m8s ago)   3m27s
pod/dapr-placement-server-0                    1/1     Running            1 (3m2s ago)   3m27s
pod/dapr-sentry-566cbc6454-flfdb               1/1     Running            0              3m27s
pod/dapr-sidecar-injector-7fb747486d-8642t     1/1     Running            0              3m27s
~$ kubectl logs dapr-monitoring-metrics-7b9556d5cf-vt6nb -n dapr-system -c mdm
exec /start_metricsextension.sh: exec format error
~$ kubectl get ev -n dapr-system --sort-by='.lastTimestamp' | grep dapr-monitoring-metrics-7b9556d5cf-vt6nb
56m         Normal    Scheduled           pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Successfully assigned dapr-system/dapr-monitoring-metrics-7b9556d5cf-vt6nb to aks-nodepool1-18147971-vmss000000
56m         Normal    SuccessfulCreate    replicaset/dapr-monitoring-metrics-7b9556d5cf   Created pod: dapr-monitoring-metrics-7b9556d5cf-vt6nb
56m         Normal    Pulling             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Pulling image "linuxgeneva-microsoft.azurecr.io/genevamdm:2.2023.928.2134-0de476-20230928t2244"
56m         Normal    Pulled              pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Successfully pulled image "linuxgeneva-microsoft.azurecr.io/genevamdm:2.2023.928.2134-0de476-20230928t2244" in 9.468081088s (10.695860176s including waiting)
56m         Normal    Created             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Created container addon-token-adapter
56m         Normal    Pulled              pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Container image "mcr.microsoft.com/aks/msi/addon-token-adapter:master.230804.1" already present on machine
56m         Normal    Started             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Started container addon-token-adapter
56m         Normal    Pulling             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Pulling image "mcr.microsoft.com/daprio/metrics:v0.5"
56m         Normal    Pulled              pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Successfully pulled image "mcr.microsoft.com/daprio/metrics:v0.5" in 3.850984634s (12.498798172s including waiting)
56m         Normal    Created             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Created container dapr-metrics
56m         Normal    Started             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Started container dapr-metrics
56m         Normal    Pulling             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Pulling image "mcr.microsoft.com/mirror/docker/library/telegraf:1.28"
55m         Normal    Created             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Created container telegraf
55m         Normal    Pulled              pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Successfully pulled image "mcr.microsoft.com/mirror/docker/library/telegraf:1.28" in 6.902665975s (16.704340974s including waiting)
55m         Normal    Started             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Started container telegraf
55m         Normal    Pulled              pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Container image "linuxgeneva-microsoft.azurecr.io/genevamdm:2.2023.928.2134-0de476-20230928t2244" already present on machine
55m         Normal    Created             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Created container mdm
55m         Normal    Started             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Started container mdm
46m         Warning   BackOff             pod/dapr-monitoring-metrics-7b9556d5cf-vt6nb    Back-off restarting failed container mdm in pod dapr-monitoring-metrics-7b9556d5cf-vt6nb_dapr-system(581e7872-834c-4066-b384-e82430d51e80)
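
A minimal sketch for narrowing down which containers in a crashing pod hit the architecture mismatch, assuming kubectl access to the cluster. The function names has_exec_format_error and find_arch_mismatch_containers are hypothetical helpers, not part of kubectl or the Dapr extension. The check relies on the "exec format error" message shown in the logs above, which typically means the binary (or a script's interpreter) was built for a different CPU architecture.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the log text on stdin contains
# "exec format error".
has_exec_format_error() {
  grep -q 'exec format error'
}

# Hypothetical helper: prints the name of each container in a pod whose
# recent logs contain the error. Args: namespace, pod name.
# Requires kubectl access to the cluster; it is only defined here, not called.
find_arch_mismatch_containers() {
  local ns="$1" pod="$2" c
  for c in $(kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.spec.containers[*].name}'); do
    if kubectl logs "$pod" -n "$ns" -c "$c" --tail=20 2>/dev/null \
        | has_exec_format_error; then
      echo "$c"
    fi
  done
}
```

For example, find_arch_mismatch_containers dapr-system dapr-monitoring-metrics-7b9556d5cf-vt6nb would be expected to print mdm for the pod above. To confirm the node side, kubectl get nodes -L kubernetes.io/arch adds a column with each node's architecture label.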

PS. Installing Dapr via the AKS extension succeeds on a similar Standard_D2ads_v5 node, which has an AMD (x86-64) CPU; the "mdm" container does not fail there:

~$ kubectl logs dapr-monitoring-metrics-7b9556d5cf-qdcgn -n dapr-system -c mdm
+ [[ -z '' ]]
+ [[ -z '' ]]
+ export CERT_FILE=/tmp/geneva_mdm/mdm-cert.pem
+ CERT_FILE=/tmp/geneva_mdm/mdm-cert.pem
...

Environment:

vlardn commented 5 months ago

Guys, no response for 7 months now :(

vlardn commented 3 months ago

Guys, no response for 9 months now :)

speters82 commented 1 month ago

I've reproduced this. While a proper fix is being worked on, you can unblock yourself by setting the dapr_monitoring.enabled configuration setting to false.

// Dapr monitoring is not supported on Arm VMs because the mdm container image is not ARM64 compatible.
// Remove this once that bug is fixed: https://github.com/Azure/AKS/issues/3992
// Heuristic: Arm-based Azure VM sizes carry a lowercase 'p' in the size name (e.g. Standard_D2pds_v5).
var daprMonitoringEnabled = contains(systemVmType, 'p') ? 'false' : 'true'

resource dapr 'Microsoft.KubernetesConfiguration/extensions@2023-05-01' = {
  scope: aksCluster
  name: 'dapr'
  properties: {
    extensionType: 'Microsoft.Dapr'
    version: '1.14.4-msft.5'
    autoUpgradeMinorVersion: false // We need to pin the version to avoid breaking changes
    releaseTrain: 'stable'
    scope: {
      cluster: {
        releaseNamespace: 'dapr-system'
      }
    }
    configurationSettings: {
      'global.nodeSelector.agentpool': 'systempool'
      'dapr_monitoring.enabled': daprMonitoringEnabled
    }
  }
}
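
For anyone installing with the Azure CLI instead of Bicep, the same workaround might look roughly like the sketch below. It reuses the `az k8s-extension create` flags from the reproduction steps and adds `--configuration-settings`. The function names dapr_monitoring_setting and install_dapr and the DAPR_VERSION variable are hypothetical, and the lowercase 'p' check is the same heuristic as in the Bicep snippet above. Whether a given extension version honors dapr_monitoring.enabled has not been verified here.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints "false" for VM sizes that look Arm-based
# (lowercase 'p' in the size name), "true" otherwise. Heuristic only.
dapr_monitoring_setting() {
  case "$1" in
    *p*) echo "false" ;;
    *)   echo "true" ;;
  esac
}

# Hypothetical wrapper: installs the Dapr extension with monitoring disabled
# on Arm-looking node sizes. Arg: node VM size.
# Assumes RG, NAME and DAPR_VERSION are already set and that the
# k8s-extension CLI extension is installed. Defined here, not called.
install_dapr() {
  local vm_size="$1"
  az k8s-extension create --resource-group "$RG" \
      --cluster-type managedClusters \
      --cluster-name "$NAME" \
      --extension-type Microsoft.Dapr \
      --name dapr \
      --auto-upgrade-minor-version false \
      --version "$DAPR_VERSION" \
      --configuration-settings "dapr_monitoring.enabled=$(dapr_monitoring_setting "$vm_size")"
}
```

Calling install_dapr Standard_D2pds_v5 would pass dapr_monitoring.enabled=false, while install_dapr Standard_D2ads_v5 would leave monitoring enabled.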