Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[BUG] azure-keyvault-secrets-provider pod is failing #3372

Closed. nicklev closed this issue 1 year ago.

nicklev commented 1 year ago

Describe the bug

I used the az aks enable-addons -a azure-keyvault-secrets-provider command to add the AKV secrets provider. The deployment worked fine until my cluster had an outage and I had to stop and start AKS. After the restart, the pods started deploying on the nodes again. Then I saw that 3 out of 4 aks-secrets-store-csi-driver pods had failed.

In the logs of the failed containers I see these messages:

node-driver-registrar:
I1201 13:45:08.626379 1 main.go:166] Version: v2.5.1-0-ga31bf169
I1201 13:45:08.626433 1 main.go:167] Running node-driver-registrar in mode=registration
I1201 13:45:08.626987 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1201 13:45:08.627032 1 connection.go:154] Connecting to unix:///csi/csi.sock
W1201 13:45:18.627662 1 connection.go:173] Still connecting to unix:///csi/csi.sock

secrets-store container:
I1205 09:26:19.219972 1 exporter.go:35] "initializing metrics backend" backend="prometheus"
I1205 09:26:19.221577 1 main.go:179] starting manager
I1205 09:26:19.221752 1 shared_informer.go:285] caches populated
I1205 09:26:19.221932 1 reflector.go:219] Starting reflector v1.CSIDriver (10m0s) from pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167
I1205 09:26:19.221995 1 reflector.go:255] Listing and watching v1.CSIDriver from pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167
W1205 09:26:49.222905 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167: failed to list v1.CSIDriver: Get "https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/apis/storage.k8s.io/v1/csidrivers?fieldSelector=metadata.name%3Dsecrets-store.csi.k8s.io&limit=500&resourceVersion=0": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io: i/o timeout
I1205 09:26:49.223147 1 trace.go:205] Trace[1508814203]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167 (05-Dec-2022 09:26:19.222) (total time: 30001ms):
Trace[1508814203]: ---"Objects listed" error:Get "https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/apis/storage.k8s.io/v1/csidrivers?fieldSelector=metadata.name%3Dsecrets-store.csi.k8s.io&limit=500&resourceVersion=0": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io: i/o timeout 30000ms (09:26:49.222)
Trace[1508814203]: [30.001105376s] [30.001105376s] END
E1205 09:26:49.223241 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167: Failed to watch v1.CSIDriver: failed to list v1.CSIDriver: Get "https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/apis/storage.k8s.io/v1/csidrivers?fieldSelector=metadata.name%3Dsecrets-store.csi.k8s.io&limit=500&resourceVersion=0": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io: i/o timeout
I1205 09:26:50.279900 1 reflector.go:255] Listing and watching v1.CSIDriver from pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167
W1205 09:27:09.229009 1 reflector.go:324] pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167: failed to list v1.CSIDriver: Get "https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/apis/storage.k8s.io/v1/csidrivers?fieldSelector=metadata.name%3Dsecrets-store.csi.k8s.io&limit=500&resourceVersion=0": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io on 10.0.0.10:53: read udp 10.0.2.229:36542->10.0.0.10:53: i/o timeout
E1205 09:27:09.229022 1 secretproviderclasspodstatus_controller.go:97] "failed to patch secret owner ref" err="failed to list secret provider class pod status, err: Get \"https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/api?timeout=32s\": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io on 10.0.0.10:53: read udp 10.0.2.229:36542->10.0.0.10:53: i/o timeout"
I1205 09:27:09.229062 1 trace.go:205] Trace[1319515113]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167 (05-Dec-2022 09:26:50.279) (total time: 18949ms):
Trace[1319515113]: ---"Objects listed" error:Get "https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/apis/storage.k8s.io/v1/csidrivers?fieldSelector=metadata.name%3Dsecrets-store.csi.k8s.io&limit=500&resourceVersion=0": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io on 10.0.0.10:53: read udp 10.0.2.229:36542->10.0.0.10:53: i/o timeout 18949ms (09:27:09.228)
Trace[1319515113]: [18.949110587s] [18.949110587s] END
E1205 09:27:09.229095 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167: Failed to watch v1.CSIDriver: failed to list v1.CSIDriver: Get "https://aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io:443/apis/storage.k8s.io/v1/csidrivers?fieldSelector=metadata.name%3Dsecrets-store.csi.k8s.io&limit=500&resourceVersion=0": dial tcp: lookup aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io on 10.0.0.10:53: read udp 10.0.2.229:36542->10.0.0.10:53: i/o timeout
I1205 09:27:11.938075 1 reflector.go:255] Listing and watching v1.CSIDriver from pkg/mod/k8s.io/client-go@v0.24.1/tools/cache/reflector.go:167
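The container logs above use the klog format of Kubernetes components: each entry starts with a severity letter (I, W, E or F), the date as MMDD, the time with microseconds, a thread id and the emitting file:line, followed by "] ". When such logs are pasted as one run of text, a small parser makes it easier to pull out only the warnings and errors. Below is a minimal Python sketch, assuming the default klog header layout seen in these logs; parse_klog, problems and severity_counts are hypothetical helper names, not part of any Kubernetes tool.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List

# Assumed klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
# where L is the severity letter (I=info, W=warning, E=error, F=fatal).
KLOG_HEADER = re.compile(
    r"(?<!\S)(?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<src>[^\s\]]+:\d+)\] "
)


@dataclass
class KlogEntry:
    severity: str  # single letter: I, W, E or F
    mmdd: str      # month and day, e.g. "1205"
    time: str      # wall-clock time with microseconds
    source: str    # file:line that emitted the entry
    message: str   # text up to the next header (includes Trace[...] continuation lines)


def parse_klog(text: str) -> List[KlogEntry]:
    """Split a klog dump (one entry per line, or flattened into one run) into entries."""
    headers = list(KLOG_HEADER.finditer(text))
    entries = []
    for i, m in enumerate(headers):
        # The message runs until the next header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        entries.append(
            KlogEntry(m["sev"], m["mmdd"], m["time"], m["src"], text[m.end():end].strip())
        )
    return entries


def problems(text: str) -> List[KlogEntry]:
    """Return only warning, error and fatal entries."""
    return [e for e in parse_klog(text) if e.severity in ("W", "E", "F")]


def severity_counts(text: str) -> Counter:
    """Count entries per severity letter."""
    return Counter(e.severity for e in parse_klog(text))
```

Run on the secrets-store container log, problems() should return only the W and E entries, each of which ends in a lookup i/o timeout against the API server FQDN. Splitting on headers rather than on newlines is deliberate, so the same code handles both the flattened paste and the per-line form, and keeps the multi-line Trace[...] output attached to the entry that produced it.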

To Reproduce

Run az aks enable-addons -a azure-keyvault-secrets-provider

Expected behavior

I expected the aks-secrets-store-csi-driver to be deployed without any issues.

Environment (please complete the following information):

andyzhangx commented 1 year ago

It seems the azure-keyvault-secrets-provider pod could not access the API server. Could you please file an Azure support ticket? Also, are any other kube-system pods hitting the same issue?
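The logs support this diagnosis: lookups of the API server FQDN against the cluster DNS service (10.0.0.10:53) time out, so the driver never gets as far as connecting to the API server. One way to tell a DNS failure apart from a network failure is to test each step separately from a pod on an affected node. Below is a minimal Python sketch of such a check, assuming a Python interpreter is available in a debug pod on that node; check_endpoint and CheckResult are hypothetical names, and the check covers only name resolution and a plain TCP connect (no TLS, no authentication).

```python
import socket
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CheckResult:
    host: str
    port: int
    addresses: List[str] = field(default_factory=list)
    dns_error: Optional[str] = None  # set if name resolution failed
    tcp_error: Optional[str] = None  # set if no resolved address accepted a TCP connection

    @property
    def ok(self) -> bool:
        return self.dns_error is None and self.tcp_error is None


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> CheckResult:
    """Resolve host, then try a plain TCP connect to each resolved address.

    The DNS step uses the system resolver (inside a pod, the cluster DNS service
    listed in /etc/resolv.conf); its timeout is set by the resolver, not by `timeout`.
    """
    result = CheckResult(host, port)
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result.dns_error = str(exc)
        return result
    result.addresses = sorted({info[4][0] for info in infos})

    errors = []
    for addr in result.addresses:
        try:
            # No TLS handshake: a successful connect only proves the port is reachable.
            with socket.create_connection((addr, port), timeout=timeout):
                return result
        except OSError as exc:
            errors.append(f"{addr}: {exc}")
    result.tcp_error = "; ".join(errors)
    return result
```

For example, check_endpoint("aks-oe-prod-weu-01-dns-e3c6d3ba.hcp.westeurope.azmk8s.io") would be expected to come back with dns_error set while the cluster DNS lookups keep timing out as in the logs. If resolution succeeds but tcp_error is set, the problem is more likely network egress to the API server than DNS.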

nicklev commented 1 year ago

No, this is the only one.

ghost commented 1 year ago

Action required from @Azure/aks-pm

CocoWang-wql commented 1 year ago

Hello @nicklev, did you open a ticket, and was the issue resolved?

nicklev commented 1 year ago

Hi @CocoWang-wql, I opened a ticket but it is not resolved yet.

ghost commented 1 year ago

@Azure/aks-pm issue needs labels

CocoWang-wql commented 1 year ago

Hello @nicklev, we have fixed it and the change is rolling out. Please let me know whether the support engineer has updated the status for you. Thank you.

nicklev commented 1 year ago

Hi @CocoWang-wql ,

Nice to hear that. Should I do anything to get the latest version? I installed the aks-secrets-store-csi-driver with this command: az aks enable-addons --addons azure-keyvault-secrets-provider --name myAKSCluster --resource-group myResourceGroup

CocoWang-wql commented 1 year ago

Once the change is released, no action is needed on your side. If you have any concerns, could you reply to your support ticket and ask the support engineer to contact 'Coco Wang'? I can help check further.

nicklev commented 1 year ago

Hi there, the issue has been resolved.