aws / secrets-store-csi-driver-provider-aws

The AWS provider for the Secrets Store CSI Driver allows you to fetch secrets from AWS Secrets Manager and AWS Systems Manager Parameter Store, and mount them into Kubernetes pods.

Provider socket does not exist if provider pod starts before driver pod #345


mf-jhellman commented 3 months ago

Describe the bug
It appears that sometimes on our nodes the secrets-store-csi-driver-provider-aws pod will start before the secrets-store-csi-driver pod. When that happens, our application pods that mount AWS secrets on such a node get stuck in ContainerCreating status and the secrets-store-csi-driver pod generates logs such as:

"failed to mount secrets store object content" err="rpc error: code = Canceled desc = latest balancer error: connection error: desc = \"transport: Error while dialing: dial unix /etc/kubernetes/secrets-store-csi-providers/aws.sock: connect: no such file or directory\"" pod="mynamespace/mypod"

The secrets-store-csi-driver-provider-aws pod does not generate any logs that indicate an issue.

To Reproduce

Steps to reproduce the behavior: Start the secrets-store-csi-driver-provider-aws pod before the secrets-store-csi-driver pod, then create an application pod that mounts AWS secrets.
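For illustration, a minimal sketch of the kind of application pod used in these steps is shown below. The SecretProviderClass name, pod name, service account, and secret name are placeholders, not values from this report; the csi driver name and volumeAttributes follow the Secrets Store CSI Driver conventions.

```yaml
# Hypothetical reproduction manifest; "example-aws-secrets", "example-app",
# "example-sa", and "MySecret" are placeholders.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: example-aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "MySecret"        # Secrets Manager secret name (placeholder)
        objectType: "secretsmanager"
---
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  serviceAccountName: example-sa      # assumed to have IAM access to the secret
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest
      command: ["sleep", "3600"]
      volumeMounts:
        - name: secrets
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: example-aws-secrets
```

If the node's provider socket is missing, a pod like this stays in ContainerCreating and the driver logs the dial error shown above.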

Do you also notice this bug when using a different secrets store provider (Vault/Azure/GCP...)? Haven't tried.

If yes, the issue is likely with the k8s Secrets Store CSI driver, not the AWS provider. Open an issue in that repo.

Expected behavior
The application pod mounts the secrets.

Environment
EKS 1.29.1, secrets-store-csi-driver 1.4.2, secrets-store-csi-driver-provider-aws 1.0.r2-68-gab548b3-2024.03.20.21.58

Additional context
We noticed that the secrets-store-csi-driver-provider-aws chart does not set type: DirectoryOrCreate on the providervol volume definition. Is it possible that, when this hostPath does not exist at startup, the provider cannot use the socket yet does not log any errors?
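For reference, a sketch of the suggestion for the provider DaemonSet's volume, assuming the hostPath path from the log above; the field names follow the Kubernetes hostPath volume API:

```yaml
# Sketch only: the providervol hostPath volume with the suggested type added.
volumes:
  - name: providervol
    hostPath:
      path: /etc/kubernetes/secrets-store-csi-providers
      type: DirectoryOrCreate   # create the directory on the node if it is missing
```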

lobanov commented 1 month ago

Encountered this issue too. I was migrating from the YAML-based deployment of ASCP described here to the Helm chart, and secret mounts stopped working. As a workaround, I switched back to using the manifest.

simonmarty commented 3 days ago

One way to fix this would be to add a dependencies block to the Helm chart to ensure the CSI driver always deploys before the AWS provider. This is something we originally elected against in order to reduce the number of manual version bumps we need to conduct on this project. I think it's time we revisit this.
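For illustration, such a dependencies stanza in the provider's Chart.yaml might look like the sketch below; the chart name, repository URL, and version constraint are assumptions rather than anything that exists in the current chart.

```yaml
# Hypothetical Chart.yaml dependency on the upstream driver chart.
dependencies:
  - name: secrets-store-csi-driver
    version: "~1.4.2"   # each driver release would require a manual bump here
    repository: https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts
```

This is the trade-off mentioned above: pinning the driver as a subchart couples the two release cycles, so every driver release implies a version bump in this chart.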