Closed jayon-niravel closed 1 year ago
This sounds like an EFS/EKS problem, not an Airflow one. I think you should look for similar issues (I saw plenty of Airflow-unrelated issues raised against EFS/EKS where other apps hit the same behaviour). From a quick look it is likely a networking or EFS resource configuration that needs to be able to handle that many mounts, but I think it is best if you search for similar issues or raise the question with AWS support.
Converting to a discussion in case more discussion is needed, as this seems to be deployment-specific troubleshooting rather than an Airflow issue.
BTW, a comment for the future: it would be great to explain the difference between your configuration and the standard one, or to add your specific code in a [collapsible section of markdown](https://gist.github.com/pierrejoubert73/902cc94d79424356a8d20be2b382e1ab). The easier you make it for someone who tries to help you to understand your issue, the better (remember, people here help when they can in their free time), so by making the problem easier to see you increase your chances that someone will help you solve it. In this case it took me quite some time to scroll through many pages of configuration (which was maybe useful, but impossible for a human to analyse in full) in order to find what the problem is.
Official Helm Chart version
1.10.0 (latest released)
Apache Airflow version
2.6.0
Kubernetes Version
Client Version: v1.25.2 Kustomize Version: v4.5.7 Server Version: v1.23.17+16bcd69
Helm Chart configuration
Docker Image customizations
What happened
Dags/test_parallelism.py
If I run the DAG below with an AWS EFS volume mounted, it works with no issues for pod counts up to 25. But if I increase the pod count to 100, I start getting timeout errors:
Unable to attach or mount volumes: unmounted volumes=[logs], unattached volumes=[logs config backups kube-api-access-jxz9w]: timed out waiting for the condition
Unable to attach or mount volumes: unmounted volumes=[logs], unattached volumes=[backups kube-api-access-q6b8x logs config]: timed out waiting for the condition
Dags/test_parallelism.py
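(The original DAG file is not reproduced in this report. A minimal sketch of this kind of parallelism test, assuming the KubernetesExecutor so that each task runs in its own worker pod mounting the shared volumes, could look like the following. The task count, task IDs, and the sleep command are illustrative assumptions.)

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

PARALLEL_TASKS = 100  # 25 pods reportedly work fine; 100 triggers the mount timeouts

with DAG(
    dag_id="test_parallelism",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Fan out N independent tasks; under the KubernetesExecutor each one
    # becomes a separate worker pod that must mount the EFS-backed volumes.
    for i in range(PARALLEL_TASKS):
        BashOperator(
            task_id=f"sleep_{i}",
            bash_command="sleep 60",
        )
```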
What you think should happen instead
The EFS volume should be mounted on the Kubernetes pods, since the access mode is set to ReadWriteMany.
How to reproduce
Use the mentioned Helm chart/Airflow versions and the custom-values.yaml template. Mount an AWS EFS volume for persistent logs and also one custom volume on all the Kubernetes pods:
Volume mount 1 - airflow-logs
Volume mount 2 - backups, via a pod override (see the sketch below)
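(The exact pod override is not shown in the report. One common way to attach such an extra volume to a KubernetesExecutor worker pod is through `executor_config` with a `pod_override`; the sketch below assumes a PVC named `efs-backups` and the `/opt/airflow/backups` mount path purely for illustration.)

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from kubernetes.client import models as k8s

# Hypothetical PVC name and mount path, not taken from the report.
backups_pod_override = k8s.V1Pod(
    spec=k8s.V1PodSpec(
        containers=[
            k8s.V1Container(
                name="base",  # the KubernetesExecutor worker container is named "base"
                volume_mounts=[
                    k8s.V1VolumeMount(name="backups", mount_path="/opt/airflow/backups"),
                ],
            ),
        ],
        volumes=[
            k8s.V1Volume(
                name="backups",
                persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
                    claim_name="efs-backups"
                ),
            ),
        ],
    ),
)

with DAG(
    dag_id="pod_override_example",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
):
    BashOperator(
        task_id="uses_backups_volume",
        bash_command="ls /opt/airflow/backups",
        executor_config={"pod_override": backups_pod_override},
    )
```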
Anything else
kubectl logs for one of the unmounted pods:
Are you willing to submit PR?
Code of Conduct