Open amrit2196 opened 1 week ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Can you create and post an example DAG to reproduce this? I am a bit curious what effect this bug has on you. There are probably hundreds of installations using 2.10 already, and it would be a major bug if nobody had detected it until just before we release 2.10.2.
Can you tell which executor you are using?
We are using the Kubernetes executor, but for task pod deletion we run a cronjob to delete task pods. That was working fine in 2.5.3, but not in this version.
We are currently running a simple DAG with multiple tasks, each of which sleeps and checks a GET request.
So to be able to understand this - and most likely it is something in the environment - I request that you inspect the scheduler logs. In recent versions there should be logs emitted when the scheduler is at the parallelism limit. Can you check for this?
Can you also please post an example DAG with which it is easy to reproduce? Then we could test it as regression.
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.10.0
What happened?
I recently upgraded Airflow from 2.5.3 to 2.10.0 in our environment. Parallelism is set to 32 and there are three schedulers in place. Once more than 96 tasks have run, any newly scheduled task gets stuck in the scheduled state with the open slot count at zero, even though the previous tasks have completed and been cleared.
What you think should happen instead?
The open slot count should increase when tasks complete, and the tasks that are queued up should then be scheduled.
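For readers following along, the 96-task threshold and the expected slot release can be sketched as a toy model. This is only an illustration under stated assumptions: `ExecutorSlots` is a hypothetical class, not Airflow's executor code, and it assumes each scheduler tracks open slots as parallelism minus the number of running tasks.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorSlots:
    """Hypothetical toy model of one scheduler's slot accounting (not Airflow code)."""

    parallelism: int
    running: set = field(default_factory=set)

    @property
    def open_slots(self) -> int:
        # Assumption: open slots = parallelism minus tasks currently running.
        return self.parallelism - len(self.running)

    def start(self, task_id: str) -> bool:
        # A task is accepted only while at least one slot is open.
        if self.open_slots <= 0:
            return False
        self.running.add(task_id)
        return True

    def finish(self, task_id: str) -> None:
        # Expected behavior: a completed task gives its slot back.
        self.running.discard(task_id)


PARALLELISM = 32  # value from the report
SCHEDULERS = 3    # value from the report
total_capacity = PARALLELISM * SCHEDULERS  # 32 * 3 = 96

slots = ExecutorSlots(parallelism=PARALLELISM)
for i in range(PARALLELISM):
    slots.start(f"task_{i}")

blocked = slots.start("task_extra")                # no open slot left
slots.finish("task_0")                             # one task completes
accepted_after_finish = slots.start("task_extra")  # slot was released
```

In this model the extra task is refused while all 32 slots are busy and accepted once one task finishes. The report describes the opposite: the open slot count stays at zero after tasks complete.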
How to reproduce
I reproduced it by upgrading and then running 5 or 6 DAGs with 10 tasks in each DAG, with parallelism set to 32 for each scheduler. Note that the same set of DAGs works fine on Airflow 2.5.3.
Operating System
Red Hat Linux
Versions of Apache Airflow Providers
apache-airflow-providers-postgres==5.12.0
apache-airflow-providers-apache-hive==8.2.0
apache-airflow-providers-amazon==8.28.0
apache-airflow-providers-cncf-kubernetes==8.4.1
apache-airflow-providers-apache-livy==3.9.0
apache-airflow-providers-presto==5.6.0
apache-airflow-providers-http==4.13.0
apache-airflow-providers-trino==5.8.0
apache-airflow-providers-snowflake==5.7.0
apache-airflow-providers-salesforce==5.8.0
apache-airflow-providers-papermill==3.8.0
apache-airflow-providers-google==10.22.0
apache-airflow-providers-celery==3.8.1
apache-airflow-providers-redis==3.8.0
apache-airflow-providers-dbt-cloud==3.10.0
apache-airflow-providers-openlineage==1.11.0
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct