Open MatthewStrickland opened 2 weeks ago
I have a feeling schedule_after_task_execution may be the reason. Could you try turning that off and confirming?
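For anyone wanting to try this: schedule_after_task_execution lives in the [scheduler] section of the Airflow config, and Airflow reads config overrides from environment variables named AIRFLOW__{SECTION}__{KEY}. A minimal sketch of deriving that variable name follows. The helper name airflow_env_var is made up for illustration and is not part of Airflow.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option.

    Hypothetical helper for illustration; only the AIRFLOW__{SECTION}__{KEY}
    naming convention comes from Airflow itself.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("scheduler", "schedule_after_task_execution")

# "False" turns off the scheduling pass that runs right after a task finishes.
# Setting it here only affects the current process; see the note below.
os.environ[name] = "False"

print(name)  # AIRFLOW__SCHEDULER__SCHEDULE_AFTER_TASK_EXECUTION
```

In a real deployment the variable has to be set in the environment of the Airflow components themselves (scheduler and workers) before they start, not from a script like this.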
+1 on that assumption. After a task finishes, its next tasks are scheduled right away. They might therefore get scheduled immediately, while the central scheduler sees the limits reached early and does not schedule other tasks.
Also can you please post an example DAG and check if the same applies to Airflow 2.10 as well?
My reading is that scheduling mostly does not enforce priority: it does not try to decide what NOT to schedule, but instead attempts, in a loop and on a best-effort basis, to schedule whatever is schedulable. schedule_after_task_execution might be contributing to this best-effort behaviour, with the benefit of reduced latency between tasks.
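To make the best-effort point concrete, here is a toy model. It is not Airflow's scheduler code: Candidate and pick_next are hypothetical names, and the weights 4 and 3 are illustrative values that assume a linear chain of tasks with the default priority weight of 1 under the upstream rule. The point is only that priority can decide between tasks that are schedulable at the same moment. A higher-priority task that is not yet visible when a pool slot frees up cannot win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    dag: str
    task: str
    weight: int  # effective priority weight (illustrative values)


def pick_next(schedulable):
    """Toy selection: the highest weight wins, but only among the tasks
    that are schedulable at the moment the pool slot is free."""
    return max(schedulable, key=lambda c: c.weight)


dag1_task4 = Candidate("dag 1", "task 4", 4)
dag2_task3 = Candidate("dag 2", "task 3", 3)

# Case A: both tasks are visible when the slot frees up.
both_visible = pick_next([dag1_task4, dag2_task3])

# Case B: dag 1 task 4 has not been scheduled yet, so only dag 2 task 3
# is waiting for the slot and it gets picked despite its lower weight.
only_waiting = pick_next([dag2_task3])

print(both_visible.dag, both_visible.task)
print(only_waiting.dag, only_waiting.task)
```

Which of the two cases happens in practice depends on timing, which would match the "sometimes" in the report.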
This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.1
What happened?
A task with higher priority (when the upstream priority weight rule is set) is not always selected to run first when different DAGs are racing each other.
What you think should happen instead?
Once a task has finished, the executor should give the tasks of other DAGs a chance to queue up before choosing the next task to run. (I'm assuming this is why the executor sometimes chooses the wrong task to run.) And/or the order of tasks should be somewhat consistent/deterministic.
Fwiw, I don't ever see this issue inside a single DAG where dagruns are racing each other, just between dagruns of different DAGs. There's no mention in the docs that this feature is only consistent between dagruns of the same DAG. It's a feature coupled with pools I believe, which work across DAGs, so I assumed this would work across DAGs too.
How to reproduce
Create 2 dags (dag 1 & 2) and set priority weight upstream on both
Create a pool
Have 5 tasks each, where both use the same pool for tasks 3 & 4
Have task 3 take a long time (sleep)
Run both dags together
Eventually: dag 1 is running task 3 and dag 2 has queued task 3
Sometimes (and desired): dag 1 task 4 runs before dag 2 task 3
Sometimes (and undesired): dag 2 task 3 runs before dag 1 task 4
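For context on why dag 1 task 4 is expected to win: with the upstream weight rule, a task's effective priority is its own weight plus the weights of all its upstream tasks. The sketch below works that out for a 5-task DAG. It assumes the five tasks form a linear chain and all keep the default priority weight of 1, neither of which is stated above. upstream_weights is a hypothetical helper, not an Airflow function.

```python
def upstream_weights(upstream, priority_weight=1):
    """Effective weights under the upstream rule, assuming every task has
    the same priority_weight: own weight plus one weight per ancestor.

    `upstream` maps each task to the list of its direct upstream tasks.
    """
    def ancestors(task, seen):
        # Collect all transitive upstream tasks of `task` into `seen`.
        for parent in upstream[task]:
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    return {
        task: priority_weight * (1 + len(ancestors(task, set())))
        for task in upstream
    }


# Assumed layout: task 1 >> task 2 >> task 3 >> task 4 >> task 5
chain = {
    "task 1": [],
    "task 2": ["task 1"],
    "task 3": ["task 2"],
    "task 4": ["task 3"],
    "task 5": ["task 4"],
}

weights = upstream_weights(chain)
print(weights["task 4"], weights["task 3"])  # 4 3
```

Under these assumptions task 4 has a higher effective weight than task 3, so when both are competing for the same pool slot, task 4 of dag 1 would be the expected pick.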
Operating System
rhel 9
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
So this is very minor and might not even be considered a bug at all, but I found it rather unintuitive. If it's not classed as a bug I'm ok with that; it's just information I thought worth sharing.