Using selectin for relationship between TaskInstance and Trigger
-> Nested loop inner join (cost=8.41 rows=8) (actual time=0.0409..0.0409 rows=0 loops=1)
    -> Index range scan on task_instance using ti_trigger_id over (trigger_id = 109) OR (trigger_id = 110) OR (6 more), with index condition: (task_instance.trigger_id in (116,115,114,113,112,111,110,109)) (cost=5.61 rows=8) (actual time=0.0402..0.0402 rows=0 loops=1)
    -> Single-row index lookup on dag_run_1 using dag_run_dag_id_run_id_key (dag_id=task_instance.dag_id, run_id=task_instance.run_id) (cost=0.263 rows=1) (never executed)
From the Explain Analyze results above, we can see that selectinload performs better than joinedload for both the triggerer process and the triggerview list API.
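For readers less familiar with the two strategies, here is a minimal sketch of the query shapes they roughly correspond to. It uses plain `sqlite3` rather than SQLAlchemy, and the schema, sample rows, and the helper names `build_db`, `load_joined`, and `load_selectin` are assumptions made for illustration only. It is not Airflow's actual schema or the exact SQL that SQLAlchemy emits.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "trigger" (id INTEGER PRIMARY KEY, classpath TEXT);
        CREATE TABLE task_instance (
            id INTEGER PRIMARY KEY,
            task_id TEXT,
            trigger_id INTEGER REFERENCES "trigger"(id)
        );
        CREATE INDEX ti_trigger_id ON task_instance (trigger_id);
        """
    )
    conn.executemany(
        'INSERT INTO "trigger" VALUES (?, ?)',
        [(109, "a.TriggerA"), (110, "a.TriggerB"), (111, "a.TriggerC")],
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?)",
        [(1, "wait_a", 109), (2, "wait_b", 110)],
    )
    return conn


def load_joined(conn, trigger_ids):
    """Joined-style loading: one query with a LEFT OUTER JOIN."""
    placeholders = ",".join("?" * len(trigger_ids))
    rows = conn.execute(
        'SELECT t.id, ti.task_id FROM "trigger" AS t '
        "LEFT OUTER JOIN task_instance AS ti ON ti.trigger_id = t.id "
        f"WHERE t.id IN ({placeholders}) ORDER BY t.id, ti.id",
        trigger_ids,
    ).fetchall()
    result = {}
    for trigger_id, task_id in rows:
        tasks = result.setdefault(trigger_id, [])
        if task_id is not None:  # NULL means the trigger has no task instance
            tasks.append(task_id)
    return result


def load_selectin(conn, trigger_ids):
    """Select-in-style loading: parent query, then one IN query for children."""
    placeholders = ",".join("?" * len(trigger_ids))
    parents = conn.execute(
        f'SELECT id FROM "trigger" WHERE id IN ({placeholders}) ORDER BY id',
        trigger_ids,
    ).fetchall()
    result = {row[0]: [] for row in parents}
    if result:
        keys = list(result)
        key_placeholders = ",".join("?" * len(keys))
        children = conn.execute(
            "SELECT trigger_id, task_id FROM task_instance "
            f"WHERE trigger_id IN ({key_placeholders}) ORDER BY id",
            keys,
        )
        for trigger_id, task_id in children:
            result[trigger_id].append(task_id)
    return result
```

Both helpers return the same mapping of trigger id to task ids. The second query in `load_selectin` has the same `trigger_id IN (...)` shape as the index range scan on `ti_trigger_id` in the plan above.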
closes: #33647
As mentioned by @arunravimv in #33647, we have added this patch to our own Airflow deployment and have noticed improvements in triggerer performance.
Following are the Explain Analyze outputs for the two SQLAlchemy relationship loading strategies.

Triggerer Process
Using joinedload in bulk_fetch method
Using selectinload in bulk_fetch method

triggerview/list API
Using joined for relationship between TaskInstance and Trigger
Using selectin for relationship between TaskInstance and Trigger