vchiapaikeo opened 4 months ago
Actually, I'm not convinced this additional index will fix things. I'm going to try revising the query instead, as that seems to produce consistently better results in the EXPLAIN plan.
There's a small chance that we may also need to add an index hint to the query. Reopening the PR; we can come back to adding that index hint if we need to.
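MySQL index hints (`USE INDEX`, `FORCE INDEX`) go in the FROM clause next to the table name. SQLite's closest analogue is `INDEXED BY`, which makes the planner use the named index or fail outright. The sketch below swaps SQLite in for MySQL purely to show the idea in a self-contained way. The three-column `task_instance` table and the query shape are hypothetical simplifications. Neither is Airflow's actual schema or query.

```python
import sqlite3


def make_schema() -> sqlite3.Connection:
    """Create a hypothetical, heavily simplified stand-in for task_instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY,"
        " trigger_id INTEGER,"
        " priority_weight INTEGER)"
    )
    conn.execute("CREATE INDEX ti_trigger_id ON task_instance (trigger_id)")
    return conn


def plan_with_index_hint(conn: sqlite3.Connection) -> str:
    """Return SQLite's query plan for a trigger_id lookup pinned to ti_trigger_id.

    INDEXED BY plays the role of MySQL's FORCE INDEX here: if the named index
    cannot be used, SQLite raises an error instead of silently scanning the table.
    """
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM task_instance INDEXED BY ti_trigger_id "
        "WHERE trigger_id = ? "
        "ORDER BY coalesce(priority_weight, 0) DESC",
        (1,),
    ).fetchall()
    # The human-readable plan text is the last column of every plan row.
    return " | ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    print(plan_with_index_hint(make_schema()))
```

In MySQL the equivalent hint would be written against the real table and index. Whether the hint is actually needed is the open question in the comment above.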
@vchiapaikeo I see a similar slowdown on Postgres, though not to the same extent. The query takes around 300ms on average in our installation.
Apache Airflow version
2.9.2
If "Other Airflow 2 version" selected, which one?
No response
What happened?
We are planning to upgrade from 2.7.3 to 2.9.2. However, we've observed issues in our sandbox MySQL database while doing so. Specifically, this query fails to use the ti_trigger_id index on the task_instance table. This is likely a result of the addition of the ordering clause

coalesce(TaskInstance.priority_weight, 0).desc()

which forces MySQL to perform lookups after the fact, so the optimizer decides that the index is not the optimal route.

Slow Query Log (showing that this query takes over 11s to run):
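The effect of ordering by `coalesce(priority_weight, 0)` when only `trigger_id` is indexed can be sketched with SQLite's `EXPLAIN QUERY PLAN`. This is an assumption-laden illustration, not the MySQL plan from the slow query log. It uses a hypothetical simplified `task_instance` table, and SQLite's planner differs from MySQL's. The index can narrow rows by `trigger_id`. However, `priority_weight` must still be fetched from the table for each match, and the expression ordering needs a separate sort step.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


# Hypothetical, simplified table with only a single-column index on trigger_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    " id INTEGER PRIMARY KEY, trigger_id INTEGER, priority_weight INTEGER)"
)
conn.execute("CREATE INDEX ti_trigger_id ON task_instance (trigger_id)")

SQL = (
    "SELECT id FROM task_instance WHERE trigger_id = ? "
    "ORDER BY coalesce(priority_weight, 0) DESC LIMIT 10"
)
plan = query_plan(conn, SQL, (1,))
# Expect a non-covering index search plus a temporary B-tree for the ORDER BY.
print(plan)
```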
What you think should happen instead?
The index on trigger_id should be modified to include priority_weight so that the query chooses to use the index. Currently, the index is defined as
However, we believe it should be defined as the following so that the index gets used:
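The idea behind adding `priority_weight` to the index can be illustrated with a covering index. This is a hedged sketch of the general technique and not the exact definition proposed in this issue. It again uses SQLite and a hypothetical simplified table. When every column the query reads is in the index, the engine can answer from the index alone instead of looking up each table row afterwards.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def build(index_columns: str) -> sqlite3.Connection:
    """Create the hypothetical table with ti_trigger_id on the given columns.

    index_columns is interpolated directly, so only pass trusted constants.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY, trigger_id INTEGER, priority_weight INTEGER)"
    )
    conn.execute(f"CREATE INDEX ti_trigger_id ON task_instance ({index_columns})")
    return conn


SQL = (
    "SELECT id, priority_weight FROM task_instance WHERE trigger_id = ? "
    "ORDER BY coalesce(priority_weight, 0) DESC"
)

# Single-column index: rows must be fetched from the table to read priority_weight.
single = query_plan(build("trigger_id"), SQL, (1,))
# Composite index: id (the rowid), trigger_id and priority_weight all live in the index.
composite = query_plan(build("trigger_id, priority_weight"), SQL, (1,))
print("single:   ", single)
print("composite:", composite)
```

Because the ORDER BY is on an expression rather than the bare column, the composite index removes the per-row table lookups but not the sort itself. Whether MySQL's optimizer then prefers the index is what this issue reports. The sketch does not verify that.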
How to reproduce
Run Airflow with a MySQL backend and turn on slow query logging. Airflow should be bootstrapped with a significant number of tasks in the task_instance table.
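The bootstrapping step can be approximated without a MySQL server. On a real MySQL backend, slow query logging is controlled by the `slow_query_log` and `long_query_time` system variables. The sketch below is a rough stand-in under several assumptions: it uses Python's built-in `sqlite3`, a hypothetical three-column `task_instance` table, synthetic data (about 10% of rows given a `trigger_id`, some NULL priority weights), and a wall-clock threshold that mimics `long_query_time`.

```python
import random
import sqlite3
import time

# Assumed threshold (seconds) standing in for MySQL's long_query_time.
SLOW_QUERY_THRESHOLD_S = 1.0


def bootstrap(conn: sqlite3.Connection, n_rows: int, n_triggers: int = 100) -> None:
    """Fill a hypothetical, simplified task_instance table with synthetic rows."""
    conn.execute(
        "CREATE TABLE task_instance ("
        " id INTEGER PRIMARY KEY, trigger_id INTEGER, priority_weight INTEGER)"
    )
    conn.execute("CREATE INDEX ti_trigger_id ON task_instance (trigger_id)")
    rng = random.Random(42)  # fixed seed so runs are repeatable
    rows = (
        (
            # Assumption: roughly 10% of rows reference a trigger, the rest are NULL.
            rng.randint(1, n_triggers) if rng.random() < 0.1 else None,
            # Include NULL weights so coalesce(priority_weight, 0) matters.
            rng.choice([None, 1, 2, 5, 10]),
        )
        for _ in range(n_rows)
    )
    conn.executemany(
        "INSERT INTO task_instance (trigger_id, priority_weight) VALUES (?, ?)", rows
    )
    conn.commit()


def timed_query(conn: sqlite3.Connection, sql: str, params: tuple = ()):
    """Run sql and return (rows, elapsed seconds, whether it exceeded the threshold)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed, elapsed > SLOW_QUERY_THRESHOLD_S


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    bootstrap(conn, n_rows=500_000)
    rows, elapsed, slow = timed_query(
        conn,
        "SELECT id FROM task_instance WHERE trigger_id IS NOT NULL "
        "ORDER BY coalesce(priority_weight, 0) DESC LIMIT 50",
    )
    print(f"{len(rows)} rows in {elapsed:.3f}s (slow={slow})")
```

Timings from SQLite will not match the 11s observed on MySQL. The sketch only shows how to seed a large table and flag queries over a threshold.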
Operating System
Debian 11
Versions of Apache Airflow Providers
https://raw.githubusercontent.com/apache/airflow/constraints-2.9.2/constraints-3.11.txt
Deployment
Official Apache Airflow Helm Chart
Deployment details
KubernetesExecutor on GKE
Anything else?
No response
Are you willing to submit PR?
Code of Conduct