apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.12k stars 14.31k forks source link

Expensive Trigger Query Does Not Use Index (even after forcing a hint) Due to Additional Column #40674

Open vchiapaikeo opened 4 months ago

vchiapaikeo commented 4 months ago

Apache Airflow version

2.9.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

We are planning to upgrade from 2.7.3 to 2.9.2. However, we've observed issues in our sandbox MySQL db while doing so. Specifically, this query fails to use the ti_trigger_id index on the task_instance table. This is likely a result of the addition of the coalesce(TaskInstance.priority_weight, 0).desc() which forces MySQL to need to perform lookups after the fact and decide that the index is not the most optimal route.

Slow Query Log (showing that this query takes over 11s to run):

# Query_time: 11.081314  Lock_time: 0.000299 Rows_sent: 0  Rows_examined: 2819837
SET timestamp=1720544726;
SELECT `trigger`.id 
FROM `trigger` INNER JOIN task_instance ON `trigger`.id = task_instance.trigger_id 
WHERE `trigger`.triggerer_id IS NULL OR (`trigger`.triggerer_id NOT IN (SELECT job.id 
FROM job 
WHERE job.end_date IS NULL AND job.latest_heartbeat > '2024-07-09 17:04:56.519611' AND job.job_type = 'TriggererJob')) ORDER BY coalesce(task_instance.priority_weight, 0) DESC, `trigger`.created_date 
 LIMIT 9968 FOR UPDATE SKIP LOCKED;

What you think should happen instead?

The index on trigger_id should be modified to include priority_weight so that the query chooses to use the index. Currently, the index is defined as

CREATE INDEX ti_trigger_id
   ON task_instance (trigger_id ASC);

However, we believe it should be defined as the following so that the index gets used:

CREATE INDEX idx_trigger_priority ON task_instance(trigger_id, priority_weight);

How to reproduce

Run Airflow w/ a MySQL backend and turn on slow query logging. Airflow should be bootstrapped with a signficant number of tasks in the task_instance table

Operating System

Debian 11

Versions of Apache Airflow Providers

https://raw.githubusercontent.com/apache/airflow/constraints-2.9.2/constraints-3.11.txt

Deployment

Official Apache Airflow Helm Chart

Deployment details

KubernetesExecutor on GKE

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

vchiapaikeo commented 4 months ago

Actually, I'm not convinced this additional index will fix things. I'm going to attempt to revise the query instead as that seems to produce consistently better results in the explain plan.

vchiapaikeo commented 4 months ago

There's a small chance that we may need to add an index hint to the query as well. Reopening the PR and we can come back to adding that index hint in the query if we need to

mattp- commented 2 days ago

@vchiapaikeo i see a similar slowdown on postgres, though not to the same extent. the query is taking on average around 300ms in our installation