Setup
We self-host Airflow using the official Helm chart, with Google CloudSQL as the Airflow metadata store.
Issue
We observe that the query below is executed very frequently, which causes database locks and high CPU utilisation:
    UPDATE dag_run
    SET last_scheduling_decision = $1::timestamptz,
        updated_at = $2::timestamptz
    WHERE dag_run.id = $3
For instance, as the screenshot below shows, the query was called 193,362 times in one day.
Spec of the CloudSQL instance
The CloudSQL instance has 4 vCPUs and 15 GB of RAM.
What you think should happen instead?
Updates to last_scheduling_decision on dag_run should happen less often, or their frequency should be configurable.
That would let us test whether fewer updates reduce database CPU utilisation and resolve the database locking problem.
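In the meantime, one experiment we could try is slowing the scheduler loop through existing settings. The sketch below only builds the environment variable names for two `[scheduler]` options, `scheduler_idle_sleep_time` and `max_dagruns_per_loop_to_schedule`, using Airflow's `AIRFLOW__{SECTION}__{KEY}` convention. The values are hypothetical, and it is an untested assumption that these options affect how often this particular UPDATE is issued.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Hypothetical values for an experiment; not recommendations.
candidate_overrides = {
    ("scheduler", "scheduler_idle_sleep_time"): "5",
    ("scheduler", "max_dagruns_per_loop_to_schedule"): "10",
}

# Map each (section, key) pair to its environment variable name.
env = {
    airflow_env_var(section, key): value
    for (section, key), value in candidate_overrides.items()
}

for name, value in env.items():
    print(f"{name}={value}")
```

The resulting names could be set on the scheduler pods, and the daily call count compared before and after the change.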
How to reproduce
Not easily reproducible.
==========
At the time the UPDATE statements were executed, only 5 DAG runs were running.
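To put these numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes that 5 concurrent DAG runs is representative of the whole day, which we have not verified, so the per-run figures are only indicative.

```python
# Figures taken from this report.
CALLS_PER_DAY = 193_362
ACTIVE_DAG_RUNS = 5  # assumption: representative of the whole day
SECONDS_PER_DAY = 24 * 60 * 60

# Overall rate of the UPDATE statement.
calls_per_second = CALLS_PER_DAY / SECONDS_PER_DAY

# Share of updates per DAG run, if they are spread evenly across runs.
calls_per_run_per_day = CALLS_PER_DAY / ACTIVE_DAG_RUNS
seconds_between_updates_per_run = SECONDS_PER_DAY / calls_per_run_per_day

print(f"{calls_per_second:.2f} updates per second overall")
print(f"{calls_per_run_per_day:.0f} updates per DAG run per day")
print(f"one update per DAG run every {seconds_between_updates_per_run:.2f} s")
```

Under that assumption, each DAG run row is rewritten roughly every two seconds, even when nothing about the run has changed.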
We have a similar setup in another environment, where the CloudSQL instance also has only 4 vCPUs.
In that environment, however, the UPDATE statement is called fewer times, and no significant database locking is observed.
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.9.1
Operating System
linux
Versions of Apache Airflow Providers
No response
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response