Open mrn-aglic opened 1 year ago
What I see is a deadlock:
[2023-09-14T10:41:55.788+0000] {backfill_job_runner.py:686} WARNING - Deadlock discovered for ti_status.to_run=dict_values([<TaskInstance: stale_data_test.insert_new_daily_data.insert backfill__2023-07-30T06:00:00+00:00 map_index=0 [scheduled]>, <TaskInstance: stale_data_test.insert_new_daily_data.insert backfill__2023-07-22T06:00:00+00:00 map_index=0 [scheduled]>, <TaskInstance: stale_data_test.insert_new_daily_data.insert backfill__2023-07-19T06:00:00+00:00 map_index=0 [scheduled]>])
[2023-09-14T10:41:55.874+0000] {backfill_job_runner.py:416} INFO - [backfill progress] | finished run 0 of 31 | tasks waiting: 0 | succeeded: 16 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 3 | not ready: 3
[2023-09-14T10:41:55.889+0000] {local_executor.py:402} INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 33, in <module>
    sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
  File "/opt/airflow/airflow/__main__.py", line 59, in main
    args.func(args)
  File "/opt/airflow/airflow/cli/cli_config.py", line 49, in command
    return func(*args, **kwargs)
  File "/opt/airflow/airflow/utils/cli.py", line 114, in wrapper
    return f(*args, **kwargs)
  File "/opt/airflow/airflow/utils/providers_configuration_loader.py", line 55, in wrapped_function
    return func(*args, **kwargs)
  File "/opt/airflow/airflow/cli/commands/dag_command.py", line 153, in dag_backfill
    _run_dag_backfill(dags, args)
  File "/opt/airflow/airflow/cli/commands/dag_command.py", line 105, in _run_dag_backfill
    dag.run(
  File "/opt/airflow/airflow/models/dag.py", line 2671, in run
    run_job(job=job, execute_callable=job_runner._execute)
  File "/opt/airflow/airflow/utils/session.py", line 79, in wrapper
    return func(*args, session=session, **kwargs)
  File "/opt/airflow/airflow/jobs/job.py", line 305, in run_job
    return execute_job(job, execute_callable=execute_callable)
  File "/opt/airflow/airflow/jobs/job.py", line 334, in execute_job
    ret = execute_callable()
  File "/opt/airflow/airflow/utils/session.py", line 79, in wrapper
    return func(*args, session=session, **kwargs)
  File "/opt/airflow/airflow/jobs/backfill_job_runner.py", line 949, in _execute
    raise BackfillUnfinished(err, ti_status)
airflow.exceptions.BackfillUnfinished: BackfillJob is deadlocked.
These tasks have succeeded:
DAG ID           Task ID       Run ID                                 Try number
---------------  ------------  -----------------------------------  ------------
stale_data_test  get_run_data  backfill__2023-07-16T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-17T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-18T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-19T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-20T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-21T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-22T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-23T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-24T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-25T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-26T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-27T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-28T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-29T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-30T06:00:00+00:00             1
stale_data_test  get_run_data  backfill__2023-07-31T06:00:00+00:00             1
These tasks are running:
DAG ID Task ID Run ID Try number
-------- --------- -------- ------------
These tasks have failed:
DAG ID Task ID Run ID Try number
-------- --------- -------- ------------
These tasks are skipped:
DAG ID Task ID Run ID Try number
-------- --------- -------- ------------
These tasks are deadlocked:
DAG ID           Task ID                       Run ID                                 Map Index    Try number
---------------  ----------------------------  -----------------------------------  -----------  ------------
stale_data_test  insert_new_daily_data.insert  backfill__2023-07-19T06:00:00+00:00            1             0
stale_data_test  insert_new_daily_data.insert  backfill__2023-07-22T06:00:00+00:00            1             0
stale_data_test  insert_new_daily_data.insert  backfill__2023-07-30T06:00:00+00:00            1             0
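Since the deadlock is intermittent, it may help to compare the counters across repeated runs. The following is a minimal sketch, not part of Airflow, that extracts the `name: count` pairs from a `[backfill progress]` line like the one in the log above. The function name `parse_progress` and the regular expression are my own assumptions about the line format, based only on the single sample shown here.

```python
import re

# Sample copied from the log above (timestamp and logger prefix dropped).
PROGRESS_LINE = (
    "[backfill progress] | finished run 0 of 31 | tasks waiting: 0 | "
    "succeeded: 16 | running: 0 | failed: 0 | skipped: 0 | "
    "deadlocked: 3 | not ready: 3"
)


def parse_progress(line: str) -> dict:
    """Return the `name: count` pairs of a backfill progress line as a dict.

    Hypothetical helper: the pattern assumes lower-case counter names
    separated by ` | `, as in the sample above.
    """
    pairs = re.findall(r"\|\s*([a-z ]+):\s*(\d+)", line)
    return {name.strip(): int(count) for name, count in pairs}


counts = parse_progress(PROGRESS_LINE)
print(counts["deadlocked"], counts["not ready"])
```

The "finished run 0 of 31" segment has no colon, so the pattern skips it; only the seven counters are returned.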
Updated the DAG since the above has some issues:
import pendulum
from airflow import DAG
from airflow.decorators import task_group
from airflow.models import DagRun
from airflow.operators.python import PythonOperator
from airflow.utils.types import DagRunType

_QUERY_INTERVAL_START_OFFSET = 14
_QUERY_INTERVAL_END_OFFSET = 2


def _get_start_end_dates(dag_run: DagRun, data_interval_end: pendulum.DateTime):
    if dag_run.run_type in [DagRunType.BACKFILL_JOB, DagRunType.MANUAL]:
        start_date = data_interval_end.subtract(days=_QUERY_INTERVAL_END_OFFSET).date()
        end_date = data_interval_end.subtract(days=_QUERY_INTERVAL_END_OFFSET).date()

        return [
            {
                "start_date": start_date.isoformat(),
                "end_date": end_date.isoformat(),
            }
        ]

    return [
        {
            "start_date": data_interval_end.subtract(days=i).date().isoformat(),
            "end_date": data_interval_end.subtract(days=i).date().isoformat(),
        }
        for i in range(_QUERY_INTERVAL_END_OFFSET, _QUERY_INTERVAL_START_OFFSET + 1)
    ]


def _get_insert_run_data(
    dag_run: DagRun,
    data_interval_end: pendulum.DateTime,
):
    current_date = data_interval_end.date().isoformat()

    return [
        {"current_date": current_date, **dates}
        for dates in _get_start_end_dates(dag_run, data_interval_end)
    ]


def _print(start_date: str, end_date: str, current_date: str):
    print(f"start_date: {start_date}")
    print(f"end_date: {end_date}")
    print(f"current_date: {current_date}")


with DAG(
    dag_id="stale_data_test",
    catchup=False,
    start_date=pendulum.datetime(2023, 6, 7),
    schedule="0 6 * * *",
):
    get_run_data = PythonOperator(
        task_id="get_run_data",
        python_callable=_get_insert_run_data,
    )

    @task_group(group_id="insert_new_daily_data")
    def insert_new_daily_data(start_date: str, end_date: str, current_date: str):
        cleanup = PythonOperator(
            task_id="cleanup",
            python_callable=_print,
            op_kwargs={
                "start_date": start_date,
                "end_date": end_date,
                "current_date": current_date,
            },
        )

        insert = PythonOperator(
            task_id="insert",
            python_callable=_print,
            op_kwargs={
                "start_date": start_date,
                "end_date": end_date,
                "current_date": current_date,
            },
        )

        cleanup >> insert

    insert_new_daily_data.expand_kwargs(kwargs=get_run_data.output)
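To make the mapping fan-out easier to see, here is a minimal sketch of the date logic in the DAG above that runs without Airflow or pendulum. It is an illustration only: the standalone function `get_start_end_dates`, the plain-string run types, and the sample `data_interval_end` are my assumptions, not part of the DAG. It suggests that a backfill or manual run expands the task group into a single mapped instance (`map_index=0`, matching the deadlocked task instances in the log), while a scheduled run expands into 13.

```python
from datetime import datetime, timedelta

_QUERY_INTERVAL_START_OFFSET = 14
_QUERY_INTERVAL_END_OFFSET = 2


def get_start_end_dates(run_type: str, data_interval_end: datetime) -> list:
    """Stand-in for _get_start_end_dates, taking the run type as a plain string."""
    if run_type in ("backfill", "manual"):
        # Backfill and manual runs cover one day only.
        day = (data_interval_end - timedelta(days=_QUERY_INTERVAL_END_OFFSET)).date()
        return [{"start_date": day.isoformat(), "end_date": day.isoformat()}]

    # Scheduled runs cover every day from END_OFFSET to START_OFFSET days back.
    days = [
        (data_interval_end - timedelta(days=i)).date()
        for i in range(_QUERY_INTERVAL_END_OFFSET, _QUERY_INTERVAL_START_OFFSET + 1)
    ]
    return [{"start_date": d.isoformat(), "end_date": d.isoformat()} for d in days]


# Assumed interval end for illustration; any datetime works.
interval_end = datetime(2023, 7, 31, 6, 0)
backfill_items = get_start_end_dates("backfill", interval_end)
scheduled_items = get_start_end_dates("scheduled", interval_end)
print(len(backfill_items), len(scheduled_items))
```

Each returned dictionary becomes one mapped instance of the `insert_new_daily_data` task group once `current_date` is merged in.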
Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
This issue happens in Airflow 2.6.1 with Postgres 13.
I have enabled the DAG, and the scheduled run executed successfully. However, backfilling causes a variety of issues. Here is the initial command I ran:
airflow dags backfill stale_data_test --start-date 2023-07-01 --end-date 2023-08-01 -B
Sometimes I get the following:
When I re-ran the command, the backfill finished successfully. However, when I ran it for a different start and end date, I got the issue again.
When running the backfill with this command:
airflow dags backfill stale_data_test --start-date 2023-05-01 --end-date 2023-06-01 -B
I got this issue:
This second issue might be related to this.
I have also tried to limit max_active_dag_runs to 8, but then I got a deadlock (I could not reproduce it at the time of writing). It is probably related to the issue here.
What you think should happen instead
The backfill should finish successfully without any issues.
How to reproduce
To reproduce I have made a simple DAG:
Then I opened a bash shell in the Docker container and ran the following commands (waiting for each to finish, or interrupting it in case of specific errors):
1. airflow dags backfill stale_data_test --start-date 2023-07-01 --end-date 2023-08-01 -B
2. airflow dags backfill stale_data_test --start-date 2023-07-01 --end-date 2023-08-01 -B
3. airflow dags backfill stale_data_test --start-date 2023-06-01 --end-date 2023-07-01 -B
4. airflow dags backfill stale_data_test --start-date 2023-05-01 --end-date 2023-06-01 -B
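The command sequence above could also be scripted so that the reproduction is repeatable. This is only a rough sketch: the helper names `build_backfill_command` and `run_backfills` are hypothetical, and it assumes the `airflow` CLI is on the PATH inside the container. It does not stop on a failed backfill, so the exit codes of all four runs can be compared.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

DAG_ID = "stale_data_test"

# Date windows copied from the commands above, in the same order.
WINDOWS: List[Tuple[str, str]] = [
    ("2023-07-01", "2023-08-01"),
    ("2023-07-01", "2023-08-01"),
    ("2023-06-01", "2023-07-01"),
    ("2023-05-01", "2023-06-01"),
]


def build_backfill_command(start: str, end: str, reset: bool = False) -> List[str]:
    """Build the argv for one `airflow dags backfill` call."""
    cmd = [
        "airflow", "dags", "backfill", DAG_ID,
        "--start-date", start,
        "--end-date", end,
        "-B",
    ]
    if reset:
        # Clear existing DAG runs for the window without prompting.
        cmd += ["--reset-dagruns", "-y"]
    return cmd


def run_backfills(
    windows: Sequence[Tuple[str, str]],
    reset: bool = False,
    runner: Callable = subprocess.run,
) -> List[int]:
    """Run each backfill in order and return the exit codes, failures included."""
    codes = []
    for start, end in windows:
        result = runner(build_backfill_command(start, end, reset))
        codes.append(result.returncode)
    return codes
```

Calling `run_backfills(WINDOWS)` inside the container runs the four commands one after another; a non-zero entry in the returned list marks a backfill that did not finish cleanly.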
Try more dates, or re-run with --reset-dagruns -y.
Operating System
OS X (Linux in Docker)
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
Docker: Engine: 24.0.2 Compose: v2.19.1
Docker desktop: Version 4.21.1 (114176)
Astro CLI Version: 1.17.1
Anything else
These problems occur regularly.