apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

DAGs with future `start_date` are automatically marked as success without executing tasks #38513

Open eladkal opened 4 months ago

eladkal commented 4 months ago

Body

For the following DAG:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    'owner': 'airflow',
}

with DAG('my_future_dag',
         default_args=default_args,
         start_date=datetime(2050, 1, 1),  # deliberately far in the future
         schedule="@daily",
         catchup=False,
         ):
    BashOperator(task_id="task1", bash_command="echo 1")
    BashOperator(task_id="task2", bash_command="echo 2")
    BashOperator(task_id="task3", bash_command="echo 3")
    BashOperator(task_id="task4", bash_command="echo 4")

As expected, no scheduled runs will start until 2050, but I might still want to trigger runs manually. Doing so results in a DagRun that is marked as success with no tasks executed, as can be seen here:

Screenshot 2024-03-26 at 22 22 42

However, after changing the start_date to datetime(2023, 1, 1), tasks are executed as expected for manual runs:

Screenshot 2024-03-26 at 22 24 28

The bug: tasks should be executed for manual runs. The current behaviour, where the DagRun is marked as success although no tasks ran and no failure is reported, is wrong.

By the way, one interesting thing to explore: if you first deploy the DAG with datetime(2023, 1, 1) and then change it to datetime(2050, 1, 1), Airflow has no problem running tasks with that start_date:

Screenshot 2024-03-26 at 22 28 19

Screenshot 2024-03-26 at 22 29 36


tirkarthi commented 4 months ago

https://forum.astronomer.io/t/dag-run-marked-as-success-but-no-tasks-even-started/1423
https://stackoverflow.com/q/73652663

We often see users with a dynamic start_date, which causes this issue, and we point them to these links. It would be helpful if such DagRuns were blocked from being marked as success.

eladkal commented 4 months ago

A dynamic start_date is incorrect usage. The case I show here is valid.
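For readers unfamiliar with the term, a "dynamic" start_date usually means a value computed each time the DAG file is parsed, for example `datetime.now()`. The following is a minimal plain-Python sketch, not Airflow code, of why that differs from the static case in this issue. The names `STATIC_START` and `dynamic_start`, and the comparison against the trigger time, are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Static start_date: the same value on every parse of the DAG file.
STATIC_START = datetime(2023, 1, 1)


def dynamic_start(parse_time: datetime) -> datetime:
    # Hypothetical stand-in for `start_date=datetime.now()`:
    # the value is recomputed whenever the file is parsed.
    return parse_time


# A manual run is triggered, then the file is re-parsed 30 seconds later.
trigger_time = datetime(2024, 3, 26, 22, 0)
reparse_time = trigger_time + timedelta(seconds=30)

# Assumption: tasks are only considered when start_date <= the run's date.
static_ok = STATIC_START <= trigger_time
dynamic_ok = dynamic_start(reparse_time) <= trigger_time

print(static_ok, dynamic_ok)  # True False
```

In this model the dynamic value always moves past the run's date after a re-parse, whereas a fixed future date such as datetime(2050, 1, 1) is a deliberate, stable choice.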

amoghrajesh commented 4 months ago

@eladkal I want to try my hand at this one, seems interesting. Can I take it up?

eladkal commented 4 months ago

Sure. Assigned @amoghrajesh

artilexx commented 4 weeks ago

@amoghrajesh have you been able to figure out what is causing this?

artilexx commented 4 weeks ago

Regarding the note about tasks running if you deploy the DAG with a past start_date and then change it to a future start_date: the tasks run properly for a while, but the DAG reverts to the old behaviour after it has been manually triggered a few times.

Side note: the same behaviour occurs when a manual run is triggered after end_date.
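Both symptoms reported in this thread (a future start_date, and a run triggered after end_date) would be consistent with tasks being filtered by a date window before the run's state is evaluated. The sketch below is a standalone model of that idea, not Airflow's implementation; the `Task` class, `tasks_in_window`, and `run_state` are hypothetical names, and the rule that a run with no selected tasks counts as success is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical stand-in for a task; not the real Airflow operator class.
    task_id: str
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None


def tasks_in_window(tasks: List[Task], run_date: datetime) -> List[Task]:
    """Keep only tasks whose [start_date, end_date] window contains run_date."""
    return [
        t for t in tasks
        if (t.start_date is None or t.start_date <= run_date)
        and (t.end_date is None or run_date <= t.end_date)
    ]


def run_state(selected: List[Task]) -> str:
    """Assumed rule: a run with nothing left to execute is 'success'."""
    return "running" if selected else "success"


manual_run = datetime(2024, 3, 26, 22, 22)
future = [Task(f"task{i}", start_date=datetime(2050, 1, 1)) for i in range(1, 5)]
past = [Task(f"task{i}", start_date=datetime(2023, 1, 1)) for i in range(1, 5)]
ended = [Task("task1", start_date=datetime(2020, 1, 1), end_date=datetime(2021, 1, 1))]

print(run_state(tasks_in_window(future, manual_run)))  # success (no tasks selected)
print(run_state(tasks_in_window(past, manual_run)))    # running (4 tasks selected)
print(run_state(tasks_in_window(ended, manual_run)))   # success (no tasks selected)
```

Under these assumptions, a fix along the lines requested in the issue would either bypass the window for manual runs or refuse to report success when no tasks were selected.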