Open uranusjr opened 2 years ago
One comment to that is (via #19578) that users also quite often use `airflow tasks test` to run single tasks for a particular DAG. I think while we could get by with backfill on `dags test`, this will not be the case for `tasks test`, and we need to figure out how to pass the `data_interval` to that command as well. I think from the docs the intention was that `infer_data_interval` for the custom timetable would do that, but it does not seem to be plugged in currently (and `tasks test` with custom timetables fails because it cannot infer the interval).

I think if we solve it for `tasks test`, the same solution could be used for `dags test`.
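For context, the inference discussed above amounts to mapping a single datetime (the one passed to `tasks test`) to a full data interval. A minimal, Airflow-free sketch of that idea, with a hypothetical `DailyTimetable` (the real Timetable interface in Airflow differs in detail):

```python
from datetime import datetime, timedelta

# Hypothetical, Airflow-free stand-in for a custom timetable; this only
# sketches the kind of inference the comment refers to.
class DailyTimetable:
    """Maps an arbitrary datetime to the daily interval containing it."""

    def infer_data_interval(self, run_after: datetime):
        # Snap down to midnight: the start of the containing day.
        start = run_after.replace(hour=0, minute=0, second=0, microsecond=0)
        return start, start + timedelta(days=1)

start, end = DailyTimetable().infer_data_interval(datetime(2021, 11, 16, 9, 30))
print(start, end)  # 2021-11-16 00:00:00 2021-11-17 00:00:00
```

With a hook like this plugged in, `tasks test` could derive the interval from the single datetime the user supplies instead of failing.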
One good thing about `dags test` is also that it uses the `DebugExecutor` and does not leave traces in the DB after it is executed (or so I thought at least, and that was certainly the intention of the `dags test` command). I am not sure if this assumption still holds true, or needs to be "fixed". But the idea was that you run the whole DAG this way without it being stored in the database. I am not sure if this is something that we want to mix with backfill, though. For me `dags test` still has a good use here.
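The "run everything in-process, persist nothing" idea described above can be illustrated with a toy, Airflow-free sketch (the function and data structures here are made up for illustration, not Airflow's executor API):

```python
# Toy illustration of the debug-executor idea: run every task of a DAG
# inline, in dependency order, without persisting any state anywhere.
def run_dag_in_memory(tasks, deps):
    """tasks: name -> callable; deps: name -> list of upstream names."""
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            visit(upstream)          # run upstream tasks first
        tasks[name]()                # execute the task inline
        done.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order                     # nothing was stored in any database

results = []
order = run_dag_in_memory(
    {"extract": lambda: results.append("e"),
     "transform": lambda: results.append("t"),
     "load": lambda: results.append("l")},
    {"transform": ["extract"], "load": ["transform"]},
)
print(order)  # ['extract', 'transform', 'load']
```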
Description
Currently (in versions up to 2.1.4), `airflow dags test <dag_id> <execution_date>` creates a backfill run at the specified datetime. This, however, applies regardless of whether the DAG can actually have a logical, automated backfill at that specific datetime. One example of this logically confusing behaviour is shown in #18473: a DAG with `schedule_interval=None` should logically never have backfill runs, but the `test` command would still happily create a backfill run at the given datetime.

With the introduction of custom timetables in AIP-39, the DAG scheduling logic went through extensive refactoring to conform more closely to the DAG's schedule/timetable specification. This means that a backfill run can no longer be created at will. The 2.2 release will contain a hack to keep the current behaviour of "free" backfill run creation via `test` (#18742), but I would prefer this to be a temporary measure, removed once we have a better solution.

The root cause of this issue is, IMO, that `airflow dags test` has very poor semantics as currently designed. It is entirely non-obvious that it creates backfill runs (and that a subsequent `airflow dags backfill` call would therefore skip the specified datetime if and only if it lies on the logical schedule), nor is it obvious why a backfill can happen without considering the schedule (it is the only way to do that in Airflow, AFAIK). And the name `test` itself is somewhat of a misnomer: why is creating a backfill run a "test" in the first place?

Use case/motivation
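To make the earlier point about `schedule_interval=None` concrete before getting into the use case: with no schedule there is no "next" interval, so there is logically nothing to backfill. A toy sketch (the class and method names are illustrative, not Airflow's actual API):

```python
from datetime import datetime, timedelta
from typing import Optional

# Illustrative stand-ins; Airflow's real timetable classes differ.
class NullSchedule:
    """Equivalent in spirit to schedule_interval=None: never scheduled."""

    def next_run_after(self, last: datetime) -> Optional[datetime]:
        return None  # no automated run, hence logically no backfill, ever

class DailySchedule:
    def next_run_after(self, last: datetime) -> Optional[datetime]:
        return last + timedelta(days=1)

last = datetime(2021, 1, 1)
print(NullSchedule().next_run_after(last))   # None
print(DailySchedule().next_run_after(last))  # 2021-01-02 00:00:00
```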
From what I can tell, the primary use case of `airflow dags test` currently is to check whether a DAG implements its tasks reasonably before it is activated. For this particular use case, the user does not actually care what kind of run is used, so a manual run would do. But we should also create a migration path for those relying on `airflow dags test` to create a backfill run, since the implied side effect of saving a backfill run for later is also somewhat useful.

So the plan I currently have in mind is:
- `airflow dags trigger`: allow triggering a manual run and executing it directly in the console (instead of sending it to the scheduler). This will need some new mechanism, since `trigger` is currently implemented via `DAG.create_dagrun()`. I think we'll need a new job class, e.g. `ManualRunJob`.
- `airflow dags backfill`: have it do the same thing, but with a backfill run. This would cover the exact same use case as `airflow dags test` right now, but with more obvious semantics. The syntax would, however, be significantly more verbose; we need to work on that as well.
- `airflow dags test`: remove it, since its usage can be covered by the two additions above.

Related issues
- Issue raised against the 2.2 beta about the changed behaviour: #18473
- PR to "restore" the pre-2.2 behaviour: #18742