tatiana opened 22 hours ago
Note that this only happens if the DAG is not already parsed by the DAG processor. Once parsed, the DatasetAlias will have an entry in the database, and the DAG will run as expected.
Running this with a plain Dataset under the same condition also produces an unexpected result:
from airflow import DAG, Dataset
from airflow.operators.bash import BashOperator

with DAG("dataset_dag", ...):
    BashOperator(task_id="emit", bash_command=":", outlets=[Dataset("some_dataset")])
The task will succeed, but no event is emitted.
$ airflow dags test dataset_dag
...
[2024-09-26T09:50:38.635+0000] {manager.py:96} WARNING - DatasetModel Dataset(uri='some_dataset', extra=None) not found
...
Ultimately this is because `airflow dags test` is essentially out-of-band execution that does not involve the normal DAG parsing and scheduling. Dataset event emission is implemented entirely around the database, and thus does not work well in this mode.
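The failure mode described above (no database row for the dataset, so no event can be recorded and only a warning is logged) can be sketched abstractly. This is a toy model with an illustrative schema and function names, not Airflow's actual internals:

```python
import sqlite3
import warnings

def emit_dataset_event(conn, uri):
    """Toy model of DB-backed event emission: an event can only be
    recorded if the dataset row already exists, i.e. the DAG processor
    has parsed the DAG and registered the dataset."""
    row = conn.execute(
        "SELECT id FROM dataset WHERE uri = ?", (uri,)
    ).fetchone()
    if row is None:
        # Mirrors the "DatasetModel ... not found" warning: no row, no event.
        warnings.warn(f"DatasetModel Dataset(uri={uri!r}) not found")
        return None
    conn.execute("INSERT INTO dataset_event (dataset_id) VALUES (?)", (row[0],))
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT)")
conn.execute("CREATE TABLE dataset_event (id INTEGER PRIMARY KEY, dataset_id INTEGER)")

# Out-of-band run (`dags test` before the DAG is parsed): no row yet,
# so the function warns and emits nothing.
emit_dataset_event(conn, "some_dataset")

# Once the DAG processor has registered the dataset, emission succeeds.
conn.execute("INSERT INTO dataset (uri) VALUES ('some_dataset')")
emit_dataset_event(conn, "some_dataset")
```

The point of the sketch is that emission is gated entirely on database state that `dags test` never populates, which is why the task succeeds while the event silently goes missing.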
I think we should decide how this entire feature should be designed going forward. Since the situation is somewhat similar to `airflow dags backfill`, maybe we should make this also go through the scheduler? (Basically, make the CLI behave like triggering a manual run in the web UI.) This is a relatively large undertaking, though. What should we do for 2.x?
Apache Airflow version
2.10.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
Given the DAG:
When I try to run:
I get the error:
This DAG executes successfully when not triggered via the `dags test` command.
What you think should happen instead?
I should be able to run `dags test` for this DAG without seeing this error message.
How to reproduce
Already described.
Operating System
Any
Versions of Apache Airflow Providers
No response
Deployment
Other
Deployment details
It is not happening during deployment (tested in Astronomer, and it worked fine). The issue happens when running the `airflow dags test` command locally.
Anything else?
No response
Are you willing to submit PR?
Code of Conduct