Closed dhatch-niv closed 2 years ago
Reproduced with:
def test_dataset_cascade_deletion(self, dag_maker, session):
    from airflow.datasets import Dataset
    from airflow.decorators import task

    # Producer DAG: test_task1 emits the dataset 'test/1'.
    with dag_maker(schedule=None, serialized=True) as dag1:

        @task(outlets=Dataset('test/1'))
        def test_task1():
            print(1)

        test_task1()

    dr1 = dag_maker.create_dagrun()
    test_task1 = dag1.get_task('test_task1')

    # Consumer DAG: scheduled on the dataset 'test/1'.
    with dag_maker(dag_id='testdag', schedule=[Dataset('test/1')], serialized=True) as dag2:

        @task
        def test_task2():
            print(1)

        test_task2()

    # Run the producer so the dataset becomes "ready" and is queued for 'testdag'.
    ti = dr1.get_task_instance(task_id='test_task1')
    ti.run()

    # Change the dataset.
    with dag_maker(dag_id='testdag', schedule=[Dataset('test2/1')], serialized=True) as dag2:

        @task
        def test_task2():
            print(1)

        test_task2()
on it.
Thanks for the quick fix on this @uranusjr 🔥
Apache Airflow version
2.4.2
What happened
I have a DAG that is triggered by three datasets. When I remove one or more of these datasets, the web server fails to update the DAG, and airflow dags reserialize fails with an AssertionError within SQLAlchemy. Full stack trace below:
What you think should happen instead
The DAG does not properly load in the UI, and no error is displayed. Instead, the old datasets that have been removed should be removed as dependencies and the DAG should be updated with the new dataset dependencies.
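As a rough illustration of the expected outcome, here is a minimal sketch using an in-memory SQLite database. The schema is deliberately simplified and hypothetical (the table names schedule_reference and run_queue, and the helper functions, are assumptions for this example, not Airflow's actual models or its fix). Dataset ids 15 and 17 are made up; 16 stands in for the removed dataset. The point is only that deleting a DAG's reference to a dataset should also clear any queued entry that depends on it.

```python
import sqlite3


def build_demo_db():
    """Create a simplified, hypothetical schema with one queued 'ready' dataset."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE schedule_reference (
            dataset_id INTEGER NOT NULL,
            dag_id TEXT NOT NULL,
            PRIMARY KEY (dataset_id, dag_id)
        );
        CREATE TABLE run_queue (
            dataset_id INTEGER NOT NULL,
            target_dag_id TEXT NOT NULL,
            PRIMARY KEY (dataset_id, target_dag_id),
            FOREIGN KEY (dataset_id, target_dag_id)
                REFERENCES schedule_reference (dataset_id, dag_id)
                ON DELETE CASCADE
        );
        """
    )
    # The DAG is scheduled on three datasets; one of them (16) is "ready".
    conn.executemany(
        "INSERT INTO schedule_reference VALUES (?, ?)",
        [(15, "testdag"), (16, "testdag"), (17, "testdag")],
    )
    conn.execute("INSERT INTO run_queue VALUES (?, ?)", (16, "testdag"))
    return conn


def remove_dataset_reference(conn, dataset_id, dag_id):
    """Drop one schedule reference; the queued row follows via ON DELETE CASCADE."""
    conn.execute(
        "DELETE FROM schedule_reference WHERE dataset_id = ? AND dag_id = ?",
        (dataset_id, dag_id),
    )


if __name__ == "__main__":
    conn = build_demo_db()
    remove_dataset_reference(conn, 16, "testdag")
    print(conn.execute("SELECT * FROM run_queue").fetchall())
    print(conn.execute("SELECT * FROM schedule_reference ORDER BY dataset_id").fetchall())
```

In this sketch the cascade is done by the database itself. Whether the ORM or the database should own that cleanup in Airflow is a separate design question that this example does not try to answer.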
How to reproduce
Initial DAG:
At least one of the datasets should be 'ready'. Now dataset_dag_run_queue will look something like below:
Then, update the DAG with new datasets:
Now you will observe the error in the web server logs or when running airflow dags reserialize.
I suspect this issue is related to the handling of cascading deletes on the dataset_id foreign key for the run queue table. Dataset id = 16 is one of the datasets that has been renamed.
Operating System
docker image - apache/airflow:2.4.2-python3.9
Versions of Apache Airflow Providers
Deployment
Docker-Compose
Deployment details
Running using docker-compose locally.
Anything else
To trigger this problem, the dataset to be removed must be in the "ready" state so that there is an entry in dataset_dag_run_queue.
Are you willing to submit PR?
Code of Conduct