apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Cannot delete DAG with many runs #41148

Open GergelyKalmar opened 1 month ago

GergelyKalmar commented 1 month ago

Apache Airflow version

Other Airflow 2 version (please specify below)

If "Other Airflow 2 version" selected, which one?

2.8.1 (AWS MWAA)

What happened?

I tried to delete a DAG with many runs (14000+). After about 4 minutes I receive "xxx.yyy.airflow.amazonaws.com didn’t send any data. ERR_EMPTY_RESPONSE", and the DAG is not deleted.

What you think should happen instead?

The DAG should have been deleted.

How to reproduce

Create a DAG with 14000 runs and try to delete it.

Operating System

AWS MWAA

Versions of Apache Airflow Providers

No response

Deployment

Amazon (AWS) MWAA

Deployment details

No response

Anything else?

No response

eladkal commented 1 month ago

Yes, this is a bug, but just to clarify: the delete partially works. It deletes some of the records before raising the error, so each time you click the button the number of records goes down. I assume 2-3 attempts will delete all the records.

> and the DAG is not deleted.

A DAG cannot be deleted from the UI. The delete only removes the records associated with it (DagRun, TaskInstance...). If you want to remove the DAG completely, you also need to remove the DAG's .py file and then delete the records (with the UI or API, if the DAG is not present).

eladkal commented 1 month ago

Tagged with the DB label, as this is probably related to an inefficient query to the DB.

GergelyKalmar commented 1 month ago

> yes this is a bug but just to clarify the delete partially works. It deletes some of the records before raising the error, so every time you click on the button it will reduce the amount of records. I assume 2-3 times will delete all the records

Are you sure? The number of DAG run instances seems to be the same after the crash. Could it be that the deletion is rolled back, so nothing is partially deleted? I've tried re-running the deletion many times and saw no change.

> DAG can not be deleted from the UI. the delete only removes the records associated with it (DagRun, TaskInstance...) if you want to remove dag completely then you need also to remove the .py file of the DAG and then delete the records (with the UI or API if the dag is not present)

Yes, I'm aware of this. I meant the removal of the related records.

omkar-foss commented 1 month ago

> I tried to delete a DAG with many runs (14000+). After about 4 minutes I receive "xxx.yyy.airflow.amazonaws.com didn’t send any data. ERR_EMPTY_RESPONSE", and the DAG is not deleted.

@GergelyKalmar the ERR_EMPTY_RESPONSE suggests that the connection was closed before the DAG deletion completed, most likely because deleting the 14k+ runs in this function takes too long. This is a bug, as @eladkal pointed out, and once it is fixed here in upstream Airflow, it may be a while before the fix reaches the downstream MWAA that you're using.

Meanwhile, I'd suggest you try to delete the DAG via the Airflow REST API with an increased request timeout. I've written a small script for this (Gist here) based on the MWAA docs; please check it out. You can tweak the script as needed and increase the timeout beyond 10 minutes if the deletion takes longer and fails. Hope this helps.
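For readers who cannot open the Gist, here is a rough sketch of the idea. It is not the linked script. It assumes the Airflow 2 stable REST API, where `DELETE /api/v1/dags/{dag_id}` removes a DAG's metadata, and it leaves authentication to the caller because that depends on the deployment (on MWAA, follow the MWAA docs to obtain credentials for the web server). The function names, the `headers` argument, and the 30-minute default timeout are illustrative assumptions.

```python
import urllib.error
import urllib.parse
import urllib.request


def dag_delete_url(base_url: str, dag_id: str) -> str:
    """Build the stable REST API URL used to delete a DAG's metadata."""
    quoted_id = urllib.parse.quote(dag_id, safe="")
    return f"{base_url.rstrip('/')}/api/v1/dags/{quoted_id}"


def delete_dag(base_url, dag_id, headers, timeout_s=1800.0,
               opener=urllib.request.urlopen):
    """Send the DELETE request and return the HTTP status code.

    `headers` must carry whatever authentication the deployment expects
    (an assumption here; this sketch does not perform any login itself).
    `timeout_s` is the client-side timeout in seconds; raise it if the
    deletion of many runs takes longer.
    """
    request = urllib.request.Request(
        dag_delete_url(base_url, dag_id), headers=headers, method="DELETE"
    )
    try:
        with opener(request, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example 401, 404 or 409) are raised by urllib;
        # return the code so the caller can decide whether to retry.
        return err.code
```

A call would look like `delete_dag("https://<webserver-host>", "<dag_id>", {"Cookie": "<auth cookie>"})`, with the placeholders filled in for your environment. Note that a longer client timeout only helps if nothing between the client and the web server closes the connection first.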