Open sai3563 opened 10 months ago
@bbovenzi can I work on this?
Often, we add and remove such configs. On removal, their dags disappear but datasets remain. Deletion option of dataset will help maintain cleanliness in the Datasets page for me and I'm sure many others in the future.
@sai3563 I would like to clarify: did you mean the datasets remain in both the UI and the database, or in the database only? From what I can see, orphaned datasets are not shown in the UI.
Unless force-deleting a non-orphaned dataset is possible and meaningful (it will be recreated on the next DAG code scan anyway, cmiiw), I'm not sure we need to add a delete button in the UI. I do agree with adding a delete function to the API though, and I'm working on it.
@im-perativa I just checked again and found that orphaned datasets continue to show up in the UI.
I'm on Airflow 2.7.3. I'm not sure if anything has changed since then with regard to datasets, but here is how this works for me.
I use configs in MongoDB to make dynamic dags. Let's say 200 documents in MongoDB = 200 dags in Airflow. Based on the name of the dag, datasets are also created.
Now I add 1 document, so I have 201 dags, and I run the newly added dag. The new dag's dataset gets updated. If I then remove that config from Mongo, the corresponding dag in Airflow is also deleted, but its dataset remains.
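To make the sequence described here concrete, below is a toy model in plain Python. It does not import Airflow or MongoDB and is not how Airflow stores datasets internally; the function names, the `dag_id` config key, and the `dataset://` naming scheme are all hypothetical. It only illustrates why a dataset outlives its dag when registration is add-only, and how orphans could be found with a set difference.

```python
# Toy model only: plain Python, no Airflow or MongoDB. All names are hypothetical.

def dataset_uri_for(dag_id: str) -> str:
    """Derive a dataset URI from the dag name (assumed naming scheme)."""
    return f"dataset://{dag_id}"


def sync_dags(configs: list, known_datasets: set) -> set:
    """Rebuild the set of dag ids from the configs.

    Datasets for the current dags are registered, but datasets of
    removed dags are never cleaned up, mirroring the reported behaviour.
    """
    dag_ids = {cfg["dag_id"] for cfg in configs}
    known_datasets.update(dataset_uri_for(d) for d in dag_ids)
    return dag_ids


def orphaned_datasets(dag_ids: set, known_datasets: set) -> set:
    """Datasets that are still registered but no longer tied to any dag."""
    referenced = {dataset_uri_for(d) for d in dag_ids}
    return known_datasets - referenced


# 200 config documents -> 200 dags and 200 datasets.
configs = [{"dag_id": f"dag_{i}"} for i in range(200)]
known = set()
dags = sync_dags(configs, known)

# Add one document: 201 dags, 201 datasets.
configs.append({"dag_id": "dag_new"})
dags = sync_dags(configs, known)

# Remove that document again: the dag goes away, the dataset stays.
configs.pop()
dags = sync_dags(configs, known)
orphans = orphaned_datasets(dags, known)
```

After the last step there are 200 dags but 201 registered datasets, and `orphans` holds the single leftover entry. A delete function, whether in the API or the UI, would need to act on exactly this leftover set.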
Description
Hi All,
I am using Data-aware Scheduling extensively in my projects. One thing I've noticed is that there is no way to delete datasets, either through a button in the UI or through a function in code. It would be great if we could add a function to delete a dataset, and also a button in the UI for it.
Use case/motivation
In my case, I generate dags dynamically, which also creates datasets. The dags are created from configs stored in MongoDB. We often add and remove such configs. On removal, their dags disappear but the datasets remain. An option to delete datasets would help keep the Datasets page clean for me, and I'm sure for many others in the future.