Open OptimeeringBigya opened 1 year ago
Deletion of main flow does not delete the sub flow associated with the main flow
Correct, it appears that after deleting the flow run, we do not search through all other flow runs to see if they match the "parent" flow run. https://github.com/PrefectHQ/prefect/blob/44c5e6af78fcc5da1947774efb62d9beae61e345/src/prefect/server/models/flow_runs.py#L410-L427
When not using Prefect Cloud, where a deletion service performs these batch deletes for you, some of the deletes you mention would have to be performed manually through the UI, CLI, or API.
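To illustrate the missing step, here is a minimal sketch of what "searching for flow runs that match the parent" could look like. It is not Prefect's implementation. It uses an in-memory SQLite database with a heavily simplified, assumed schema in which `flow_run.parent_task_run_id` points at a `task_run` row and `task_run.flow_run_id` points back at the parent flow run. The function names `find_subflow_run_ids`, `delete_flow_run_with_subflows`, and `build_demo_db` are hypothetical.

```python
import sqlite3


def find_subflow_run_ids(conn, flow_run_id):
    """Return ids of all flow runs nested under flow_run_id, at any depth."""
    found = []
    stack = [flow_run_id]
    while stack:
        parent = stack.pop()
        # A subflow run is linked to its parent through the parent's task run.
        rows = conn.execute(
            """
            SELECT fr.id
            FROM flow_run AS fr
            JOIN task_run AS tr ON fr.parent_task_run_id = tr.id
            WHERE tr.flow_run_id = ?
            """,
            (parent,),
        ).fetchall()
        for (child_id,) in rows:
            found.append(child_id)
            stack.append(child_id)
    return found


def delete_flow_run_with_subflows(conn, flow_run_id):
    """Delete a flow run, its subflow runs, and their logs and task runs."""
    # Collect the ids first: the lookup depends on task_run rows still existing.
    ids = [flow_run_id] + find_subflow_run_ids(conn, flow_run_id)
    placeholders = ",".join("?" for _ in ids)
    with conn:  # one transaction for all three deletes
        conn.execute(f"DELETE FROM log WHERE flow_run_id IN ({placeholders})", ids)
        conn.execute(f"DELETE FROM task_run WHERE flow_run_id IN ({placeholders})", ids)
        conn.execute(f"DELETE FROM flow_run WHERE id IN ({placeholders})", ids)
    return ids


def build_demo_db():
    """Create a tiny demo database (assumed schema, not Prefect's real one)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE flow_run (id TEXT PRIMARY KEY, parent_task_run_id TEXT);
        CREATE TABLE task_run (id TEXT PRIMARY KEY, flow_run_id TEXT);
        CREATE TABLE log (id INTEGER PRIMARY KEY, flow_run_id TEXT);

        INSERT INTO flow_run VALUES ('main', NULL);
        INSERT INTO task_run VALUES ('t1', 'main');
        INSERT INTO flow_run VALUES ('sub', 't1');
        INSERT INTO task_run VALUES ('t2', 'sub');
        INSERT INTO flow_run VALUES ('subsub', 't2');
        INSERT INTO flow_run VALUES ('other', NULL);

        INSERT INTO log (flow_run_id) VALUES ('main'), ('sub'), ('subsub'), ('other');
        """
    )
    return conn
```

Calling `delete_flow_run_with_subflows(build_demo_db(), "main")` removes the main run together with both nested runs and their logs, and leaves the unrelated run untouched.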
Attempting to clear the database after deleting the main flow results in an error.
That shouldn't occur. We think it happens because resetting the database runs a downgrade that adds a FK back to the artifact table, and that is what causes the issue. We believe removing that would resolve it, but we are thinking more on it:
Hi @serinamarie, thank you for the response.
Any comments on the logs not being cleared when flow runs are deleted? I can understand why artifacts should be persisted even after the flow run has been deleted, but I am not sure the same can be said about the logs. Additionally, I do not think there is any way to delete logs from the UI or through the API.
Correct, it appears that after deleting the flow run, we do not search through all other flow runs to see if they match the "parent" flow run.
I've also noticed that parent_task_run_id in the flow_run table is changed from the task in the parent flow to null when the parent flow is deleted.
Hi @OptimeeringBigya and thanks for the issue. At this time we will accept the scope of deleting logs when flow runs are deleted.
Any update on this?
First check
Bug summary
I have run some tests on deletion of flows and how they impact the database in terms of space.
The flow [provided in Reproduction section] was run on a fresh server.
The following queries were run on the database
I have posted the results of the query below,
Findings / bugs(?)
The artifact and log tables are never cleaned.
The task_run_state_cache table is also never cleaned, even after the cache_expiration date has passed.
......
DETAIL: Key (flow_run_id)=(91ef0688-fadf-4bbf-b14a-5b3a01c5a784) is not present in table "flow_run". [SQL: ALTER TABLE artifact ADD CONSTRAINT fk_artifactflow_run_idflow_run FOREIGN KEY(flow_run_id) REFERENCES flow_run (id)] (Background on this error at: https://sqlalche.me/e/20/gkpj) An exception occurred.
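The error above says that an artifact row references a flow_run_id that no longer exists in flow_run, which is why the foreign key cannot be re-added. A rough way to check for such leftover rows is sketched below. It is only an illustration under assumptions: it uses SQLite instead of the real database, a simplified schema, and a hypothetical helper named `count_orphans`. The `flow_run_id` column on artifact comes from the error message; the same column on log is assumed.

```python
import sqlite3


def count_orphans(conn, child_table):
    """Count rows in child_table whose flow_run_id has no matching flow_run row."""
    # The table name is interpolated into SQL, so only allow known names.
    if child_table not in {"artifact", "log"}:
        raise ValueError(f"unexpected table: {child_table}")
    row = conn.execute(
        f"""
        SELECT COUNT(*)
        FROM {child_table} AS c
        LEFT JOIN flow_run AS fr ON c.flow_run_id = fr.id
        WHERE c.flow_run_id IS NOT NULL AND fr.id IS NULL
        """
    ).fetchone()
    return row[0]


# Demo data: flow run 'gone' was deleted, but its artifact and logs remain.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE flow_run (id TEXT PRIMARY KEY);
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, flow_run_id TEXT);
    CREATE TABLE log (id INTEGER PRIMARY KEY, flow_run_id TEXT);

    INSERT INTO flow_run VALUES ('kept');
    INSERT INTO artifact (flow_run_id) VALUES ('kept'), ('gone');
    INSERT INTO log (flow_run_id) VALUES ('gone'), ('gone');
    """
)
orphan_artifacts = count_orphans(demo, "artifact")
orphan_logs = count_orphans(demo, "log")
```

Any non-zero count for artifact means that adding a foreign key from artifact.flow_run_id to flow_run.id would fail in the same way as the error shown.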
Error
Versions
Additional context
These may or may not be issues. I would love to know whether these were intentional, and whether there is any planned roadmap for them or any suggestions on cleaning up these records.