PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
17.63k stars 1.65k forks source link

Delete flow run logs when flow run is deleted #10000

Open OptimeeringBigya opened 1 year ago

OptimeeringBigya commented 1 year ago

First check

Bug summary

I have run some tests on deletion of flows and how they impact the database in terms of space.

The flow [provided in Reproduction section] was run on a fresh server.

The following queries were run on the database

  1. after the completion of the flow
  2. after deletion of main flow
  3. after deletion of sub flow

I have posted the results of the query below,

-- query -- count after run success, count after deletion of main flow, count after deletion of sub flow
select * from artifact a --5,5,5
select * from log --13,13,13
select * from flow_run fr --2,1,0
select * from flow_run_state frs order by updated--6,3,0
select * from task_run tr --3,1,0
select * from task_run_state trs order by updated --10,3,0

Findings / bugs(?)

  1. Deletion of main flow does not delete the sub flow associated with the main flow.
  2. The artifact and the log tables are never cleaned.
  3. Not included in the test, but task_run_state_cache table is also never cleaned - even after the cache_expiration date has passed.
  4. Attempting to clear the database after deleting the main flow results in an error.
    
    prefect server database reset
    Are you sure you want to reset the Prefect database located at "postgresql+asyncpg://p_user:***@127.0.01:5432/p_db"? This will drop and recreate all tables. [y/N]: y
    Downgrading database...

.....

DETAIL: Key (flow_run_id)=(91ef0688-fadf-4bbf-b14a-5b3a01c5a784) is not present in table "flow_run". [SQL: ALTER TABLE artifact ADD CONSTRAINT fk_artifactflow_run_idflow_run FOREIGN KEY(flow_run_id) REFERENCES flow_run (id)] (Background on this error at: https://sqlalche.me/e/20/gkpj) An exception occurred.


### Reproduction

```python3
import prefect
from prefect import flow,task

@task
def t1():
    prefect.get_run_logger().info("This is a task log")

@flow(name="main")
def b():
    prefect.get_run_logger().info("Generated logs main")
    t1()
    f()

@flow(name="sub")
def f():
    prefect.get_run_logger().info("Generated logs sub")
    t1()

if __name__ == "__main__":
    b()

Error

Described in Bug Summary.

Versions

Version:             2.10.12
API version:         0.8.4
Python version:      3.10.6
Git commit:          ec55abdf
Built:               Thu, Jun 1, 2023 3:51 PM
OS/Arch:             linux/x86_64
Profile:             local_db
Server type:         server

Additional context

These may or may not be issues. I would love to know if these were intentional and if there are any planned roadmap for these / suggestions on clean up of these records.

serinamarie commented 1 year ago

Deletion of main flow does not delete the sub flow associated with the main flow

Correct, it appears that after deleting the flow run, we do not search through all other flow runs to see if they match the "parent" flow run. https://github.com/PrefectHQ/prefect/blob/44c5e6af78fcc5da1947774efb62d9beae61e345/src/prefect/server/models/flow_runs.py#L410-L427

Some of the deletes you mention would be manually performed in the UI, CLI, API, etc. when not using Prefect Cloud where a deletion service would perform these batch deletes for you.

Attempting to clear the database after deleting the main flow results in an error.

That shouldn't occur, and we think it's due to the fact that when you reset the database, we run a downgrade that adds a FK back to the artifact table that causes that issue. We believe that removing that would be an option that would resolve that issue, but are thinking more on it:

https://github.com/PrefectHQ/prefect/blob/44c5e6af78fcc5da1947774efb62d9beae61e345/src/prefect/server/database/migrations/versions/sqlite/2023_02_08_152028_8d148e44e669_remove_artifact_fk.py#L36-L38

OptimeeringBigya commented 1 year ago

HI @serinamarie , Thank you for the response.

Any comments on the logs not being cleared when flow runs are deleted? I can understand why artifacts should be persisted even after the flow run has been deleted; but I am not sure if the same can be said about the logs. Additionally, I do not think there are any means of deleting logs from UI or using the API.

OptimeeringBigya commented 1 year ago

Correct, it appears that after deleting the flow run, we do not search through all other flow runs to see if they match the "parent" flow run.

I've also noticed that parent_task_run_id in flow_run table is changed from task in parent flow to null when the parent flow is deleted.

WillRaphaelson commented 1 year ago

Hi @OptimeeringBigya and thanks for the issue. At this time we will accept the scope of deleting logs when flow runs are deleted.

pranaymodukuru commented 2 months ago

Any update on this?