PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Auto clean-up feature for the Prefect internal database #16054

Open rmnvncnt opened 5 days ago

rmnvncnt commented 5 days ago

Our Prefect server deployment had been getting slower over time, and we had trouble scheduling new jobs and updating data in the UI. The cause turned out to be the Prefect internal database, which was overflowing with logs from old runs. A script suggested by @Arthurhussey helped mitigate the problem by removing logs older than a week.

While this solution worked in my case, having a scheduled flow tamper with the Prefect database directly might cause issues down the line.

It would be very nice if Prefect server had a way of cleaning its logs automatically. For instance, an environment variable similar to PREFECT_EVENTS_RETENTION_PERIOD for flow runs and task runs.

The initial discussion:

@rmnvncnt the Prefect server doesn't have any auto clean-up features right now, but if that's something you'd like, please open an issue so we can discuss it further!

It looks like the issue of deployments not being displayed has been solved by reducing the amount of data in your DB so the scheduler can insert scheduled runs, so I'm going to close this issue.

Originally posted by @desertaxle in https://github.com/PrefectHQ/prefect/issues/15919#issuecomment-2483931935

mikelogaciuk commented 5 days ago

That is a good idea.

In my company, we delete everything from:

That is older than 60 days (WHERE created < (CURRENT_DATE -60);).

And of course we run a periodic VACUUM on those tables to reclaim the storage.
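
For anyone who wants to try the delete-then-VACUUM pattern described in this thread, here is a minimal sketch. It is not Prefect's API and not the script mentioned above. It assumes a SQLite database, a table with a `created` column holding ISO 8601 UTC strings, and a hypothetical helper named `purge_old_rows`. The demo runs against a throwaway in-memory table named `log`, so check the real schema and timestamp format before pointing anything like this at a Prefect database.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_rows(conn, table, retention_days, now=None):
    """Delete rows older than `retention_days` from `table`, then VACUUM.

    Assumption: `table` has a `created` column holding ISO 8601 UTC strings.
    Returns the number of deleted rows.
    """
    if not table.isidentifier():
        # Table names cannot be bound as SQL parameters, so reject anything odd.
        raise ValueError(f"unexpected table name: {table!r}")
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(f'DELETE FROM "{table}" WHERE created < ?', (cutoff,))
    deleted = cur.rowcount
    conn.commit()
    # VACUUM cannot run inside a transaction, hence the commit just above.
    conn.execute("VACUUM")
    return deleted


if __name__ == "__main__":
    # Throwaway in-memory database standing in for the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log (id INTEGER PRIMARY KEY, created TEXT, message TEXT)"
    )
    now = datetime(2024, 11, 25, tzinfo=timezone.utc)
    for age_days in (1, 30, 90):
        created = (now - timedelta(days=age_days)).isoformat()
        conn.execute(
            "INSERT INTO log (created, message) VALUES (?, ?)",
            (created, f"{age_days} days old"),
        )
    conn.commit()
    print("deleted rows:", purge_old_rows(conn, "log", retention_days=60, now=now))
```

The same idea should carry over to PostgreSQL, where VACUUM also has to run outside a transaction block, so the connection would need to be in autocommit mode for that step.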