PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Purge old Dask futures when no longer needed #13536

Open RobJBarr opened 1 month ago

RobJBarr commented 1 month ago

Prefect Version

2.x

Describe the current behavior

In the current implementation of the DaskTaskRunner, the Dask future for each task is kept in a dictionary, even after no downstream task needs it. Because the client side still holds a reference to each future, the future is never cleaned up on the workers.
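The retention effect can be illustrated without Dask at all. This is only a minimal sketch: `FakeFuture` and `futures_by_task` are hypothetical stand-ins, not Prefect or Dask objects, and the weak reference is just a probe to see whether the object is still alive.

```python
import gc
import weakref


class FakeFuture:
    """Hypothetical stand-in for a Dask future; not a real Dask object."""


# Assumption: the task runner stores one future per task in a plain dict.
futures_by_task = {}

future = FakeFuture()
futures_by_task["task-1"] = future
probe = weakref.ref(future)  # lets us observe whether the object is alive

# Drop the local name; the dict entry still references the object.
del future
gc.collect()
alive_while_in_dict = probe() is not None

# Remove the dict entry; nothing references the object any more.
del futures_by_task["task-1"]
gc.collect()
alive_after_purge = probe() is not None

print(alive_while_in_dict, alive_after_purge)
```

As long as the dictionary entry exists, the object cannot be collected, which is the client-side reference described above.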

Describe the proposed behavior

The DaskTaskRunner should provide an option to purge Dask futures that are no longer needed. This could take the form of a callback that runs once a Prefect future completes: it checks whether each of that task's upstream tasks still has any downstream dependents and, if not, deletes the corresponding futures so they can be garbage collected.
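One possible shape for such a callback, as a rough sketch rather than anything from Prefect or prefect-dask: `FuturePurger`, `register`, and `on_task_done` are hypothetical names, the futures are plain placeholder objects, and the sketch assumes every downstream task is registered before any of its siblings completes.

```python
from collections import defaultdict


class FuturePurger:
    """Hypothetical helper that drops futures once no dependent needs them."""

    def __init__(self):
        self.futures = {}    # task key -> future-like object
        self.upstreams = {}  # task key -> list of upstream task keys
        # task key -> number of registered downstream tasks not yet finished
        self.pending_dependents = defaultdict(int)

    def register(self, key, future, upstream_keys=()):
        """Record a submitted task, its future, and what it depends on."""
        self.futures[key] = future
        self.upstreams[key] = list(upstream_keys)
        for up in upstream_keys:
            self.pending_dependents[up] += 1

    def on_task_done(self, key):
        """Completion callback: purge upstream futures nobody still needs."""
        purged = []
        for up in self.upstreams.pop(key, []):
            self.pending_dependents[up] -= 1
            if self.pending_dependents[up] == 0:
                # Last dependent finished: drop our reference to the future.
                self.futures.pop(up, None)
                del self.pending_dependents[up]
                purged.append(up)
        return purged


purger = FuturePurger()
purger.register("extract", object())
purger.register("clean", object(), upstream_keys=["extract"])
purger.register("report", object(), upstream_keys=["extract", "clean"])

purger.on_task_done("clean")   # "extract" is still needed by "report"
purger.on_task_done("report")  # both upstream futures are dropped here
```

Terminal tasks such as `"report"` have no dependents, so their futures stay in the dictionary until the caller removes them; a real implementation would need to decide when those can be released.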

Example Use

No response

Additional context

No response

desertaxle commented 1 month ago

Thanks for the issue @RobJBarr! We have a new implementation of the DaskTaskRunner that will be released alongside our upcoming 3.0 release and no longer keeps references to Dask futures. I think that will resolve the issue you're seeing. If you want to give it a try, you can install prefect==3.0.0rc1 and prefect-dask==0.3.0rc1.