DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Make WDL notice when a file can no longer be used and delete it #4872

Open unito-bot opened 5 months ago

unito-bot commented 5 months ago

It would be good if files in WDL workflows were deleted from the job store when they can no longer be accessed by any remaining jobs.

We could add nodes to the workflow graph that delete File values. Each deletion node would depend on the completion of every WDL node that actually uses the File value, so that by the time it runs, nothing else can still need the file.
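A minimal sketch of that graph change, in plain Python rather than against Toil's or MiniWDL's real data structures. The names `add_deletion_nodes`, `dependencies`, `file_uses`, and `keep`, and the example node and file names, are all hypothetical. It assumes we already know which nodes read each File, and that workflow outputs are listed in `keep` so they are never scheduled for deletion.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def add_deletion_nodes(dependencies, file_uses, keep=frozenset()):
    """Return a copy of the graph with one deletion node per File.

    dependencies: dict mapping node name -> set of nodes it must run after.
    file_uses:    dict mapping File name -> set of nodes that read that File.
    keep:         File names that must survive (e.g. workflow outputs).

    All three structures are illustrative stand-ins, not Toil or MiniWDL types.
    """
    augmented = {node: set(preds) for node, preds in dependencies.items()}
    for file_name, users in file_uses.items():
        if file_name in keep or not users:
            continue
        # The deletion node runs only after every reader has completed.
        augmented[f"delete:{file_name}"] = set(users)
    return augmented


# Hypothetical workflow: align -> sort -> (index, stats)
dependencies = {
    "align": set(),
    "sort": {"align"},
    "index": {"sort"},
    "stats": {"sort"},
}
file_uses = {
    "align.bam": {"sort"},
    "sort.bam": {"index", "stats"},
}

graph = add_deletion_nodes(dependencies, file_uses)
order = list(TopologicalSorter(graph).static_order())
print(order)
```

In any valid ordering, `delete:sort.bam` comes after both `index` and `stats`, and `delete:align.bam` can run as soon as `sort` finishes, which is the point where the intermediate file stops being reachable.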

I think MiniWDL’s analysis tracks what we would need for this. We might also want to think about passing sub-environments around the top-level workflow graph that contain only the values each node actually references, so we don’t feed all the File variables to a bunch of nodes that never use them.
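The sub-environment idea could look roughly like this. A plain dict stands in for the WDL environment here, and `restrict_environment`, `environment`, and `referenced_by_node` are hypothetical names; how the set of referenced names is obtained from MiniWDL is assumed, not shown.

```python
def restrict_environment(environment, referenced_names):
    """Return only the bindings that a node actually references.

    environment:      dict mapping variable name -> value (stand-in for a
                      WDL environment; not a MiniWDL type).
    referenced_names: set of variable names the node's expressions mention.
    """
    return {
        name: value
        for name, value in environment.items()
        if name in referenced_names
    }


# Hypothetical top-level environment holding two File values and an Int.
environment = {
    "reads": "file-id-1",
    "reference": "file-id-2",
    "threads": 8,
}

# Hypothetical result of a per-node reference analysis.
referenced_by_node = {
    "align": {"reads", "reference", "threads"},
    "report": {"threads"},
}

sub_environments = {
    node: restrict_environment(environment, names)
    for node, names in referenced_by_node.items()
}
print(sub_environments["report"])
```

With environments trimmed like this, a node such as `report` never holds a reference to `reads` or `reference`, so it would not count as a user of those files when deciding when they can be deleted.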

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-1540

adamnovak commented 3 months ago

People are complaining that they filled up our cluster storage with job stores full of big intermediate files, so we should probably actually do this.

unito-bot commented 2 weeks ago

➤ Adam Novak commented:

Julian said this could be useful for the assembly workflows.