DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/
Apache License 2.0

WDL Call Caching #4797

Closed stxue1 closed 2 weeks ago

stxue1 commented 9 months ago

toil-wdl-runner doesn't currently support call caching the way MiniWDL does. The restart system doesn't seem to be applicable, since it stores data to feed into future jobs rather than storing the results of finished jobs.

Maybe there is a way to piggyback off of the MiniWDL cache implementation, but Toil will probably need some way to store results of jobs.

Adding call caching to Toil itself will probably be difficult, since it passes around more types of data than toil-wdl-runner does. Adding it only to toil-wdl-runner may be the right move.
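
To make the idea of storing the results of finished jobs concrete, here is a minimal sketch of a call cache keyed on a digest of the task text and its inputs. Everything in it is an assumption for illustration: `call_digest`, `CallCache`, and the one-JSON-file-per-entry layout are hypothetical names and choices, not Toil or MiniWDL APIs, and the key derivation does not reproduce MiniWDL's actual scheme.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def call_digest(task_source: str, inputs: dict) -> str:
    """Hash the task text plus its inputs into a stable cache key (hypothetical scheme)."""
    # sort_keys makes the key independent of the order the inputs were given in.
    payload = json.dumps({"task": task_source, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CallCache:
    """Hypothetical on-disk store for the outputs of finished calls."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, key: str):
        """Return the cached outputs for key, or None on a cache miss."""
        entry = self.cache_dir / f"{key}.json"
        if entry.exists():
            return json.loads(entry.read_text())
        return None

    def put(self, key: str, outputs: dict) -> None:
        """Record the outputs of a finished call under key."""
        (self.cache_dir / f"{key}.json").write_text(json.dumps(outputs))


# Usage: look up before running a call, store after it finishes.
cache = CallCache(Path(tempfile.mkdtemp()))
key = call_digest("task hello { ... }", {"name": "world"})
if cache.get(key) is None:
    outputs = {"greeting": "hello world"}  # stand-in for actually running the task
    cache.put(key, outputs)
```

Keeping this in toil-wdl-runner rather than in Toil means the cached values only ever need to be JSON-serializable WDL values, which sidesteps the wider range of data types that Toil jobs pass around.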

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-1500

unito-bot commented 3 months ago

➤ Adam Novak commented:

I actually want this for troubleshooting DeepVariant calling with vg.

unito-bot commented 2 months ago

➤ Adam Novak commented:

I think if we keep track of a shared-filesystem path for files and implement hashing (and file hashing/lookup) the way MiniWDL does, we can share a cache with MiniWDL. We just need to find a place to copy stuff out of the job store to save it with the cache entries.
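
A rough sketch of the "copy stuff out of the job store" step, assuming the file has already been exported from the job store to a local path. The `file_digest` and `export_to_cache` helpers and the `files/<digest>/<name>` layout are hypothetical; they are not Toil or MiniWDL functions, and a cache actually shared with MiniWDL would have to follow MiniWDL's own key and directory conventions instead.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def export_to_cache(exported_file: Path, cache_root: Path) -> Path:
    """Copy a file into cache_root/files/<digest>/<name> and return the shared path.

    The directory layout is an assumption made for this sketch.
    """
    digest = file_digest(exported_file)
    dest_dir = cache_root / "files" / digest
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / exported_file.name
    if not dest.exists():
        # Identical content always maps to the same directory, so copy only once.
        shutil.copy2(exported_file, dest)
    return dest


# Usage: a stand-in for a task output that was exported from the job store.
workdir = Path(tempfile.mkdtemp())
src = workdir / "calls.vcf"
src.write_bytes(b"##fileformat=VCFv4.2\n")
cached = export_to_cache(src, workdir / "cache")
```

The returned path is the shared-filesystem path that a cache entry would record, so a later run can look the file up there instead of in the job store.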