ReproNim / reproman

ReproMan (AKA NICEMAN, AKA ReproNim TRD3)
https://reproman.readthedocs.io

How to "provision" temporary/scratch work directories? #498

Open yarikoptic opened 4 years ago

yarikoptic commented 4 years ago

This is "inspired" by the problem originally reported in https://github.com/ReproNim/reproman/pull/438#issuecomment-522047558, with a proposed fix (closed without merge) in https://github.com/ReproNim/reproman/pull/451 that would simply let tar ignore disappearing files.

Although we still do not know the underlying trigger (a lingering cleanup process or the like), this behavior was a reminder that in many cases we would like to provide a path to some location (on the remote resource) which pipelines could use as scratch space. In https://github.com/ReproNim/reproman/pull/438/files#diff-5b4aa18b79cf44a38ba925fff658fd8cR129 I just added that work/ directory to .gitignore. That should, in theory, be sufficient (I will try it next) if I use the datalad-pair orchestrator, which should datalad save remotely and use datalad update to fetch the results. In the case of datalad-pair-run, the content is first tar'ed on the remote side (hence the original "inspirational" issue of files disappearing in a work/ directory), including the not-so-needed work directory, which might be huge, so we should allow that to be avoided.
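To make the "avoid tar'ing the work dir" idea concrete, here is a minimal sketch using Python's standard `tarfile` module. It is not how reproman builds its archive; the function name `tar_without_scratch` and the `scratch_name` parameter are hypothetical, and it assumes the scratch directory sits at the top level of the dataset and that the archive is written somewhere outside the dataset.

```python
import tarfile
from pathlib import Path


def tar_without_scratch(dataset_dir, archive_path, scratch_name="work"):
    """Archive dataset_dir, skipping its top-level scratch directory.

    Hypothetical helper for illustration only; archive_path is assumed
    to live outside dataset_dir.
    """
    dataset_dir = Path(dataset_dir)

    def skip_scratch(tarinfo):
        # Member names are relative to the dataset root, e.g. "work/tmp.dat".
        parts = Path(tarinfo.name).parts
        if parts and parts[0] == scratch_name:
            # Returning None drops the member; for a directory, tarfile
            # then does not descend into it at all.
            return None
        return tarinfo

    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(dataset_dir.iterdir()):
            tar.add(entry, arcname=entry.name, filter=skip_scratch)
```

Since the scratch directory is never walked, files disappearing inside it while the archive is being written should not matter, and a huge work/ does not get transferred. A directory named work/ deeper in the tree (e.g. sub/work/) is still archived.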

The easiest way is to specify some work directory outside of the dataset which gets datalad saved/transferred. But ideally

so I guess we should

Also relates to https://github.com/ReproNim/reproman/issues/467 ("cleanup") on what to do with such directories upon success/failure.
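As a rough illustration of the "work directory outside of the dataset" option together with the cleanup question, here is a sketch of a context manager that provisions a scratch directory, refuses to place it inside the dataset, removes it on success, and by default keeps it on failure for inspection. This is plain local Python, not existing reproman API: `scratch_dir`, `base`, and `keep_on_failure` are hypothetical names, and a real implementation would have to do the equivalent on the remote resource.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scratch_dir(dataset_dir, base=None, keep_on_failure=True):
    """Yield a temporary work directory located outside dataset_dir.

    Hypothetical helper: `base` is the parent for the scratch directory
    (system default temp location if None).
    """
    dataset = Path(dataset_dir).resolve()
    path = Path(tempfile.mkdtemp(prefix="scratch-", dir=base)).resolve()
    if path == dataset or dataset in path.parents:
        # Scratch space inside the dataset would be saved/transferred with it.
        shutil.rmtree(path)
        raise ValueError(f"scratch directory {path} is inside dataset {dataset}")
    try:
        yield path
    except BaseException:
        # On failure, optionally keep the directory around for debugging.
        if not keep_on_failure:
            shutil.rmtree(path, ignore_errors=True)
        raise
    else:
        # On success, the scratch content is no longer needed.
        shutil.rmtree(path, ignore_errors=True)
```

Usage would look like `with scratch_dir("/path/to/dataset") as work: ...`, with the pipeline told to use `work` as its scratch location. Whether failures should keep the directory is exactly the kind of policy the "cleanup" issue would need to settle.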