DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

workDir and jobStore should default to (shared) tmp-outdir-prefix #5143

Open gmloose opened 4 weeks ago

gmloose commented 4 weeks ago

The documentation for the options --workDir and --jobStore states that, when a workflow is run on a distributed batch system, these locations must be accessible by all worker nodes. Currently, unless explicitly set by the user, both workDir and jobStore are set to whatever value is given to tmpdir-prefix (or, if that is not given, to a system-default tmpdir location).

It would make sense to set workDir and jobStore to whatever value is given to tmp-outdir-prefix, which should be set to a shared location when using a distributed batch system.

I therefore propose the following changes:

Note that tmpdir-prefix need not exist on the node where Toil is running, as long as it exists on all the worker nodes. Hence, Toil should not check for its existence on the "head" node.
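A minimal sketch of the fallback order proposed here, for illustration only. The function `resolve_default_location` and its parameter names are hypothetical and are not part of Toil's code base; the sketch assumes each option arrives as a plain string or `None`, and it deliberately performs no existence check on the chosen path.

```python
import tempfile
from typing import Optional


def resolve_default_location(
    explicit: Optional[str],
    tmp_outdir_prefix: Optional[str],
    tmpdir_prefix: Optional[str],
) -> str:
    """Pick the base location for workDir or jobStore (hypothetical helper).

    Order of preference:
      1. the value the user set explicitly (--workDir / --jobStore),
      2. tmp-outdir-prefix, expected to be shared on a distributed batch system,
      3. tmpdir-prefix,
      4. the system-default temporary directory.
    """
    for candidate in (explicit, tmp_outdir_prefix, tmpdir_prefix):
        if candidate:
            # No os.path.exists() check: the path may exist only on the
            # worker nodes and not on the node where this code runs.
            return candidate
    return tempfile.gettempdir()


if __name__ == "__main__":
    # Nothing set explicitly: the shared tmp-outdir-prefix is chosen.
    print(resolve_default_location(None, "/shared/out", "/local/tmp"))
    # An explicit value always takes precedence.
    print(resolve_default_location("/explicit/work", "/shared/out", "/local/tmp"))
```

The same helper could be called once for workDir and once for jobStore, so that both end up on the shared location unless the user overrides them.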

I can create a pull request. However, I'm unsure how this should be tested, because I couldn't find any Toil tests that check the correct (documented) behaviour of these command-line options. Are there any?

Issue is synchronized with this Jira Story. Issue Number: TOIL-1664

gmloose commented 1 week ago

Provided by https://github.com/DataBiosphere/toil/pull/5154