DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Access the config from within a job #3619

Open multimeric opened 3 years ago

multimeric commented 3 years ago

I'm trying to fix toil-container, with reference to #1768. One key aspect of this is being able to set --singularity or --docker on the command line, store that in the Toil options, and then check the value later when we go to run a container. How can a running job access the Toil options?

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-903

mr-c commented 3 years ago

This is for non-toil-cwl-runner scripts, so please make sure that the new --docker option does not flow to toil-cwl-runner and the new --singularity option does not override toil-cwl-runner's --singularity option. Thanks!

multimeric commented 3 years ago

Yes more likely it would be --container-engine docker or similar

mr-c commented 3 years ago

> Yes more likely it would be --container-engine docker or similar

The name(s) can overlap, but we need to make sure it doesn't show up as a toil-cwl-runner option, as that has its own methods.

adamnovak commented 2 years ago

This would also be relevant to #4142 if we want to make the way Toil passes around its config information a little more extensible.

The Toil architecture is to take the options object from argparse and copy a bunch of information into a Config instance, so the original argparse Namespace isn't actually available inside the jobs. Usually when I've written Python pipelines I've ended up just passing it along to all my jobs as an argument.
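
As a rough illustration of the pass-it-along approach, here is a minimal stand-alone sketch that does not import Toil. The `--container-engine` flag is the name floated earlier in this thread, and `parse_user_options` and `run_tool` are hypothetical names; `pickle` stands in for whatever serialization carries job arguments to the worker. In a real workflow the function would presumably be wrapped with something like `Job.wrapJobFn(run_tool, user_options)`.

```python
import argparse
import pickle


def parse_user_options(argv):
    """Parse the pipeline's own flags (hypothetical names, not Toil options)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--container-engine",
                        choices=["docker", "singularity"],
                        default="docker")
    # parse_known_args ignores flags this parser does not define,
    # for example the workflow engine's own options.
    namespace, _unknown = parser.parse_known_args(argv)
    # A plain dict is easy to serialize and to inspect inside a job.
    return dict(vars(namespace))


def run_tool(job, user_options):
    """Hypothetical job function: the job object comes first, then user args."""
    # The job reads the setting from its argument, not from any global state.
    return user_options["container_engine"]


user_options = parse_user_options(["--container-engine", "singularity"])

# Simulate shipping the argument to a worker process.
shipped = pickle.loads(pickle.dumps(user_options))

# None stands in for the job object, which this sketch does not use.
command = run_tool(None, shipped)
print(command)
```

The cost of this approach is that every job that needs the setting, or that creates child jobs needing it, has to accept and forward the extra argument.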

Since the JobDescription refactor, we have jobs keeping references to the Toil Config in their JobDescriptions, which we use for filling in default resource requirements from the config when they are not set at the job level. When the job is deserialized, it is hooked up to the config by calling assignConfig() on it.

So if you want a custom job class to get ahold of the config, you could override assignConfig() and stash it somewhere where it won't get pickled again, or you can look at self.description._config.

A real solution to this would probably involve:

  1. A way to get the config from a getter method, without digging into the internals of the JobDescription which might change.
  2. A good way to actually send user data along with the Toil config, maybe letting the user hook their options into a more-unified Toil config-file/option/environment-var/config-object system, or maybe just giving the user a free-form Namespace they can stick stuff in that the Toil Config will carry along.
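
To make the two points concrete, here is a purely hypothetical sketch of the shape this could take. None of these names (`SketchConfig`, `SketchJob`, `get_config`, the `user` namespace) exist in Toil; they only illustrate a public getter plus a free-form namespace that travels with the config.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class SketchConfig:
    """Hypothetical config object; not Toil's Config class."""
    # An example of a built-in setting, in bytes.
    default_memory: int = 2 * 1024 ** 3
    # Free-form namespace for user data, carried along with the config.
    user: SimpleNamespace = field(default_factory=SimpleNamespace)


class SketchJob:
    """Hypothetical job exposing the config through a getter."""

    def __init__(self):
        self._config = None

    def assign_config(self, config):
        self._config = config

    def get_config(self):
        # Public accessor, so callers never touch the private attribute.
        if self._config is None:
            raise RuntimeError("config has not been assigned to this job yet")
        return self._config


# On the leader: attach user data to the config.
config = SketchConfig()
config.user.container_engine = "singularity"

# On the worker: the job is hooked up to the config, then reads from it.
job = SketchJob()
job.assign_config(config)
engine = job.get_config().user.container_engine
```

Keeping user data in its own namespace would avoid name clashes between user fields and the engine's own settings.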

adamnovak commented 2 years ago

If you want to get at the object on the leader, it would be in the config field on the Toil context manager, when you are inside it.

adamnovak commented 1 year ago

We should come up with a good way to make the config system as updated in #4569 officially user-extensible, and document it with an example in the docs.

The workaround is to just cram more fields into it on the leader, and reach into the internals of Toil to get it from the current job on the worker.