DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Set TOIL_SLURM_ARGS per step for a CWL workflow #3231

Open alexiswl opened 3 years ago

alexiswl commented 3 years ago

Hello, I was wondering whether there is any way to configure Slurm submission parameters (ideally at runtime) on a per-tool basis.

For example, I have a three-step workflow where I would like to submit jobs to different queues/partitions.

I was thinking that something in the cwl:overrides section of the inputs.yml would be the best place for this.

from this CWL issue

cwl:overrides:
  workflow.cwl#step2:
    TOIL_SLURM_ARGS:
      partition: highIO
  workflow.cwl#step3:
    TOIL_SLURM_ARGS:
      partition: largeMem

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-664

alexiswl commented 3 years ago

Or should I use getWorkerContexts to specify the partitions for particular steps in the workflow?

mr-c commented 3 years ago

Hello @alexiswl. Can you tell us more about why you want to submit to different queues? Is it due to matching specific computing resources?

alexiswl commented 3 years ago

Hi Michael,

I'm interested in developing a workflow that can use alternative hardware, in this case FPGAs, for some of the steps in the workflow.

It would mean that certain steps would need to be submitted to a different Slurm queue to ensure that the nodes they are launched on have the required hardware.

Alexis

mr-c commented 3 years ago

Cool! Then the best way forward would be a custom CWL Requirement describing the hardware needs, plus logic in toil-cwl-runner to support a local configuration that maps that custom requirement to a local queue name. A similar feature was implemented for MPI support (though it is not yet supported in toil-cwl-runner): https://github.com/common-workflow-language/cwltool#running-mpi-based-tools-that-need-to-be-launched with code at https://github.com/common-workflow-language/cwltool/pull/1276
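To make that concrete: a site-local configuration file, analogous to cwltool's MPI config file, might map hardware kinds declared in such a custom requirement to local partition names. Everything below, the keys, the file format, and the partition names, is an invented sketch, not an existing toil-cwl-runner feature:

```yaml
# Hypothetical site configuration (does not exist in toil-cwl-runner today):
# maps the hardware kind named in a custom CWL Requirement to a local
# Slurm partition, keeping the workflow itself scheduler-agnostic.
hardware_queues:
  fpga: fpga-nodes    # steps requiring FPGAs go to this partition
  gpu: gpu-nodes
  default: compute
```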

An alternative approach would be https://github.com/common-workflow-language/common-workflow-language/issues/581

Regardless, I would appreciate a solution that is not specific to Slurm 🙂

adamnovak commented 1 year ago

This sounds like it would be really useful, but would also be somewhat hard to implement in Toil.

Right now the Toil batch systems can take batch-system-specific configuration through command line arguments and environment variables, but we don't have any machinery for Toil jobs to carry batch-system-specific information or free-form annotations.

So we'd need to develop a system where a job could carry some kind of batch-system-specific tags, then create a Slurm tag for per-job extra submission arguments, and then extend the CWL frontend to let you set that annotation for just one job.

getWorkerContexts is unrelated; it is about Python context managers that do setup inside the worker process, before running the user code.

For FPGA support specifically, Toil does now have a system for "accelerators". In addition to cores, memory, and disk, a Toil job can now request accelerators in a variety of formats. In the Toil Python API, this looks something like `accelerators=[{'count': 1, 'kind': 'gpu', 'api': 'cuda'}]`.
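As an illustrative sketch (not Toil's actual implementation), translating such an accelerator request into Slurm submission arguments might look like the function below. The function name, the partition mapping, and the flag choices are all assumptions for illustration:

```python
# Hypothetical sketch: how an accelerator request dict (in the shape Toil's
# Python API accepts) could be turned into Slurm submission arguments.
# The partition names are site-specific assumptions, not Toil defaults.

def slurm_args_for_accelerator(acc):
    """Map one accelerator request to a list of sbatch-style arguments."""
    partition_by_kind = {"gpu": "gpu", "fpga": "fpga"}  # assumed site mapping
    args = []
    kind = acc.get("kind", "gpu")
    count = acc.get("count", 1)
    if kind == "gpu":
        # Slurm has a standard generic-resource syntax for GPUs.
        args.append(f"--gres=gpu:{count}")
    # For other kinds (e.g. FPGAs) there is no standard GRES name, so this
    # sketch falls back to routing the job to a dedicated partition.
    args.append(f"--partition={partition_by_kind.get(kind, 'compute')}")
    return args

print(slurm_args_for_accelerator({"count": 1, "kind": "gpu", "api": "cuda"}))
# → ['--gres=gpu:1', '--partition=gpu']
```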

In CWL, we've used this to implement CUDARequirement, but CWL doesn't have a standard (or even de facto standard AFAIK) way to ask for accelerators other than GPUs.
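For reference, requesting the CUDA extension in a CWL document looks roughly like this (field names follow the cwltool extensions as I understand them; the values are illustrative):

```yaml
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
requirements:
  cwltool:CUDARequirement:
    cudaVersionMin: "11.4"
    cudaComputeCapability: "3.0"
    cudaDeviceCountMin: 1
```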

We haven't yet implemented Slurm support for GPUs (#4308), but Slurm seems to have a standard way of asking for GPUs. If we wanted to implement support for FPGA accelerators on Slurm using the Toil accelerator system, we'd need a way to explain to Toil how exactly the accelerators are to be requested (i.e. the different queue names to use in this case).

Did CWL PartitionRequirement ever get any implementations? The idea of a partition is common to a few HPC batch systems, so it would make more sense to implement in Toil than a system for random per-job scheduler hints, and probably be almost as versatile.

alexiswl commented 1 year ago

Thanks for following up on this @adamnovak. Our access to FPGAs is specifically through cloud services. In this case, I was looking into AWS ParallelCluster (which I'm not sure supports FPGA instance types for compute nodes anyway).