DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Add an argument to specify number of GPUs to use for a `toil-wdl-runner` task #4945

Open stxue1 opened 1 month ago

stxue1 commented 1 month ago

WDL 1.1 says that tasks can be specified to need GPUs:

task gpu_test {
  #.....
  runtime {
    gpu: true
  }
}

The field is a boolean value, and we're supposed to provide an argument to specify the number of GPUs needed:

> This attribute cannot request any specific quantity or types of GPUs to make available to the task. Any such information should be provided using an execution engine-specific attribute.

The closest option I think we have is `--defaultAccelerators`, but it does not take the requested batch system into account. For example, running with `--batchSystem=slurm --defaultAccelerators=1` fails with:

[2024-05-22T18:39:35-0700] [MainThread] [C] [toil.wdl.wdltoil] Could not run workflow because:

🚨🚨🚨
The job 'WDLRootJob' kind-WDLRootJob/instance-2a0q47k9 v1 is requesting [{'count': 1, 'kind': 'gpu'}] accelerators, more than the maximum of [] accelerators that SingleMachineBatchSystem was configured with. The accelerator {'count': 1, 'kind': 'gpu'} could not be provided. Scale is set to 1.
🚨🚨🚨

Issue is synchronized with this Jira Story. Issue Number: TOIL-1576

stxue1 commented 1 month ago

The spec doesn't necessarily say we need a command-line argument, just that such attributes are engine-defined. The `gpuCount` runtime field (which we seem to support?) could already serve this purpose.
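
To make the idea concrete, here is a rough sketch of how an engine might combine the boolean `gpu` attribute with an engine-specific `gpuCount` field and a configurable default. The function name `accelerators_from_runtime` and the `default_gpu_count` parameter are hypothetical and are not taken from Toil's code. The only thing borrowed from above is the `{'count': ..., 'kind': 'gpu'}` shape shown in the error message.

```python
from typing import Any, Dict, List, Optional


def accelerators_from_runtime(
    runtime: Dict[str, Any], default_gpu_count: int = 1
) -> List[Dict[str, Any]]:
    """Turn a WDL-style runtime mapping into a list of accelerator requests.

    Hypothetical helper for illustration only; not Toil's implementation.
    """
    wants_gpu = bool(runtime.get("gpu", False))
    gpu_count: Optional[int] = runtime.get("gpuCount")

    if gpu_count is None:
        # No explicit count: use the engine-level default, but only when
        # the task actually asks for a GPU with `gpu: true`.
        gpu_count = default_gpu_count if wants_gpu else 0

    # Assumption: an explicit gpuCount is treated as a GPU request on its own.
    if int(gpu_count) <= 0:
        return []
    return [{"count": int(gpu_count), "kind": "gpu"}]


# `gpu: true` plus an engine-specific count of 2.
print(accelerators_from_runtime({"gpu": True, "gpuCount": 2}))
# prints [{'count': 2, 'kind': 'gpu'}]
```

With this approach, a task that only sets `gpu: true` would get the default count, and a command-line option would only need to change that default instead of adding a new per-task mechanism.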

This feature is probably more of a nice-to-have.