PMCC-BioinformaticsCore / janis

[Alpha] Janis: an open source tool to machine generate type-safe CWL and WDL workflows
https://janis.readthedocs.io/
GNU General Public License v3.0

Resource requirements #11

Status: Closed (illusional closed 4 years ago)

illusional commented 4 years ago

There are two additional runtime characteristics that a Janis tool needs to capture: a time limit and disk requirements.

The time limit should be easy: we'll add an arbitrary runtime_minutes field to the WDL runtime section and wire it up from our janis-assistant Cromwell configurations. In CWLTool this is handled by v1.1's ToolTimeLimit.
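As a rough sketch of what the two translations might emit: the helper names below (`wdl_runtime_block`, `cwl_time_limit`, `seconds_to_minutes`) are made up for illustration and are not Janis functions. `runtime_minutes` is the arbitrary WDL key proposed above, and CWL v1.1's ToolTimeLimit takes its `timelimit` in seconds.

```python
import math


def wdl_runtime_block(runtime_minutes: int) -> str:
    """Render a WDL runtime section carrying the custom runtime_minutes key."""
    return "runtime {\n  runtime_minutes: %d\n}" % runtime_minutes


def cwl_time_limit(seconds: int) -> dict:
    """Build a CWL v1.1 ToolTimeLimit requirement (timelimit is in seconds)."""
    return {"class": "ToolTimeLimit", "timelimit": seconds}


def seconds_to_minutes(seconds: int) -> int:
    """Round up so a tool is never given less time than it requested."""
    return math.ceil(seconds / 60)


if __name__ == "__main__":
    requested_seconds = 5400  # hypothetical 90-minute tool
    print(wdl_runtime_block(seconds_to_minutes(requested_seconds)))
    print(cwl_time_limit(requested_seconds))
```

Keeping a single unit internally (seconds here, purely as an assumption) and converting at the edge avoids the two outputs drifting apart.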

As part of this change, we'll look at deprecating cwlgen in favour of the schema-salad-based cwl-utils.

Disk requirements are likely to require #8 (expressions), as they're usually derived from some multiplier of an input file size.
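To illustrate why this needs expressions, here is a minimal sketch of the calculation a disk request usually boils down to. The function name `disk_gb`, the default multiplier of 3, and the 1 GB floor are all assumptions for the example. The input size is only known once the job is about to run, so the workflow language has to evaluate something equivalent at runtime.

```python
import math

BYTES_PER_GB = 1024 ** 3


def disk_gb(input_size_bytes: int, multiplier: float = 3.0, minimum_gb: int = 1) -> int:
    """Return a disk request in whole GB: input size times a multiplier, rounded up."""
    needed = math.ceil(input_size_bytes * multiplier / BYTES_PER_GB)
    # Never request less than the floor, even for tiny or empty inputs.
    return max(minimum_gb, needed)


if __name__ == "__main__":
    # A hypothetical 10 GB input with a 2.5x working-space multiplier.
    print(disk_gb(10 * BYTES_PER_GB, multiplier=2.5))
```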

Backwards compatibility

This will add two methods to override on a CommandTool (or two extra inputs on a CommandToolBuilder), so we're anticipating that previous workflows will be compatible with no code changes.
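A small sketch of why this should be backwards compatible: if the two new methods default to "nothing requested", a tool written before the change behaves exactly as it did. `SketchCommandTool` and its subclasses are hypothetical stand-ins, not the real Janis classes, and the units (seconds, GB) are assumptions.

```python
from typing import Any, Dict, Optional


class SketchCommandTool:
    """Hypothetical stand-in for a tool base class; not the real Janis CommandTool."""

    def time(self, hints: Dict[str, Any]) -> Optional[int]:
        # Default: no time limit requested, so existing subclasses keep working.
        return None

    def disk(self, hints: Dict[str, Any]) -> Optional[float]:
        # Default: no disk request.
        return None


class OldTool(SketchCommandTool):
    """Written before the change: overrides neither method."""


class NewTool(SketchCommandTool):
    """Opts in to both new requirements."""

    def time(self, hints: Dict[str, Any]) -> Optional[int]:
        return 2 * 60 * 60  # two hours, in seconds (assumed unit)

    def disk(self, hints: Dict[str, Any]) -> Optional[float]:
        return 50  # GB (assumed unit)


if __name__ == "__main__":
    for tool in (OldTool(), NewTool()):
        print(type(tool).__name__, tool.time({}), tool.disk({}))
```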

drtconway commented 4 years ago

It would be awesome to have a job limit for batch scheduling.

The use case is this:

I have a limited number of jobs runnable on the HPC, and I have multiple tasks: some require long-running batch jobs (e.g. doing lots of whole genome alignment and variant calling), but I also have other, shorter workflows to run. I'd like to limit the number of jobs scheduled by the long-running batch job so that I get reasonable turn-around time for the short-running workflows.

Also, many batch systems are configured in some kind of first-come-first-served manner, so limiting the number of jobs allows users to "play nicely" on such systems.

For bonus marks, if the jobs are requesting resources (e.g. RAM, cores, etc) that are subject to quotas on the HPC backend, it would be good to allow them to be limited for the same reason.

illusional commented 4 years ago

This has been implemented and is available in the primary Janis branch, both for requesting and for overriding.

These request variable times from the batch system for Slurm and PBS. The static values can be clamped using the environment values in the JanisConfiguration. Unfortunately it's currently not possible to automatically clamp these values if they're derived at runtime. This might be something we can add in the future if there's interest.
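The static-versus-runtime distinction can be sketched roughly as follows. `clamp_static` is a hypothetical helper, not part of JanisConfiguration, and a string stands in for a runtime-derived expression here: a plain number can be capped before submission, while an expression's value is unknown at that point and is passed through untouched.

```python
from typing import Optional, Union

# A request is a static number, a runtime expression (a string here), or absent.
Requested = Union[int, float, str, None]


def clamp_static(requested: Requested, maximum: Optional[float]) -> Requested:
    """Cap a static request at `maximum`; leave anything else unchanged."""
    if maximum is None or requested is None:
        return requested
    if isinstance(requested, str):
        # Runtime-derived: the value is not known yet, so it cannot be clamped.
        return requested
    return min(requested, maximum)


if __name__ == "__main__":
    print(clamp_static(100, 64))               # static value above the cap
    print(clamp_static("size(bam) * 3", 64))   # expression passes through
```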