DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

CWL Conditional steps that aren't running still get scheduled to a job scheduler. #3990

Open kannon92 opened 2 years ago

kannon92 commented 2 years ago

When I use conditionals, I find that the job still gets scheduled to a batch system.

conditional-workflow.cwl

class: Workflow
cwlVersion: v1.2
inputs:
  message: string
  sleepParam: int 
outputs:
  out1:
    type: File
    outputSource:
      - echo/echoOut
    pickValue: first_non_null

requirements:
  InlineJavascriptRequirement: {}

steps:
  echo:
    in: 
      message: message
      sleepParam: sleepParam
    run: echo.cwl
    out: [echoOut]
    when: $(inputs.sleepParam > 1)

echo.cwl

cwlVersion: v1.2
class: CommandLineTool
baseCommand: [echo]
id: "echo"
inputs: 
  message: 
    type: string
    inputBinding:
      position: 1
outputs: 
  echoOut:
    type: stdout

param.yaml

sleepParam: 1
message: hello

When I run this workflow, the log shows that the job still gets issued. On HPC systems, it looks like the job actually gets scheduled on the HPC cluster.

[2022-01-07T09:39:12-0500] [MainThread] [I] [toil.leader] Issued job 'CWLJob' echo-sleep-conditional.cwl.echo.echo kind-CWLJob/instance-e9x_ed10 v1 with job batch system ID: 1 and cores: 1, disk: 3.0 Gi, and memory: 2.0 Gi

┆Issue is synchronized with this Jira Task ┆friendlyId: TOIL-1121

kannon92 commented 2 years ago

Taking a closer look at this issue.

It looks like the code in the step does not run if the conditional is false. cwltoil evaluates the conditional before the actual step is run, so jobs that are not allowed to run terminate much more quickly.

A quick test I have is a step that runs a sleep command for a long time if the conditional is true. If the conditional is false, the step finishes very quickly.

It looks like the main thing is that the leader still schedules the job, which is why a dummy job ends up being submitted to the batch system.

This seems to be a minor issue, but I'm not sure how to tell the leader not to submit a batch system job if the conditional is false. It seems that the Job classes in cwltoil all evaluate the conditional inside their run function, so I guess the job has already been added to the queue by the time the conditional is checked.
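To illustrate the behavior described above, here is a minimal toy model in plain Python. It does not use Toil's classes; `ToyJob` and `ToyBatchSystem` are hypothetical stand-ins, and the only point is that a condition checked inside `run` cannot prevent the job from being issued.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyJob:
    """Hypothetical stand-in for a workflow step job; not a Toil class."""
    name: str
    condition: Callable[[], bool]  # plays the role of the CWL `when` expression
    work: Callable[[], str]        # plays the role of the wrapped tool

    def run(self) -> str:
        # The condition is only checked here, after the job has been issued.
        if not self.condition():
            return "skipped"
        return self.work()


@dataclass
class ToyBatchSystem:
    """Hypothetical scheduler that records every job it is asked to run."""
    issued: List[str] = field(default_factory=list)

    def issue(self, job: ToyJob) -> str:
        # A scheduler slot is consumed whether or not the condition holds.
        self.issued.append(job.name)
        return job.run()


sleep_param = 1  # same value as in param.yaml
batch = ToyBatchSystem()
result = batch.issue(
    ToyJob("echo", condition=lambda: sleep_param > 1, work=lambda: "hello")
)
print(result, batch.issued)  # prints: skipped ['echo']
```

The step body never runs, but the scheduler still sees one issued job, which matches the log line above.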

adamnovak commented 4 months ago

We could add additional local jobs to check the conditions, and only if they are true schedule a job out to the cluster.

unito-bot commented 4 months ago

➤ Adam Novak commented:

One thing we would want to know here is whether we’re scheduling a job with its full resource requirements just to check the condition and note that it isn’t satisfied. If the job needs to reserve a lot of resources, it could sit in the queue for a long time, and it will waste those resources while running if the condition fails.

unito-bot commented 2 weeks ago

➤ Adam Novak commented:

We have good local job flagging now. We could make sure the condition evaluation is its own job and flag it as local, like we do for WDL.
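A rough sketch of that idea, again as a toy model in plain Python and not Toil's actual API: `ToyJob`, `ToyLeader`, the `local` field, and `run_conditional_step` are all hypothetical names chosen for illustration. The condition becomes its own small job flagged as local, and the real step is only sent to the cluster when the condition is true.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ToyJob:
    """Hypothetical job; `local` means "run on the leader, not the cluster"."""
    name: str
    work: Callable[[], Any]
    local: bool = False


@dataclass
class ToyLeader:
    """Hypothetical leader that tracks where each job was run."""
    cluster_issued: List[str] = field(default_factory=list)
    local_run: List[str] = field(default_factory=list)

    def issue(self, job: ToyJob) -> Any:
        # Local jobs never reach the batch system queue.
        if job.local:
            self.local_run.append(job.name)
        else:
            self.cluster_issued.append(job.name)
        return job.work()


def run_conditional_step(
    leader: ToyLeader,
    name: str,
    condition: Callable[[], bool],
    work: Callable[[], Any],
) -> Any:
    # Evaluate the `when` condition as a separate, cheap local job.
    check = ToyJob(f"{name}-when", work=condition, local=True)
    if not leader.issue(check):
        return None  # skipped steps produce null outputs in CWL v1.2
    # Only now is the real job, with its full resources, sent to the cluster.
    return leader.issue(ToyJob(name, work=work))


leader = ToyLeader()
skipped = run_conditional_step(leader, "echo", lambda: 1 > 1, lambda: "hello")
ran = run_conditional_step(leader, "echo", lambda: 2 > 1, lambda: "hello")
print(skipped, ran, leader.cluster_issued, leader.local_run)
# prints: None hello ['echo'] ['echo-when', 'echo-when']
```

In this model the skipped step never touches the cluster queue, so it neither waits for a large resource reservation nor holds one while the condition is being checked.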