DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Missing inputs of non-executed conditional CWL workflow steps cause Toil to fail #4930

Open lonbar opened 4 months ago

lonbar commented 4 months ago

Consider the following workflow:

cwlVersion: v1.2
class: Workflow
inputs:
    input: int?
outputs: []
steps:
    step:
        in:
          input:
            source: input
            valueFrom: $(self)
        out: []
        label: sometimes_run
        when: $(inputs.input != null)
        run:
          cwlVersion: v1.2
          class: CommandLineTool
          baseCommand: date
          inputs:
            input: int
          outputs: []
          requirements:
            ResourceRequirement:
                coresMax: $(inputs.input)
requirements:
    StepInputExpressionRequirement: {}
    InlineJavascriptRequirement: {}

This workflow passes the validation of cwltool, so to my understanding it is a valid workflow.

Expected behaviour

This workflow should execute step if input is supplied, and should skip step if input is not supplied.

Actual behaviour

When this workflow is executed with cwltool it behaves as expected. When executed with toil-cwl-runner it fails if input is not supplied:

[2024-05-15T11:32:24+0100] [Thread-1 (daddy)] [E] [toil.batchSystems.singleMachine] Got exit code 1 (indicating failure) from job _toil_worker CWLJobWrapper /tmp/tmpay1diyp0 kind-CWLJobWrapper/instance-39ncl25m.
[2024-05-15T11:32:24+0100] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'CWLJobWrapper' test.cwl.step._:77010ff2-fb21-44b1-bf26-f34939b9c3da._wrapper kind-CWLJobWrapper/instance-39ncl25m v1
Exit reason: None
[2024-05-15T11:32:24+0100] [MainThread] [W] [toil.leader] The job seems to have left a log file, indicating failure: 'CWLJobWrapper' test.cwl.step._:77010ff2-fb21-44b1-bf26-f34939b9c3da._wrapper kind-CWLJobWrapper/instance-39ncl25m v2
[2024-05-15T11:32:24+0100] [MainThread] [W] [toil.leader] Log from job "kind-CWLJobWrapper/instance-39ncl25m" follows:
=========>
    [2024-05-15T11:32:23+0100] [MainThread] [I] [toil.worker] ---TOIL WORKER OUTPUT LOG---
    [2024-05-15T11:32:23+0100] [MainThread] [I] [toil] Running Toil version 6.0.0-0e2a07a20818e593bfdfde3cc51ca4ad809fde96 on host PHY-TJLV53.
    [2024-05-15T11:32:23+0100] [MainThread] [I] [toil.worker] Working on job 'CWLJobWrapper' test.cwl.step._:77010ff2-fb21-44b1-bf26-f34939b9c3da._wrapper kind-CWLJobWrapper/instance-39ncl25m v1
    [2024-05-15T11:32:24+0100] [MainThread] [I] [toil.worker] Loaded body Job('CWLJobWrapper' test.cwl.step._:77010ff2-fb21-44b1-bf26-f34939b9c3da._wrapper kind-CWLJobWrapper/instance-39ncl25m v1) from description 'CWLJobWrapper' test.cwl.step._:77010ff2-fb21-44b1-bf26-f34939b9c3da._wrapper kind-CWLJobWrapper/instance-39ncl25m v1
    Traceback (most recent call last):
      File "/home/me/.local/lib/python3.10/site-packages/cwltool/process.py", line 434, in fill_in_defaults
        raise WorkflowException(
    cwltool.errors.WorkflowException: Missing required input parameter 'input'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/me/.local/lib/python3.10/site-packages/toil/worker.py", line 407, in workerScript
        job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
      File "/home/me/.local/lib/python3.10/site-packages/toil/job.py", line 2829, in _runner
        returnValues = self._run(jobGraph=None, fileStore=fileStore)
      File "/home/me/.local/lib/python3.10/site-packages/toil/job.py", line 2746, in _run
        return self.run(fileStore)
      File "/home/me/.local/lib/python3.10/site-packages/toil/cwl/cwltoil.py", line 2333, in run
        fill_in_defaults(
      File "/home/me/.local/lib/python3.10/site-packages/cwltool/process.py", line 425, in fill_in_defaults
        with SourceLine(inputs, e, WorkflowException, debug):
      File "/home/me/.local/lib/python3.10/site-packages/schema_salad/sourceline.py", line 249, in __exit__
        raise self.makeError(str(exc_value)) from exc_value
    cwltool.errors.WorkflowException: test.cwl:34:13: Missing required input parameter 'input'
    [2024-05-15T11:32:24+0100] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host PHY-TJLV53
<=========

Further thoughts

The main issue appears to be that Toil evaluates the inputs of a conditional step even when the step's when condition is false, provided one of those inputs is used in ResourceRequirement. Other requirements do not seem to have this problem. It is not clear to me why Toil does this for a step that will not run. The traceback shows the error being raised by fill_in_defaults, called from the run method at line 2333 of cwltoil.py. I also noticed that makeJob in lines 2787-2809 of cwltoil.py treats this requirement differently, but so far I have not been able to pinpoint the cause any further.

Currently this problem prevents people from running some of our workflows with toil. Any additional insights are very much appreciated.

Versions

$ cwltool --version && toil-cwl-runner --version
/home/me/.local/bin/cwltool 3.1.20240112164112
6.0.0

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-1567

adamnovak commented 2 months ago

This is probably related to how we issue the job to check its own condition. If we move away from that and towards issuing separate condition-checking jobs this problem might be easy to fix.
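A rough sketch of what separate condition-checking jobs could look like, in plain Python rather than the real Toil Job API. StepJob, condition_job, and issue_step are hypothetical names, and the assumption is that the step job, along with its expression-based resource requirement, is only constructed once the condition job has returned true.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepJob:
    """Hypothetical job for the real step; `cores` comes from coresMax."""
    cores: int


def condition_job(step_inputs: dict[str, Any]) -> bool:
    """Hypothetical lightweight job that only evaluates the `when` expression.

    It needs no ResourceRequirement from the step, so nothing that depends
    on the step's inputs has to be evaluated to schedule it.
    """
    return step_inputs.get("input") is not None


def issue_step(step_inputs: dict[str, Any]) -> Optional[StepJob]:
    """Issue the real step job only if the condition job says it should run."""
    if not condition_job(step_inputs):
        return None  # step skipped; coresMax is never evaluated
    # Only now is `coresMax: $(inputs.input)` resolved.
    return StepJob(cores=step_inputs["input"])
```

Under these assumptions issue_step({"input": None}) returns None without touching the resource expression, and issue_step({"input": 4}) produces a job requesting 4 cores.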