common-workflow-language / user_guide

The CWL v1.0 - v1.2 user guide
http://www.commonwl.org/user_guide/

Handling of exclusive/dependent inputs #469

Open bartns opened 2 months ago

bartns commented 2 months ago

More guidance on how to use dependent and mutually exclusive inputs in a multi-stepped workflow would be nice. I could only find something in the FAQ: https://www.commonwl.org/user_guide/faq.html#contents

Some examples for dependent workflows. Credit goes to @alexiswl

Command line tool (record.cwl):

cwlVersion: v1.2
class: CommandLineTool
inputs:
  dependent_parameters:
    type:
      type: record
      name: dependent_parameters
      fields:
        itemA:
          type: string
          inputBinding:
            prefix: -A
        itemB:
          type: string
          inputBinding:
            prefix: -B
outputs:
  example_out:
    type: stdout
stdout: output.txt
baseCommand: echo
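
To try the tool on its own, a job file along these lines could be used. This is only a minimal sketch: the file name record-job.yml and the values are illustrative assumptions, not part of the original example.

```yaml
# record-job.yml (hypothetical file name)
# Both fields of the record must be supplied together,
# which is what makes itemA and itemB "dependent".
dependent_parameters:
  itemA: a_string  # passed to the tool as: -A a_string
  itemB: b_string  # passed to the tool as: -B b_string
```

It would be run with `cwltool record.cwl record-job.yml`. Leaving out either itemA or itemB should fail validation, since neither field is declared optional.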

If the workflow input uses the same record schema as the tool, you can simply pass it through as you would any other object, such as a File or Directory.

record-wf.cwl:

cwlVersion: v1.2
class: Workflow

inputs:
  dependent_parameters:
    type:
      type: record
      name: dependent_parameters
      fields:
        itemA:
          type: string
        itemB:
          type: string

steps:
  record:
    run: record.cwl
    in:
      dependent_parameters: dependent_parameters
    out: [example_out]

outputs:
  example_out:
    type: File
    outputSource: record/example_out

Things get a little tricky when you need to convert a record type between the workflow and the tool, e.g. when your workflow input has itemC instead of itemB, but your tool expects the existing itemA / itemB record type.

One way to convert is to use the valueFrom attribute in the step input (note the additional requirements needed in the workflow).

record-wf.cwl:

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

# Requirements needed for the workflow:
# valueFrom requires StepInputExpressionRequirement,
# and any JavaScript code requires InlineJavascriptRequirement
requirements:
  StepInputExpressionRequirement: {}
  InlineJavascriptRequirement: {}

inputs:
  dependent_parameters:
    type:
      type: record
      name: dependent_parameters
      fields:
        itemA:
          type: string
        itemC:  # New record type
          type: string

steps:
  record:
    run: record.cwl  # Expects old record type
    in:
      dependent_parameters:
        source: dependent_parameters
        valueFrom: |
          ${
            return {
              "itemA": self.itemA,
              "itemB": self.itemC
            }
          }
    out: [example_out]

outputs:
  example_out:
    type: File
    outputSource: record/example_out

Let's test this with our input YAML, input.yaml:

dependent_parameters:
    itemA: a_string  # type "string"
    itemC: c_string  # type "string"

cwltool record-wf.cwl input.yaml

This gives:

INFO /usr/bin/cwltool 3.1.20220224085855
INFO Resolved 'record-wf.cwl' to 'file:///tmp/tmp.gkj56GiQ0w/record-wf.cwl'
INFO [workflow ] start
INFO [workflow ] starting step record
INFO [step record] start
INFO [job record] /tmp/oq_5u9f8$ echo \
    -A \
    a_string \
    -B \
    c_string > /tmp/oq_5u9f8/output.txt
INFO [job record] completed success
INFO [step record] completed success
INFO [workflow ] completed success
{
    "example_out": {
        "location": "file:///tmp/tmp.gkj56GiQ0w/output.txt",
        "basename": "output.txt",
        "class": "File",
        "checksum": "sha1$f215e8565d4711ed1fac71938dee09ef7781870f",
        "size": 24,
        "path": "/tmp/tmp.gkj56GiQ0w/output.txt"
    }
}
INFO Final process status is success

output.txt

-A a_string -B c_string

Now assume instead that an eggnog-mapper tool requires the inputs input_fasta, data_dir, db and diamond_db. We can use the JavaScript expression / valueFrom syntax shown above to pull data_dir, db and diamond_db out of the eggnog input record.

For example:

in:
  input_fasta: bakta/sequences_cds
  data_dir:
    source: eggnog
    valueFrom: |
      ${
         return self.data_dir;
       }
  db:
    source: eggnog
    valueFrom: |
      ${
         return self.db;
       }
  diamond_db:
    source: eggnog
    valueFrom: |
      ${
         return self.diamond_db;
       }

This last example could be more generalized...
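
For the step inputs above to resolve, the workflow needs an eggnog record input and the matching requirements. A rough sketch of what that might look like follows; the field types (Directory, File) are assumptions on my part, since the actual eggnog-mapper tool definition is not shown here.

```yaml
# Workflow-level fragment; field types are assumed, not taken from a real tool.
requirements:
  StepInputExpressionRequirement: {}  # needed for valueFrom on step inputs
  InlineJavascriptRequirement: {}     # needed for the ${ ... } JavaScript blocks

inputs:
  eggnog:
    type:
      type: record
      name: eggnog
      fields:
        data_dir:
          type: Directory  # assumption
        db:
          type: File       # assumption
        diamond_db:
          type: File       # assumption
```

Grouping the three values in one record keeps them dependent: a user has to supply all of them or none.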

mr-c commented 2 months ago

FYI: valueFrom: $(self.data_dir) is a shorter version of

    valueFrom: |
      ${
         return self.data_dir;
       }
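
Applied to the whole eggnog step input, the shorter form would look roughly like this. It is a sketch of the same mapping as above, not a tested workflow.

```yaml
in:
  input_fasta: bakta/sequences_cds
  data_dir:
    source: eggnog
    valueFrom: $(self.data_dir)    # self is the eggnog record
  db:
    source: eggnog
    valueFrom: $(self.db)
  diamond_db:
    source: eggnog
    valueFrom: $(self.diamond_db)
```

As far as I understand the spec, $(...) parameter references like these do not need InlineJavascriptRequirement, but valueFrom on a step input still needs StepInputExpressionRequirement.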