DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

toil-cwl-runner can ignore LoadListingRequirements on a Workflow and always uses `no_listing` #5104

Open stxue1 opened 2 days ago

stxue1 commented 2 days ago

May be related to #5099.

Given this workflow:

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow
requirements:
  InlineJavascriptRequirement: {}
  LoadListingRequirement:
    loadListing: shallow_listing
  StepInputExpressionRequirement: {}
inputs:
  input_directory:
    type: Directory
outputs:
  output_file:
    type: File
    outputSource: echo/out
steps:
  echo:
    run:
      class: CommandLineTool
      requirements:
        LoadListingRequirement:
          loadListing: deep_listing
      baseCommand: echo
      inputs:
        message:
          type: string
          inputBinding: {}
        dir:
          type: Directory
      outputs:
        out:
          type: stdout
    in:
      dir: input_directory
      message: 
        valueFrom: $(JSON.stringify(inputs.dir))
    out: [out]

With this JSON input file (`shallow_listing.json`):

{
    "input_directory": {"class": "Directory", "location": "directory"}
}

And a directory named `directory` in the current working directory, laid out as:

(venv3.12) heaucques@pop-os:~/Documents/toil$ tree directory
directory
├── directory
│   └── file2.txt
└── file.txt

1 directory, 2 files
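To make the reproduction easier to set up, the tree and input JSON above can be created with a short Python script. This is a minimal sketch: `make_fixture` is a hypothetical helper, and it assumes the files are empty (cwltool reports `size: 0` for `file.txt` further down, which is consistent with that).

```python
import json
from pathlib import Path


def make_fixture(root: Path) -> Path:
    """Create directory/file.txt, directory/directory/file2.txt and the job JSON under root.

    Hypothetical helper for reproducing the report; files are created empty.
    """
    top = root / "directory"
    nested = top / "directory"
    # parents=True also creates the top-level "directory".
    nested.mkdir(parents=True, exist_ok=True)
    (top / "file.txt").touch()
    (nested / "file2.txt").touch()

    # Same job object as shallow_listing.json above; the location is relative to root.
    job = {"input_directory": {"class": "Directory", "location": "directory"}}
    job_path = root / "shallow_listing.json"
    job_path.write_text(json.dumps(job, indent=4) + "\n")
    return job_path
```

Calling `make_fixture(Path.cwd())` from the directory where `toil-cwl-runner` will be run should produce the same layout that `tree directory` shows.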

After running the command:

toil-cwl-runner shallow_listing_workflow.cwl shallow_listing.json > json.txt && jq . $(jq -r .output_file.path json.txt)
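The `jq` half of that command reads the runner's output object, follows `output_file.path` to the file captured from `echo`, and pretty-prints it. A rough Python equivalent that also checks whether the echoed Directory carries a `listing` key at all might look like the sketch below. `read_echoed_directory` and `has_listing` are hypothetical names, and the sketch assumes the captured file holds only the output of `JSON.stringify(inputs.dir)` (plus the newline `echo` adds).

```python
import json
from pathlib import Path


def read_echoed_directory(runner_output: Path) -> dict:
    """Rough equivalent of `jq . $(jq -r .output_file.path json.txt)`.

    runner_output is the JSON toil-cwl-runner printed (json.txt above).
    """
    outputs = json.loads(runner_output.read_text())
    echoed = Path(outputs["output_file"]["path"]).read_text()
    # json.loads tolerates the trailing newline written by echo.
    return json.loads(echoed)


def has_listing(directory: dict) -> bool:
    """True if the CWL Directory object has a `listing` field."""
    return "listing" in directory
```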

I get this Directory object with no `listing` (this is what the `valueFrom` expression at workflow scope saw):

{
  "class": "Directory",
  "location": "toildir:eyJkaXJlY3RvcnkiOiB7ImZpbGUyLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtMWUyYjUzNjYxODQ2NDVjZjg3OTc3NTE5MDJkZTU3ZmEvZmlsZTIudHh0In0sICJmaWxlLnR4dCI6ICJ0b2lsZmlsZTowOjA6ZmlsZXMvbm8tam9iL2ZpbGUtNDhlYjIxNTc2MzhiNDhmNGE5YmU2NmJjODIzNWMzMTQvZmlsZS50eHQifQ==",
  "basename": "directory"
}

The expression should see a `shallow_listing` of the directory, but it appears to be evaluated with `no_listing` instead.
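The `toildir:` location itself suggests Toil already knows what is inside the directory and only the `listing` field is missing. Assuming the part after `toildir:` is URL-safe base64-encoded JSON (that is what this particular location decodes to; it is an inference from this example, not a documented format), a small decoder shows the two top-level entries a shallow listing should contain. `decode_toildir` and `top_level_entries` are hypothetical helpers.

```python
import base64
import json

TOILDIR_PREFIX = "toildir:"


def decode_toildir(location: str) -> dict:
    """Decode the payload of a toildir: location as seen in this report.

    Assumption: the payload is base64-encoded JSON mapping each name to either
    a toilfile: URI (a file) or a nested dict (a subdirectory).
    """
    if not location.startswith(TOILDIR_PREFIX):
        raise ValueError(f"not a toildir location: {location!r}")
    payload = location[len(TOILDIR_PREFIX):]
    return json.loads(base64.urlsafe_b64decode(payload))


def top_level_entries(contents: dict) -> list[tuple[str, str]]:
    """Return (name, CWL class) for the first level only, like shallow_listing."""
    return sorted(
        (name, "Directory" if isinstance(value, dict) else "File")
        for name, value in contents.items()
    )
```

Applied to the location above, `top_level_entries` gives `[("directory", "Directory"), ("file.txt", "File")]`, and the nested `directory` entry maps `file2.txt` to its own `toilfile:` URI.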

Running the same workflow with cwltool gives the expected shallow listing:

{
  "class": "Directory",
  "location": "file:///home/heaucques/Documents/toil/directory",
  "basename": "directory",
  "listing": [
    {
      "class": "Directory",
      "location": "file:///home/heaucques/Documents/toil/directory/directory",
      "basename": "directory"
    },
    {
      "class": "File",
      "location": "file:///home/heaucques/Documents/toil/directory/file.txt",
      "basename": "file.txt",
      "size": 0
    }
  ]
}
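For comparison, a shallow listing of a local directory in the shape cwltool printed above can be built as follows: direct children are listed, the subdirectory has no `listing` of its own, and files carry `size`. This is a sketch for checking expectations, not cwltool's implementation; `shallow_listing` is a hypothetical function, entries are sorted by name for determinism, and fields cwltool may add in other situations are omitted.

```python
from pathlib import Path


def shallow_listing(path: Path) -> dict:
    """Build a CWL Directory object with exactly one level of `listing`."""
    path = path.resolve()  # as_uri() requires an absolute path
    entries = []
    for child in sorted(path.iterdir(), key=lambda p: p.name):
        entry = {
            "class": "Directory" if child.is_dir() else "File",
            "location": child.as_uri(),
            "basename": child.name,
        }
        if child.is_file():
            entry["size"] = child.stat().st_size
        # Subdirectories are NOT descended into: that is what makes it shallow.
        entries.append(entry)
    return {
        "class": "Directory",
        "location": path.as_uri(),
        "basename": path.name,
        "listing": entries,
    }
```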

Issue is synchronized with this Jira Story. Issue Number: TOIL-1650

stxue1 commented 1 day ago

I think cwltoil picks up the correct loadListing, but it is overridden later: https://github.com/DataBiosphere/toil/blob/a698f45465abc59e8e533d749980a914c01c824c/src/toil/cwl/cwltoil.py#L4023-L4027
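The scoping the report expects can be stated as a small rule: the step's `valueFrom` expression is evaluated under the workflow's requirements (here `shallow_listing`), while the embedded CommandLineTool's own `deep_listing` applies only to the tool. Below is a minimal model of that rule, assuming requirements have already been normalized to lists of dicts with a `class` key and ignoring `hints`. `effective_load_listing` is hypothetical and says nothing about how cwltoil is actually structured.

```python
from typing import Sequence

# In CWL v1.1 and later, no LoadListingRequirement means no_listing.
DEFAULT_LOAD_LISTING = "no_listing"


def effective_load_listing(scopes: Sequence[Sequence[dict]]) -> str:
    """Return the loadListing in force for the innermost scope.

    scopes is ordered outermost first, e.g. [workflow_reqs, step_reqs] for a
    step's valueFrom, or [workflow_reqs, step_reqs, tool_reqs] inside the tool.
    """
    for requirements in reversed(scopes):
        for req in requirements:
            if req.get("class") == "LoadListingRequirement":
                # A requirement without loadListing is treated as the default here (assumption).
                return req.get("loadListing", DEFAULT_LOAD_LISTING)
    return DEFAULT_LOAD_LISTING
```

With the workflow above, the step expression would resolve to `shallow_listing` and the tool to `deep_listing`; the behaviour in this issue looks like the expression ending up with the fallback `no_listing`.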