common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0
335 stars 230 forks source link

cwltool mishandles line continuations in bash scripts #1271

Open ghost opened 4 years ago

ghost commented 4 years ago
cwltool --version
/Users/kghose/.venvs/toil/bin/cwltool 3.0.20200324120055

The following bash script is valid

set -xv

echo \
  "A new line" \
  "Another line" \
  "Boo"

The following CWL with an embedded bash script is not correctly handled by cwltool. It looks like the line continuation \ symbol is removed?

class: CommandLineTool
cwlVersion: v1.0
requirements:
  DockerRequirement:
    dockerPull: sevenbridges/svsim:1.0
  InitialWorkDirRequirement:
    listing:
      - entryname: script.sh
        entry: |          
          set -xv

          echo \
            "A new line" \
            "Another line" \
            "$(inputs.foo)"

baseCommand: ["bash", "script.sh"]

inputs:
  foo: string

outputs:
  out: stdout

stdout: out.txt

The cwltool (and also toil, since toil uses cwltool) error is:

echo
+ echo
  "A new line"
+ 'A new line'
script.sh: line 4: A new line: command not found
  "Another line"
+ 'Another line'
script.sh: line 5: Another line: command not found
  "Hello"
+ Hello
script.sh: line 6: Hello: command not found
INFO [job test.cwl] Max memory used: 0MiB
WARNING [job test.cwl] completed permanentFail

As a contrast the SB platform executor executes this correctly.

ghost commented 4 years ago

With some experimentation it seems that we need to escape the line continuation \\ but this should not be.

mr-c commented 4 years ago

Checking the yaml spec..

5.7. Escaped Characters Note that escape sequences are only interpreted in double-quoted scalars. In all other scalar styles, the “\” character has no special meaning and non-printable characters are not available.

In the example, a | indicating the literal scalar style

8.1.2. Literal Style The literal style is denoted by the “|” indicator. It is the simplest, most restricted, and most readable scalar style. Inside literal scalars, all (indented) characters are considered to be content, including white space characters. Note that all line break characters are normalized. In addition, empty lines are not folded, though final line breaks and trailing empty lines are chomped.

There is no way to escape characters inside literal scalars. This restricts them to printable characters. In addition, there is no way to break a long literal line.

So either ruaml.yaml or cwltool is making an error here

mr-c commented 4 years ago

@tetron has requested new conformance tests in the v1.2 repo

ghost commented 4 years ago

@tetron I added this, by mistake, to an existing PR. The original PR was tiny, just a wording clarification in the docs, so I think this is ok. I'll change the PR title.