common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

spaces in output file names #817

Closed pkp124 closed 2 years ago

pkp124 commented 6 years ago

Spaces in output file names result in an error.

CWL file: touch.cwl

cwlVersion: v1.0

class: CommandLineTool
requirements:
  - class: InlineJavascriptRequirement
  - class: ShellCommandRequirement
inputs:
  outputName:
    type: string
    doc: |
      Output name for the tool output
    inputBinding:
      position: 1

outputs:
  output:
    type: File
    outputBinding:
      glob: $(inputs.outputName)

baseCommand:
  - touch

JSON input: touch.json

{
  "outputName": "a b"
}

command line:

cwltool --debug --relax-path-checks touch.cwl touch.json

Debug log:

[job touch.cwl] {
    "outputName": "a b"
}
[job touch.cwl] path mappings is {}
[job touch.cwl] command line bindings is [
    {
        "position": [
            -1000000, 
            0
        ], 
        "datum": "touch"
    }, 
    {
        "position": [
            1, 
            "outputName"
        ], 
        "datum": "a b"
    }
]
[job touch.cwl] /tmp/tmpc3_Od4$ /bin/sh \
    -c \
    'touch' 'a b'

[job touch.cwl] Job error:
Error collecting output for parameter 'output':
touch.cwl:16:3: Traceback (most recent call last):
touch.cwl:16:3: 
touch.cwl:16:3:   File "/work/cwltool/cwltool/command_line_tool.py", line 584, in collect_output_ports
touch.cwl:16:3:     compute_checksum=compute_checksum)
touch.cwl:16:3: 
touch.cwl:16:3:   File "/work/cwltool/cwltool/command_line_tool.py", line 677, in collect_output
touch.cwl:16:3:     with fs_access.open(rfile["location"], "rb") as f:
touch.cwl:16:3: 
touch.cwl:16:3:   File "/work/cwltool/cwltool/stdfsaccess.py", line 52, in open
touch.cwl:16:3:     return open(self._abs(fn), mode)
touch.cwl:16:3: 
touch.cwl:16:3: IOError: [Errno 2] No such file or directory: '/tmp/tmpc3_Od4/a%20b'
[job touch.cwl] completed permanentFail
[job touch.cwl] {}
[job touch.cwl] Removing input staging directory /tmp/tmpo7qMaK
[job touch.cwl] Removing temporary directory /tmp/tmpGqhzmH
{}
Final process status is permanentFail

cwltool version: 1.0.20180622214234

The output file is created. When the output file is collected, the space character is replaced with "%20", which I assume is caused by the call to file_uri() in the glob() method of the StdFsAccess class. I am not sure whether this is expected behaviour or a bug.

cmball1 commented 6 years ago

This snippet (lines 28-36 of stdfsaccess.py) is where the file path is converted to a URL:

class StdFsAccess(object):
    def __init__(self, basedir):  # type: (Text) -> None
        self.basedir = basedir

    def _abs(self, p):  # type: (Text) -> Text
        return abspath(p, self.basedir)

    def glob(self, pattern):  # type: (Text) -> List[Text]
        return [file_uri(str(self._abs(l))) for l in glob.glob(self._abs(pattern))]

file_uri is imported from schema_salad.ref_resolver

In ref_resolver.py, file_uri calls urllib.request.pathname2url.

The call to pathname2url is causing @ to be converted to %40 and spaces converted to %20.
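To illustrate the encoding described above, here is a minimal sketch that uses only the Python standard library. The file names are made up for the example, and relative names are used so the result does not depend on how a given Python version treats leading slashes.

```python
from urllib.request import pathname2url, url2pathname

# Hypothetical file names containing characters that get percent-encoded.
names = ["a b", "user@host.txt", "data*.txt"]

# pathname2url percent-encodes anything outside the URL-safe set.
encoded = {name: pathname2url(name) for name in names}

for name, url in encoded.items():
    print(f"{name!r} -> {url!r}")
    # url2pathname reverses the percent-encoding.
    assert url2pathname(url) == name
```

The encoded form is only a problem if it is later treated as a literal file system path without being decoded first.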

The issue is that the conversion happens after the CWL workflow has completed, while the files are being moved to the output path.

In the example above, /tmp/tmpc3_Od4/a b exists, but cwltool is looking for /tmp/tmpc3_Od4/a%20b.
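The mismatch can be reproduced outside cwltool with a short, self-contained sketch. This is not cwltool's code: to_file_uri, open_without_decoding and open_with_decoding are hypothetical stand-ins for the encode step and for two ways of turning the URI back into a path.

```python
import glob
import os
import tempfile
from urllib.parse import quote, unquote

SCHEME = "file://"


def to_file_uri(path):
    """Hypothetical stand-in for file_uri: percent-encode a local path."""
    return SCHEME + quote(path)


def open_without_decoding(uri):
    """Strip the scheme only, leaving %20 in the path (the failing case)."""
    return open(uri[len(SCHEME):], "rb")


def open_with_decoding(uri):
    """Strip the scheme and undo the percent-encoding before opening."""
    return open(unquote(uri[len(SCHEME):]), "rb")


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        # Equivalent of running `touch 'a b'` in the job's working directory.
        open(os.path.join(workdir, "a b"), "w").close()
        pattern = os.path.join(glob.escape(workdir), "a*")
        uri = to_file_uri(glob.glob(pattern)[0])
        try:
            open_without_decoding(uri).close()
            naive_failed = False
        except FileNotFoundError:
            naive_failed = True
        with open_with_decoding(uri) as f:
            decoded_ok = f.read() == b""
    return uri, naive_failed, decoded_ok


uri, naive_failed, decoded_ok = demo()
print(uri, naive_failed, decoded_ok)
```

The file on disk is always "a b"; only the lookup that skips decoding goes wrong, which matches the IOError in the debug log.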

If I change line 36 in stdfsaccess.py from

        return [file_uri(str(self._abs(l))) for l in glob.glob(self._abs(pattern))]

to

        return [urllib.request.url2pathname(file_uri(str(self._abs(l)))) for l in glob.glob(self._abs(pattern))]

this workflow completes successfully.

I'm not sure that's the best fix. It seems like a fix in schema_salad ref_resolver.py would be more appropriate.

Is there a specific reason pathname2url is used (via file_uri) in StdFsAccess.glob in stdfsaccess.py? @tetron @mr-c

sersorrel commented 6 years ago

This is not limited to spaces; characters like * also cause the problem.

mr-c commented 2 years ago

As of cwltool version 3.1.20211107152837 this works (and likely in earlier versions as well).