DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

spaces in output names #2280

Closed pkp124 closed 6 years ago

pkp124 commented 6 years ago

Hello,

When the output file or directory names in a CWL description contain spaces, cwltoil fails with an error. The details are below.

CWL tool description (touch.cwl):

cwlVersion: v1.0

class: CommandLineTool
requirements:
  - class: InlineJavascriptRequirement
  - class: ShellCommandRequirement
inputs:
  outputName:
    type: string
    doc: |
      Output name for the tool output
    inputBinding:
      position: 1

outputs:
  output:
    type: File
    outputBinding:
      glob: $(inputs.outputName)

baseCommand:
  - touch

JSON input (touch.json):

{
  "outputName": "a b"
}

cwltoil debug:

2018-07-06 15:03:26,680 - toil.leader - WARNING - 5/K/jobFFqiSS    DEBUG:toil.worker:Parsed job wrapper
WARNING:toil.leader:5/K/jobFFqiSS    DEBUG:toil.worker:Got a command to run: _toil 5/K/jobFFqiSS/g/tmpyjhyub.tmp /work/toil/src toil.cwl.cwltoil False
2018-07-06 15:03:26,681 - toil.leader - WARNING - 5/K/jobFFqiSS    DEBUG:toil.worker:Got a command to run: _toil 5/K/jobFFqiSS/g/tmpyjhyub.tmp /work/toil/src toil.cwl.cwltoil False
WARNING:toil.leader:5/K/jobFFqiSS    DEBUG:toil.job:Loading user module ModuleDescriptor(dirPath='/work/toil/src', name='toil.cwl.cwltoil', fromVirtualEnv=False).
2018-07-06 15:03:26,681 - toil.leader - WARNING - 5/K/jobFFqiSS    DEBUG:toil.job:Loading user module ModuleDescriptor(dirPath='/work/toil/src', name='toil.cwl.cwltoil', fromVirtualEnv=False).
WARNING:toil.leader:5/K/jobFFqiSS    DEBUG:rdflib:RDFLib Version: 4.2.2
2018-07-06 15:03:26,681 - toil.leader - WARNING - 5/K/jobFFqiSS    DEBUG:rdflib:RDFLib Version: 4.2.2
WARNING:toil.leader:5/K/jobFFqiSS    [job touch.cwl] /work/toil/cwl/out_tmpdirNAHmaJ$ /bin/sh \
2018-07-06 15:03:26,681 - toil.leader - WARNING - 5/K/jobFFqiSS    [job touch.cwl] /work/toil/cwl/out_tmpdirNAHmaJ$ /bin/sh \
WARNING:toil.leader:5/K/jobFFqiSS        -c \
2018-07-06 15:03:26,681 - toil.leader - WARNING - 5/K/jobFFqiSS        -c \
WARNING:toil.leader:5/K/jobFFqiSS        'touch' 'a b'
2018-07-06 15:03:26,681 - toil.leader - WARNING - 5/K/jobFFqiSS        'touch' 'a b'
WARNING:toil.leader:5/K/jobFFqiSS    INFO:cwltool:[job touch.cwl] /work/toil/cwl/out_tmpdirNAHmaJ$ /bin/sh \
2018-07-06 15:03:26,682 - toil.leader - WARNING - 5/K/jobFFqiSS    INFO:cwltool:[job touch.cwl] /work/toil/cwl/out_tmpdirNAHmaJ$ /bin/sh \
WARNING:toil.leader:5/K/jobFFqiSS        -c \
2018-07-06 15:03:26,682 - toil.leader - WARNING - 5/K/jobFFqiSS        -c \
WARNING:toil.leader:5/K/jobFFqiSS        'touch' 'a b'
2018-07-06 15:03:26,682 - toil.leader - WARNING - 5/K/jobFFqiSS        'touch' 'a b'
WARNING:toil.leader:5/K/jobFFqiSS    [job touch.cwl] Job error:
2018-07-06 15:03:26,682 - toil.leader - WARNING - 5/K/jobFFqiSS    [job touch.cwl] Job error:
WARNING:toil.leader:5/K/jobFFqiSS    Error collecting output for parameter 'output':
2018-07-06 15:03:26,682 - toil.leader - WARNING - 5/K/jobFFqiSS    Error collecting output for parameter 'output':
WARNING:toil.leader:5/K/jobFFqiSS    touch.cwl:16:3: [Errno 2] No such file or directory: '/work/toil/cwl/out_tmpdirNAHmaJ/a%20b'
2018-07-06 15:03:26,682 - toil.leader - WARNING - 5/K/jobFFqiSS    touch.cwl:16:3: [Errno 2] No such file or directory: '/work/toil/cwl/out_tmpdirNAHmaJ/a%20b'
WARNING:toil.leader:5/K/jobFFqiSS    ERROR:cwltool:[job touch.cwl] Job error:
2018-07-06 15:03:26,683 - toil.leader - WARNING - 5/K/jobFFqiSS    ERROR:cwltool:[job touch.cwl] Job error:
WARNING:toil.leader:5/K/jobFFqiSS    Error collecting output for parameter 'output':
2018-07-06 15:03:26,683 - toil.leader - WARNING - 5/K/jobFFqiSS    Error collecting output for parameter 'output':
WARNING:toil.leader:5/K/jobFFqiSS    touch.cwl:16:3: [Errno 2] No such file or directory: '/work/toil/cwl/out_tmpdirNAHmaJ/a%20b'
2018-07-06 15:03:26,683 - toil.leader - WARNING - 5/K/jobFFqiSS    touch.cwl:16:3: [Errno 2] No such file or directory: '/work/toil/cwl/out_tmpdirNAHmaJ/a%20b'
WARNING:toil.leader:5/K/jobFFqiSS    [job touch.cwl] completed permanentFail
2018-07-06 15:03:26,683 - toil.leader - WARNING - 5/K/jobFFqiSS    [job touch.cwl] completed permanentFail
WARNING:toil.leader:5/K/jobFFqiSS    WARNING:cwltool:[job touch.cwl] completed permanentFail
2018-07-06 15:03:26,683 - toil.leader - WARNING - 5/K/jobFFqiSS    WARNING:cwltool:[job touch.cwl] completed permanentFail
WARNING:toil.leader:5/K/jobFFqiSS    DEBUG:toil.fileStore:LOG-TO-MASTER: Job 5/K/jobFFqiSS/g/tmpyjhyub.tmp used 0.00% (4.0 KB [4096B] used, 3.0 GB [3221225472B] requested) at the end of its run.
2018-07-06 15:03:26,683 - toil.leader - WARNING - 5/K/jobFFqiSS    DEBUG:toil.fileStore:LOG-TO-MASTER: Job 5/K/jobFFqiSS/g/tmpyjhyub.tmp used 0.00% (4.0 KB [4096B] used, 3.0 GB [3221225472B] requested) at the end of its run.
WARNING:toil.leader:5/K/jobFFqiSS    Traceback (most recent call last):
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS    Traceback (most recent call last):
WARNING:toil.leader:5/K/jobFFqiSS      File "/work/toil/src/toil/worker.py", line 313, in workerScript
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS      File "/work/toil/src/toil/worker.py", line 313, in workerScript
WARNING:toil.leader:5/K/jobFFqiSS        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
WARNING:toil.leader:5/K/jobFFqiSS      File "/work/toil/src/toil/job.py", line 1350, in _runner
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS      File "/work/toil/src/toil/job.py", line 1350, in _runner
WARNING:toil.leader:5/K/jobFFqiSS        returnValues = self._run(jobGraph, fileStore)
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS        returnValues = self._run(jobGraph, fileStore)
WARNING:toil.leader:5/K/jobFFqiSS      File "/work/toil/src/toil/job.py", line 1295, in _run
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS      File "/work/toil/src/toil/job.py", line 1295, in _run
WARNING:toil.leader:5/K/jobFFqiSS        return self.run(fileStore)
2018-07-06 15:03:26,684 - toil.leader - WARNING - 5/K/jobFFqiSS        return self.run(fileStore)
WARNING:toil.leader:5/K/jobFFqiSS      File "/work/toil/src/toil/cwl/cwltoil.py", line 487, in run
2018-07-06 15:03:26,685 - toil.leader - WARNING - 5/K/jobFFqiSS      File "/work/toil/src/toil/cwl/cwltoil.py", line 487, in run
WARNING:toil.leader:5/K/jobFFqiSS        raise cwltool.errors.WorkflowException(status)
2018-07-06 15:03:26,685 - toil.leader - WARNING - 5/K/jobFFqiSS        raise cwltool.errors.WorkflowException(status)
WARNING:toil.leader:5/K/jobFFqiSS    WorkflowException: permanentFail
2018-07-06 15:03:26,685 - toil.leader - WARNING - 5/K/jobFFqiSS    WorkflowException: permanentFail
WARNING:toil.leader:5/K/jobFFqiSS    ERROR:toil.worker:Exiting the worker because of a failed job on host dev01.int.bluebee.com
2018-07-06 15:03:26,685 - toil.leader - WARNING - 5/K/jobFFqiSS    ERROR:toil.worker:Exiting the worker because of a failed job on host dev01.int.bluebee.com
WARNING:toil.leader:5/K/jobFFqiSS    WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'file:///work/toil/cwl/touch.cwl' touch 5/K/jobFFqiSS with ID 5/K/jobFFqiSS to 0
2018-07-06 15:03:26,685 - toil.leader - WARNING - 5/K/jobFFqiSS    WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'file:///work/toil/cwl/touch.cwl' touch 5/K/jobFFqiSS with ID 5/K/jobFFqiSS to 0

The output file is created, but cwltool fails to collect it, probably because the location is parsed with urllib and the space character is replaced with %20 (I am not sure about this). This could also cause issues in cwltoil.py wherever the location is used as-is, without decoding. For example, https://github.com/DataBiosphere/toil/blob/0f6127f51060f4675d6a071f5d86782bc435c4e9/src/toil/cwl/cwltoil.py#L330
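To illustrate the suspected mechanism, here is a minimal Python 3 sketch. It is not cwltool's or cwltoil's actual code: the helpers `path_to_location` and `location_to_path` are hypothetical names made up for this example, and it assumes POSIX-style paths. It shows that a percent-encoded location used directly as a filesystem path points at `a%20b`, which does not exist, while the decoded location points at the real file `a b`.

```python
import os
import tempfile
from urllib.parse import quote, unquote, urlsplit


def path_to_location(path):
    """Hypothetical helper: encode a local path as a file:// URI.

    quote() percent-encodes the space, so "a b" becomes "a%20b".
    """
    return "file://" + quote(path)


def location_to_path(location):
    """Hypothetical helper: decode a file:// URI back to a local path."""
    return unquote(urlsplit(location).path)


with tempfile.TemporaryDirectory() as outdir:
    # Equivalent of `touch 'a b'` in the output directory.
    real_path = os.path.join(outdir, "a b")
    open(real_path, "w").close()

    location = path_to_location(real_path)

    # Using the location as-is (only stripping the scheme) keeps "%20".
    naive_path = location[len("file://"):]
    naive_exists = os.path.exists(naive_path)

    # Decoding the location first recovers the real file name.
    decoded_exists = os.path.exists(location_to_path(location))

print(location)
print("as-is path exists:", naive_exists)
print("decoded path exists:", decoded_exists)
```

If this is what happens during output collection, any code that reads `location` would need to decode it before touching the filesystem.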

The command used to reproduce the issue was:

cwltoil --batchSystem gridengine --outdir $PWD --jobStore $PWD/job --retryCount 0 --logDebug touch.cwl touch.json

Issue is synchronized with this JIRA Story. Issue Number: TOIL-298

DailyDreaming commented 6 years ago

Closed as this is primarily a cwltool issue: https://github.com/common-workflow-language/cwltool/issues/817