DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0
901 stars 240 forks source link

Toil complains that a filename with spaces "contains illegal characters" while cwltool works fine #5158

Open ndonyapour opened 1 week ago

ndonyapour commented 1 week ago

Hello, we have workflows running with Toil that run issues when datasets contain files and folders with whitespaces in their names. We’d like to preserve the original data without running a preprocessing step to rename these files and folders. Is it possible to add support for file and folder names with whitespaces?

Thank you!

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-1669

adamnovak commented 1 week ago

As far as I know, Toil should already support whitespace in path names for input and output files. There's nothing I know of about its import/export machinery that shouldn't be able to handle it.

Are you using Toil with the CWL or WDL front-end? Are you using a particular CWL or WDL workflow? It's easy to write a WDL workflow that doesn't quote filename placeholders and won't work properly when filenames contain spaces.

Can you provide a reproducible example of Toil not supporting whitespace in a filename?

ndonyapour commented 1 week ago

Actually, the error is coming from CWL, not Toil. I’m using Toil with CWL, and here’s the error I'm running into

visit_class(d, cls, op)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/utils.py", line 218, in visit_class
      visit_class(d, cls, op)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/utils.py", line 213, in visit_class
      op(rec)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 379, in check_adjust
      raise WorkflowException(
    cwl_utils.errors.WorkflowException: Invalid filename: 'CD_SOD1_2_E1023884 __1' contains illegal characters

I don’t get this error when I run the workflow with CWLtool directly. toil_whitespace_example.zip

mr-c commented 1 week ago

Actually, the error is coming from CWL, not Toil. I’m using Toil with CWL, and here’s the error I'm running into

visit_class(d, cls, op)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/utils.py", line 218, in visit_class
      visit_class(d, cls, op)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/utils.py", line 213, in visit_class
      op(rec)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 379, in check_adjust
      raise WorkflowException(
    cwl_utils.errors.WorkflowException: Invalid filename: 'CD_SOD1_2_E1023884 __1' contains illegal characters

I don’t get this error when I run the workflow with CWLtool directly. toil_whitespace_example.zip

Can you try again with --relax-path-checks?

@adamnovak I remember that we had made this the default for toil-cwl-runner

stxue1 commented 1 week ago

Actually, the error is coming from CWL, not Toil. I’m using Toil with CWL, and here’s the error I'm running into

visit_class(d, cls, op)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/utils.py", line 218, in visit_class
      visit_class(d, cls, op)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/utils.py", line 213, in visit_class
      op(rec)
     File "/home/donyapourn2/mambaforge-pypy3/envs/wic/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 379, in check_adjust
      raise WorkflowException(
    cwl_utils.errors.WorkflowException: Invalid filename: 'CD_SOD1_2_E1023884 __1' contains illegal characters

I don’t get this error when I run the workflow with CWLtool directly. toil_whitespace_example.zip

Can you try again with --relax-path-checks?

@adamnovak I remember that we had made this the default for toil-cwl-runner

This is currently set to False by default: https://github.com/DataBiosphere/toil/blob/47ac7bfbc4c5b6a3db812a59542e6c91d5dd58ed/src/toil/options/cwl.py#L235

Maybe this should be set to True by default. We would then also need to either change the argument name or not have it as store_true.

ndonyapour commented 1 week ago

Thank you for your help! Using --relax-path-checks fixed the issue.