biocore / qiime

Official QIIME 1 software repository. QIIME 2 (https://qiime2.org) has succeeded QIIME 1 as of January 2018.
GNU General Public License v2.0
285 stars 268 forks source link

Scripts with system calls won't work if there's spaces in the input/output paths #2033

Open ElDeveloper opened 9 years ago

ElDeveloper commented 9 years ago

If you call a script that's making a system call and one of your input/output paths have spaces in them, the script will error as most command strings are formed like this:

path_with_spaces = '/Users/yoshiki/foo bar baz/seqs.fasta'
cmd = 'count_seqs.py -i %s -o some-path' % path_with_spaces

_ = qcli_system_call(cmd)
# more code

So even if the path /Users/yoshiki/foo bar baz/seqs.fasta was passed between quotes (or if you escape the spaces) from the terminal, the script will still error saying that /Users/yoshiki/foo doesn't exist. Solution is just to not have spaces in your path names.

gregcaporaso commented 9 years ago

@ElDeveloper, I can't reproduce this:

In [1]: !touch "some ugly filename.txt"

In [3]: from qcli.util import qcli_system_call

In [5]: qcli_system_call("ls -al 'some ugly filename.txt'")
Out[5]:
('-rw-r--r--@ 1 caporaso  staff  0 May 27 06:49 some ugly filename.txt\n',
 '',
 0)

Am I misunderstanding the issue?

adamrp commented 9 years ago

@gregcaporaso It works for you because you're explicitly quoting the filename in the system call. If you use string substitution (as @ElDeveloper does in his example) you should see the problem:

In [1]: !touch 'ugly filename.txt'

In [2]: from qcli import qcli_system_call

In [3]: cmd = 'ls %s'

In [4]: filename = 'ugly filename.txt'

In [5]: qcli_system_call(cmd % filename)
Out[5]: 
('',
 'ls: filename.txt: No such file or directory\nls: ugly: No such file or directory\n',
 1)

Although I'm not sure, specifically, where in QIIME this presents a problem.

gregcaporaso commented 9 years ago

That makes sense, but really cmd in that case should be 'ls "%s"' - there's no way that qcli could figure out what is supposed to be a file name versus e.g. positional arguments. (So in that case, you're saying the issue is with the QIIME scripts? I was initially thinking that @ElDeveloper was saying this was a qcli_system_call issue.) Note that Popen handles this by accepting a list of command tokens. I thought qcli_system_call supported that, but the result isn't correct:

In [1]: from qcli.util import qcli_system_call

In [2]: !touch "some ugly filename.txt"

In [3]: qcli_system_call(["ls", "-al" "some ugly filename.txt"])
Out[3]: ('Document.pdf\nsome ugly filename.txt\n', '', 0)
adamrp commented 9 years ago

Correct on all accounts! I think that it's an issue with some QIIME scripts, but @ElDeveloper can probably shed more light (i.e., when/where he ran into this issue). From a quick look, I think that workflow scripts will have this problem:

(python276)adam:beta_diversity_through_plots@master$ ls
Fasting_Map.txt otu_table.biom  rep_set.tre
(python276)adam:beta_diversity_through_plots@master$ pwd
/Users/adam/git/qiime/qiime_test_data/beta_diversity_through_plots
(python276)adam:beta_diversity_through_plots@master$ ls -1
Fasting_Map.txt
otu_table.biom
rep_set.tre
(python276)adam:beta_diversity_through_plots@master$ mv otu_table.biom 'otu table.biom'
(python276)adam:beta_diversity_through_plots@master$ beta_diversity_through_plots.py -i otu\ table.biom -m Fasting_Map.txt -t rep_set.tre -o test_out
Traceback (most recent call last):
  File "/Users/adam/git/qiime/scripts/beta_diversity_through_plots.py", line 153, in <module>
    main()
  File "/Users/adam/git/qiime/scripts/beta_diversity_through_plots.py", line 150, in main
    status_update_callback=status_update_callback)
  File "/Users/adam/git/qiime/qiime/workflow/downstream.py", line 183, in run_beta_diversity_through_plots
    close_logger_on_success=close_logger_on_success)
  File "/Users/adam/git/qiime/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError: 

*** ERROR RAISED DURING STEP: Beta Diversity (weighted_unifrac)
Command run was:
 beta_diversity.py -i otu table.biom -o test_out --metrics weighted_unifrac  -t rep_set.tre 
Command returned exit status: 2
Stdout:

Stderr
Error in beta_diversity.py: option -i: path does not exist: 'otu'

If you need help with QIIME, see:
http://help.qiime.org
wasade commented 9 years ago

argv comes in clean and as expected here, so I think just wrapping everything in quotes would do it:

$ python -c "import sys; print sys.argv" beta_diversity_through_plots.py -i otu\ table.biom -m Fasting_Map.txt -t rep_set.tre -o test_out
['-c', 'beta_diversity_through_plots.py', '-i', 'otu table.biom', '-m', 'Fasting_Map.txt', '-t', 'rep_set.tre', '-o', 'test_out']
ElDeveloper commented 9 years ago

@adamrp, got it exactly right!! :100:

On (May-27-15| 8:16), Daniel McDonald wrote:

argv comes in clean and as expected here, so I think just wrapping everything in quotes would do it:

$ python -c "import sys; print sys.argv" beta_diversity_through_plots.py -i otu\ table.biom -m Fasting_Map.txt -t rep_set.tre -o test_out
['-c', 'beta_diversity_through_plots.py', '-i', 'otu table.biom', '-m', 'Fasting_Map.txt', '-t', 'rep_set.tre', '-o', 'test_out']

Reply to this email directly or view it on GitHub: https://github.com/biocore/qiime/issues/2033#issuecomment-105952502

kylebittinger commented 9 years ago

The issue of spaces and other special characters can be solved by setting shell=False and passing a list of command arguments rather than a string representing the entire command. Refactoring the example posted by @ElDeveloper:

path_with_spaces = '/Users/yoshiki/foo bar baz/seqs.fasta'
cmd = ['count_seqs.py', '-i', path_with_spaces, '-o', 'some-path']
qcli_system_call(cmd, shell=False)

Using this technique, you have access to all the list methods, rather than relying on string interpolation to build the command. Are there issues with shell=False that would prevent us from adopting this strategy?

The example from @gregcaporaso did not set shell=False. In this case, only the first item in the list is passed as a command to the shell. Subsequent items in the list are "treated as additional arguments to the shell itself." Thus, you saw the expected output from the command ls.