bahlolab / superSTR

A lightweight, alignment-free utility for detecting repeat-containing reads in short-read WGS, WES and RNA-seq data.
GNU General Public License v2.0

ERROR: Output file output_dir/ does not appear to exist. #5

Open wdecoster opened 3 years ago

wdecoster commented 3 years ago

Hi,

I am trying to run a bunch of FASTQ files in parallel, but I get: ERROR: Output file outputdir/superstr/ does not appear to exist.

My command is:

cat fastq_paths.fofn | parallel -j4 --bar 'superstr --mode=fastq -o output_dir/superstr_{/}/ -t 0.56 {}_R1.fastq.gz {}_R2.fastq.gz'

Which results in e.g.

superstr --mode=fastq -o output_dir/superstr_<my_identifier> -t 0.56 /path/<my_identifier>_R1.fastq.gz /path/<my_identifier>_R2.fastq.gz

(of course with path and my_identifier being reasonable names)

I also tried variations, such as dropping the trailing slash so that the value acts as a prefix rather than a directory, but none of them seem to work.

I tried creating output_dir first, but that doesn't help. For now I have hacked around it by doing:

cat fastq_paths.fofn | parallel -j4 --bar 'mkdir output_dir/superstr_{/} && superstr --mode=fastq -o output_dir/superstr_{/}/ -t 0.56 {}_R1.fastq.gz {}_R2.fastq.gz'

That seems to run. But I would call that rather counterintuitive and impractical.

So it seems that -o expects an existing directory, which is not what the documentation suggests:

Note: The -o flag is an output prefix rather than a file path; "per_read.txt.gz" is appended to the prefix.

Could superstr create the directories if they don't exist?

Cheers, Wouter

lfearnley commented 3 years ago

Thanks for picking up on this, and I appreciate the detailed report! I'll update the documentation on this shortly - there is a fix in the works too.
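
In the meantime, the workaround from the report can be made a little more robust. The following is only a sketch, not part of superSTR: `prepare_outdir` is a hypothetical helper name, and the `output_dir/superstr_<name>/` layout is taken from the command in the report. It uses `mkdir -p`, which also creates `output_dir` itself and does not fail when the directory already exists, so reruns are safe.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of superSTR): create the per-sample output
# directory and print its path, so it can be passed to `superstr -o`.
prepare_outdir() {
    local sample_prefix="$1"    # e.g. /path/<my_identifier>
    local sample_name
    # basename gives the same value GNU parallel substitutes for {/}
    sample_name="$(basename "$sample_prefix")"
    local outdir="output_dir/superstr_${sample_name}/"
    # -p creates missing parents and is a no-op if the directory exists
    mkdir -p "$outdir"
    printf '%s\n' "$outdir"
}

# Usage sketch (assumes superstr is installed and fastq_paths.fofn lists
# one path prefix per line, as in the report):
#
# while read -r prefix; do
#     outdir="$(prepare_outdir "$prefix")"
#     superstr --mode=fastq -o "$outdir" -t 0.56 \
#         "${prefix}_R1.fastq.gz" "${prefix}_R2.fastq.gz"
# done < fastq_paths.fofn
```

The same idea works inline with GNU parallel by replacing `mkdir` with `mkdir -p` in the command from the report.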