galaxyproject / tools-iuc

Tool Shed repositories maintained by the Intergalactic Utilities Commission
https://galaxyproject.org/iuc
MIT License

BBDuk: fails on auto-uncompressed fastq with spaces in filenames. #6495

Open hexylena opened 3 weeks ago

hexylena commented 3 weeks ago

I provided a paired collection to bbduk which failed with the following error:

            java -ea -Xmx30303m -Xms30303m -cp /srv/galaxy/var/dependencies/_conda/envs/mulled-v1-ace992d4e029847e92000d257e16b237d31f99821592d4d6dbf742d389021c0f/opt/bbmap-39.01-1/current/ jgi.BBDuk in=SRX6855211_SRR10127028_1_fastq uncompressedSRX6855211_SRR10127028_1_fastq uncompressed.fastq in2=SRX6855211_SRR10127028_2_fastq uncompressedSRX6855211_SRR10127028_1_fastq uncompressed.fastq out=/data/galaxy/jobs/020/20180/outputs/dataset_32e47ed3-52bc-4a7d-8cea-e83aa26af423.dat out2=/data/galaxy/jobs/020/20180/outputs/dataset_056b24b6-3e37-4a97-bf79-eb4bf2b92b08.dat outm=/data/galaxy/jobs/020/20180/outputs/dataset_db56ccc6-9174-48e4-9d79-7ab5c7dc493c.dat outm2=/data/galaxy/jobs/020/20180/outputs/dataset_afeb15c0-401b-40f7-a4cc-78fb62885729.dat outs=/data/galaxy/jobs/020/20180/outputs/dataset_0b32aa02-3724-4537-b5b1-ac126faf5f53.dat k=27 rcomp=t maskmiddle=t minkmerhits=1 minkmerfraction=0.0 mincovfraction=0.0 hammingdistance=0 qhdist=0 editdistance=0 forbidn=f trimfailures=f findbestmatch=f skipr1=f skipr2=f t=4
Executing jgi.BBDuk [in=SRX6855211_SRR10127028_1_fastq, uncompressedSRX6855211_SRR10127028_1_fastq, uncompressed.fastq, in2=SRX6855211_SRR10127028_2_fastq, uncompressedSRX6855211_SRR10127028_1_fastq, uncompressed.fastq, out=/data/galaxy/jobs/020/20180/outputs/dataset_32e47ed3-52bc-4a7d-8cea-e83aa26af423.dat, out2=/data/galaxy/jobs/020/20180/outputs/dataset_056b24b6-3e37-4a97-bf79-eb4bf2b92b08.dat, outm=/data/galaxy/jobs/020/20180/outputs/dataset_db56ccc6-9174-48e4-9d79-7ab5c7dc493c.dat, outm2=/data/galaxy/jobs/020/20180/outputs/dataset_afeb15c0-401b-40f7-a4cc-78fb62885729.dat, outs=/data/galaxy/jobs/020/20180/outputs/dataset_0b32aa02-3724-4537-b5b1-ac126faf5f53.dat, k=27, rcomp=t, maskmiddle=t, minkmerhits=1, minkmerfraction=0.0, mincovfraction=0.0, hammingdistance=0, qhdist=0, editdistance=0, forbidn=f, trimfailures=f, findbestmatch=f, skipr1=f, skipr2=f, t=4]
Version 39.01

Exception in thread "main" java.lang.RuntimeException: Unknown parameter uncompressedSRX6855211_SRR10127028_1_fastq
    at jgi.BBDuk.<init>(BBDuk.java:538)
    at jgi.BBDuk.main(BBDuk.java:78)

The dataset names have spaces in them:

ln -s '/data/galaxy/f/e/1/dataset_fe1245a6-287d-4732-af60-37021f7eaab1.dat' 'SRX6855211_SRR10127028_1_fastq uncompressedSRX6855211_SRR10127028_1_fastq uncompressed.fastq' && ln -s '/data/galaxy/5/6/e/dataset_56eedb18-918a-4c26-a5db-f3504dd763c2.dat' 'SRX6855211_SRR10127028_2_fastq uncompressedSRX6855211_SRR10127028_1_fastq uncompressed.fastq' &&   bbduk.sh in='SRX6855211_SRR10127028_1_fastq uncompressedSRX6855211_SRR10127028_1_fastq uncompressed.fastq'  in2='SRX6855211_SRR10127028_2_fastq uncompressedSRX6855211_SRR10127028_1_fastq uncompressed.fastq' out='/data/galaxy/jobs/020/20180/outputs/dataset_32e47ed3-52bc-4a7d-8cea-e83aa26af423.dat' out2='/data/galaxy/jobs/020/20180/outputs/dataset_056b24b6-3e37-4a97-bf79-eb4bf2b92b08.dat' outm='/data/galaxy/jobs/020/20180/outputs/dataset_db56ccc6-9174-48e4-9d79-7ab5c7dc493c.dat' outm2='/data/galaxy/jobs/020/20180/outputs/dataset_afeb15c0-401b-40f7-a4cc-78fb62885729.dat' outs='/data/galaxy/jobs/020/20180/outputs/dataset_0b32aa02-3724-4537-b5b1-ac126faf5f53.dat'   k=27 rcomp='t' maskmiddle='t' minkmerhits='1' minkmerfraction=0.0 mincovfraction=0.0 hammingdistance=0 qhdist=0 editdistance=0 forbidn='f' trimfailures='f' findbestmatch='f' skipr1='f' skipr2='f'  t=${GALAXY_SLOTS:-4}

which I strongly suspect is at play here, given the line from the log file:

 [in=SRX6855211_SRR10127028_1_fastq, uncompressedSRX6855211_SRR10127028_1_fastq, uncompressed.fastq, in2=SRX6855211_SRR10127028_2_fastq, uncompressedSRX6855211_SRR10127028_1_fastq, uncompressed.fastq,

which suggests each filename is being split at its spaces and passed to BBDuk as multiple arguments, even though the generated command line quotes them.
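One possible explanation, not confirmed in this thread, is that a wrapper script sitting between the quoted command line and the JVM flattens its arguments into a single string and re-evaluates it, which discards the caller's quoting. The sketch below reproduces that effect with hypothetical functions (`count_args`, `unsafe_wrapper`, `safe_wrapper`) and a shortened, made-up filename; it is not taken from the actual wrapper.

```shell
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical wrapper that joins its arguments into one string and
# re-evaluates it, so quoting applied by the caller is lost.
unsafe_wrapper() {
    local cmd="count_args $*"   # arguments joined with spaces
    eval $cmd                   # string is split on whitespace again
}

# Hypothetical wrapper that forwards each argument unchanged.
safe_wrapper() {
    count_args "$@"
}

# Made-up filename containing a space, similar in shape to the one above.
name='reads_1_fastq uncompressed.fastq'

unsafe_count=$(unsafe_wrapper in="$name" k=27)
safe_count=$(safe_wrapper in="$name" k=27)

echo "unsafe: $unsafe_count, safe: $safe_count"
```

With the unsafe variant, the part of the filename after the space arrives as its own argument, which would match the `Unknown parameter uncompressed...` error in the log.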

bernt-matthias commented 2 weeks ago

It seems that you are not using the latest version. bbduk has used hardcoded symlink names for a while: https://github.com/galaxyproject/tools-iuc/pull/4329/files

Am I wrong?

I'm also wondering why (auto-)uncompressed files are used. It seems the tool should accept zipped files.
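The idea behind hardcoded symlink names can be sketched as follows: the working-directory link gets a fixed, space-free name regardless of what the dataset is called, so the name passed to the tool never needs splitting-safe handling. The link name `forward.fastq` and the temporary paths are assumptions for illustration only; the names used in the linked pull request may differ.

```shell
# Hypothetical working directory and dataset; the dataset name contains a space.
workdir=$(mktemp -d)
dataset="$workdir/SRR_1_fastq uncompressed.dat"
printf '@r1\nACGT\n+\nIIII\n' > "$dataset"

# Link under a fixed name that does not depend on the dataset's display name.
ln -s "$dataset" "$workdir/forward.fastq"

# The tool would then be given only the fixed name.
link_name=$(basename "$workdir/forward.fastq")
first_line=$(head -n 1 "$workdir/forward.fastq")

echo "in=$link_name (first record: $first_line)"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Because the fixed name contains no whitespace, it survives even a wrapper that re-splits its arguments.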