bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis

--interleaved not recognized #115

Open DiegoBrambilla opened 5 years ago

DiegoBrambilla commented 5 years ago

Dear metaWRAP developers, greetings. This is Diego Brambilla, Research Engineer at Linnaeus University. I hope you can spare a little time to look at the following issue. I tried to run the binning module with interleaved files, but the software does not recognize --interleaved as a flag.

metawrap binning --interleaved --metabat2 --maxbin -t 6 -a final.contigs.fasta -o . 12576.4.266261.CCAATAGG-CCTATTGG.filter-METAGENOME.fastq.gz 12576.4.266261.CCGACTAT-ATAGTCGG.filter-METAGENOME.fastq.gz 12576.4.266261.TATCAGCG-CGCTGATA.filter-METAGENOME.fastq.gz 12576.4.266261.TATTCCGG-CCGGAATA.filter-METAGENOME.fastq.gz 12576.4.266261.TGTTCGAG-CTCGAACA.filter-METAGENOME.fastq.gz 12605.3.269723.AAGGACAC-GTGTCCTT.filter-METAGENOME.fastq.gz 12605.3.269723.AAGTCCGT-ACGGACTT.filter-METAGENOME.fastq.gz 12605.3.269723.TGAGCTAG-CTAGCTCA.filter-METAGENOME.fastq.gz OX_2_MG.filtered.fastq.gz
getopt: **unrecognized option '--interleaved'**

Usage: metaWRAP binning [options] -a assembly.fa -o output_dir readsA_1.fastq readsA_2.fastq ... [readsX_1.fastq readsX_2.fastq]
Note1: Make sure to provide all your separately replicate read files, not the joined file.
Note2: You may provide single end or interleaved reads as well with the use of the correct option
Note3: If the output already has the .bam alignments files from previous runs, the module will skip re-aligning the reads

Options:

    -a STR          metagenomic assembly file
    -o STR          output directory
    -t INT          number of threads (default=1)
    -m INT      amount of RAM available (default=4)
    -l INT      minimum contig length to bin (default=1000bp). Note: metaBAT will default to 1500bp minimum

    --metabat2      bin contigs with metaBAT2
    --metabat1  bin contigs with the original metaBAT
    --maxbin2   bin contigs with MaxBin2
    --concoct   bin contigs with CONCOCT (warning: this one is slow...)

    --universal use universal marker genes instead of bacterial markers in MaxBin2 (improves Archaea binning)
    --run-checkm    immediately run CheckM on the bin results (requires 40GB+ of memory)
    --single-end    non-paired reads mode (provide *.fastq files)
    **--interleaved**   the input read files contain interleaved paired-end reads

Can you help me here, please?

ursky commented 5 years ago

Thank you for this. This is a simple mistake on my end. I fixed it and pushed the fix to this GitHub repository. The easiest thing for you would be to find your miniconda2/bin/metawrap-modules/binning.sh script and replace it with the new one from GitHub: https://github.com/bxlab/metaWRAP/blob/master/bin/metawrap-modules/binning.sh. Otherwise, the bug fix will be included in v1.1.3 when it comes out.

DiegoBrambilla commented 5 years ago

Hi, I have replaced binning.sh as you said and ran the same command as above, but I received this error message (I made sure I had loaded the metawrap environment before launching the command):

metawrap binning --interleaved --metabat2 --maxbin -t 6 -a final.contigs.fasta -o . 12576.4.266261.CCAATAGG-CCTATTGG.filter-METAGENOME.fastq.gz 12576.4.266261.CCGACTAT-ATAGTCGG.filter-METAGENOME.fastq.gz 12576.4.266261.TATCAGCG-CGCTGATA.filter-METAGENOME.fastq.gz 12576.4.266261.TATTCCGG-CCGGAATA.filter-METAGENOME.fastq.gz 12576.4.266261.TGTTCGAG-CTCGAACA.filter-METAGENOME.fastq.gz 12605.3.269723.AAGGACAC-GTGTCCTT.filter-METAGENOME.fastq.gz 12605.3.269723.AAGTCCGT-ACGGACTT.filter-METAGENOME.fastq.gz 12605.3.269723.TGAGCTAG-CTAGCTCA.filter-METAGENOME.fastq.gz OX_2_MG.filtered.fastq.gz
getopt: option '--longoptions' requires an argument
Try `getopt --help' for more information.
/home/diegob/miniconda3/envs/metawrap-env/bin/metawrap-modules/binning.sh: line 109: help,metabat1,metabat2,maxbin2,concoct,run-checkm,single-end,universal,interleaved: command not found

As you may have noticed, I am using miniconda3, but I don't think this is relevant.

ursky commented 5 years ago

OK, now that's strange. I can't seem to replicate that on my end. Did you by any chance have any luck running the other modules with long options? It looks like you might somehow be running a different version of getopt, but I am not sure.

DiegoBrambilla commented 5 years ago

The answer is simple: I copied the content of https://github.com/bxlab/metaWRAP/blob/master/bin/metawrap-modules/binning.sh and pasted it into a new binning.sh file, but in the process some line breaks were introduced, which broke part of the script. Do you have any suggestion on how to retrieve binning.sh from GitHub? wget only gets me an HTML file.

ursky commented 5 years ago

I would just clone the repository. Or get the raw file: wget https://raw.githubusercontent.com/bxlab/metaWRAP/master/bin/metawrap-modules/binning.sh; chmod +x binning.sh

DiegoBrambilla commented 5 years ago

Thank you. I managed to replace binning.sh correctly, and a new error message appeared, telling me that I have not specified any value for the -a or -o flag. That is not the case; I even tried giving the absolute path of the output folder, but I got the same result. Still no clue about it.

metawrap binning --interleaved --metabat2 --maxbin -t 6 -a final.contigs.fasta -o /home/diegob/Binning/OX_1_MG/ 12576.4.266261.CCAATAGG-CCTATTGG.filter-METAGENOME.fastq.gz 12576.4.266261.CCGACTAT-ATAGTCGG.filter-METAGENOME.fastq.gz 12576.4.266261.TATCAGCG-CGCTGATA.filter-METAGENOME.fastq.gz 12576.4.266261.TATTCCGG-CCGGAATA.filter-METAGENOME.fastq.gz 12576.4.266261.TGTTCGAG-CTCGAACA.filter-METAGENOME.fastq.gz 12605.3.269723.AAGGACAC-GTGTCCTT.filter-METAGENOME.fastq.gz 12605.3.269723.AAGTCCGT-ACGGACTT.filter-METAGENOME.fastq.gz 12605.3.269723.TGAGCTAG-CTAGCTCA.filter-METAGENOME.fastq.gz OX_2_MG.filtered.fastq.gz

------------------------------------------------------------------------------------------------------------------------
-----                             Non-optional parameters -a and/or -o were not entered                            -----
------------------------------------------------------------------------------------------------------------------------

ursky commented 5 years ago

That error message is misleading. You used --maxbin instead of --maxbin2. Super annoying that the getopt parser doesn't catch that. I added a check for it in the next version.
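One way a typo like this can surface as a "missing -a/-o" message is when the argument loop stops at the first flag it does not know, so the `-a` and `-o` that follow are never read. Whether that is exactly what happens in binning.sh is an assumption here. The sketch below is not the metaWRAP parser; the function name `parse_binning_flags` and its output format are hypothetical. It only shows the general idea of rejecting an unknown `--` flag by name instead of silently stopping at it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual metaWRAP code: walk the arguments and
# fail loudly on any long flag that is not in the known list.
parse_binning_flags() {
    local assembly="" out="" tools=""
    while [ $# -gt 0 ]; do
        case "$1" in
            -a|-o|-t)
                # These options take a value; refuse to continue without one.
                [ $# -ge 2 ] || { echo "option $1 requires a value" >&2; return 2; }
                case "$1" in
                    -a) assembly=$2 ;;
                    -o) out=$2 ;;
                esac                      # -t (threads) is ignored in this sketch
                shift 2 ;;
            --metabat2|--maxbin2|--concoct)
                tools="$tools ${1#--}"; shift ;;
            --interleaved|--single-end)
                shift ;;                  # read-type flags, ignored in this sketch
            --*)
                echo "unknown option: $1" >&2; return 2 ;;
            *)
                break ;;                  # first read file: stop parsing options
        esac
    done
    if [ -z "$assembly" ] || [ -z "$out" ]; then
        echo "missing -a and/or -o" >&2
        return 1
    fi
    echo "assembly=$assembly out=$out tools=${tools# }"
}
```

With this kind of check, `parse_binning_flags --interleaved --maxbin -a final.contigs.fasta -o . reads.fastq` fails with `unknown option: --maxbin` rather than complaining about `-a` and `-o`.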

DiegoBrambilla commented 5 years ago

I really appreciate your support so far. All in all, I think it was for the best that one more bug in the script was found. This time I used the same command as before, only changing --maxbin to --maxbin2 as you suggested. The interleaved read type was recognized, but the read files were not found in the expected fastq format for interleaved or single-end reads. Is this expected?

------------------------------------------------------------------------------------------------------------------------
-----                                        Entered read type: interleaved                                        -----
------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------------------------
-----               Unable to find read files in format *.fastq (for single-end or interleaved reads)              -----
------------------------------------------------------------------------------------------------------------------------

ursky commented 5 years ago

It's like the error says - the files need to end with '.fastq'. Unzip your files and you should be good.
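A short sketch of that step, assuming the reads are gzip-compressed and sit in one directory. The helper name `decompress_reads` is hypothetical. It writes a `.fastq` copy next to each `.fastq.gz` and keeps the compressed original.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an uncompressed .fastq next to every .fastq.gz
# in a directory, leaving the compressed originals in place.
decompress_reads() {
    local dir=${1:-.} gz
    for gz in "$dir"/*.fastq.gz; do
        [ -e "$gz" ] || continue          # the glob matched nothing
        gunzip -c "$gz" > "${gz%.gz}"     # reads.fastq.gz -> reads.fastq
    done
}
```

After `decompress_reads .`, the read files can be passed to the binning module by their `.fastq` names.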

DiegoBrambilla commented 5 years ago

Right, I agree it should run without problems on decompressed fastq files. This solves the issue, and I am grateful for your continuous help over the past few days. On a related note, I tried to stream the decompressed files with process substitution, <(gunzip -c .fastq.gz): `metawrap binning --interleaved --metabat2 --maxbin -t 6 -a final.contigs.fasta -o . <(gunzip -c .fastq.gz)` I get the same error as before, because the shell replaces the whole expression with a path such as /dev/fd/63, which metaWRAP does not recognize as a fastq file. I think metawrap checks the file name to make sure it is likely to be a fastq file, and of course this one is not. That makes sense, since metawrap in turn calls other programs with the file names as arguments. I could open another issue about updating the metawrap code to accept .fastq.gz file names, after checking which of those programs take gz files.
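To illustrate the point about file names, here is a small sketch. It assumes only that the check is a suffix test on each argument, as the error message suggests. `looks_like_fastq` is a hypothetical stand-in, not a metaWRAP function.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a name-based check: accept only paths ending in .fastq
looks_like_fastq() {
    case "$1" in
        *.fastq) return 0 ;;
        *)       return 1 ;;
    esac
}

# When bash sees <(...), the program receives a path such as /dev/fd/63
# (the exact number varies), so it never sees a name ending in .fastq.
for arg in reads.fastq reads.fastq.gz /dev/fd/63; do
    if looks_like_fastq "$arg"; then
        echo "$arg: accepted"
    else
        echo "$arg: rejected"
    fi
done
```

Under that assumption, only `reads.fastq` passes. Both the compressed name and the process-substitution path are rejected before any read is opened.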

ursky commented 5 years ago

Unfortunately, you will have to unzip them for now. Piping them in like you tried does not work. If I ever add support for zipped files, I would prefer to do it for all modules, which would not be easy for some of them. If someone wants to take a stab at implementing this and opens a pull request, I would appreciate it.

DiegoBrambilla commented 5 years ago

Thanks for the kind advice; unzipping the files will do. Keep up the good work!