bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

MATAM aborting with non-zero return code: -8 #62

Closed rotoscan closed 6 years ago

rotoscan commented 6 years ago

Hello,

This is a great tool, but I am having trouble running it on my own data. I have already run matam_assembly.py on the provided example, and it completed with no errors. With my own data, however, the run aborts. I would really appreciate some help solving this issue. More information is provided below.

Thank you deeply. Best Regards, Rodolfo

My command was this:

$ matam_assembly.py -i 12.fasta.READS._MATAM.fq -d /data/msb/tools/matam/DATABASE/SILVA_128_SSURef_NR95 --cpu 4 --max_memory 10000 -v -o matam_interleaved

This is the output in my matam.log file:

INFO - === MATAM assembly ===
INFO - CMD: /gpfs1/data/msb/tools/miniconda/miniconda2/envs/matam_env/opt/matam-v1.5.0/scripts/matam_assembly.py --verbose --cpu 4 --max_memory 10000 --best 10 --evalue 1.00e-05 --score_threshold 0.90 --coverage_threshold 0 --min_identity 1.00 --min_overlap_length 50 --min_read_node 1 --min_overlap_edge 1 --quorum 0.51 --read_correction auto --contig_coverage_threshold 20 --min_scaffold_length 500 --out_dir /gpfs1/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins_reads_for_emirge/dastool/12.fasta.READS/matam_interleaved --ref_db /data/msb/tools/matam/DATABASE/SILVA_128_SSURef_NR95 --input_fastx /gpfs1/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins_reads_for_emirge/dastool/12.fasta.READS/12.fasta.READS._MATAM.fq
INFO - === Input ===
INFO - Input file: /gpfs1/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins_reads_for_emirge/dastool/12.fasta.READS/12.fasta.READS._MATAM.fq
INFO - Input file reads nb: 1885668 reads
INFO - === Reads mapping against ref db ===

  Program:     SortMeRNA version 2.1b, 03/03/2016
  Copyright:   2012-16 Bonsai Bioinformatics Research Group:
               LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
               2014-16 Knight Lab:
               Department of Pediatrics, UCSD, La Jolla,
  Disclaimer:  SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
               implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
               See the GNU Lesser General Public License for more details.
  Contact:     Evguenia Kopylova, jenya.kopylov@gmail.com 
               Laurent Noé, laurent.noe@lifl.fr
               Hélène Touzet, helene.touzet@lifl.fr

  Computing read file statistics ... done [3.08 sec]
  size of reads file: 505359024 bytes
  partial section(s) to be executed: 1 of size 505359024 bytes 
  Parameters summary:
    Number of seeds = 2
    Edges = 4 (as integer)
    SW match = 2
    SW mismatch = -3
    SW gap open penalty = 5
    SW gap extend penalty = 2
    SW ambiguous n
INFO - Reads mapping completed in 108.8219 seconds wall time
INFO - Identified as marker: 2 / 1885668 reads (0.00%)
INFO - === Alignment filtering ===
INFO - Good alignments filtering completed in 0.7026 seconds wall time
INFO - === Overlap-graph building ===

PARAM: References:          /data/msb/tools/matam/DATABASE/SILVA_128_SSURef_NR95.clustered.fasta
PARAM: Sam file:            /gpfs1/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins_reads_for_emirge/dastool/12.fasta.READS/matam_interleaved/workdir/12.fasta.READS._MATAM.sortmerna_vs_SILVA_128_SSURef_NR95_b10_m10.scr_filt_geo_90pct.sam
PARAM: Output basename:     /gpfs1/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins_reads_for_emirge/dastool/12.fasta.READS/matam_interleaved/workdir/12.fasta.READS._MATAM.sortmerna_vs_SILVA_128_SSURef_NR95_b10_m10.scr_filt_geo_90pct.ovgb_i100_o50
PARAM: ASQG output:         0
PARAM: CSV output:          1
PARAM: Min Overlap:         50
PARAM: Id Threshold:        1
PARAM: NoIndel:             0
PARAM: Debug:               0
PARAM: Verbose:             1
PARAM: Test:                0

TIME: Reference fasta file read in 0.85 seconds.
INFO: 76956 reference sequences were loaded

TIME: References names loaded from the SAM file in 0 seconds.
INFO: 1 references are present in the SAM file

TIME: SAM file reading finished in 0 seconds.
INFO: 2 bam records were read, representing 2 reads
INFO: 2 bam record were mapped on a reference, representing 2 mapped reads

CRITICAL - The last command returns a non-zero return code: -8

My input is:

$ head -n 15 12.fasta.READS._MATAM.fq 
@r52556508:fw
TCAGGCGGGCGGTGAGCCGCTGGTACTCCTCGTTTTCTTCCAGCGCCGCCTCGTCTTTCTCCGACTCTAACAGCGCCTTCTTCGACGCGAGGGAAAACAGGTCTTGGATACCCACGTCGTAGGCG
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@r52556508:bw
AAACCCGTCTCGCGCGAAGCGCTGACCGAGACCGTCTCTAACCTCATCTTGCGAAACGCCTACGACGTGGGTATCCAAGACCTGTTTTCCCTCGCGTCGAAGAAGGCGCTGTTAGAGTCGGAGAA
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@r52215988:fw
GACAACCTCGCCGCCGCGGAGGACGACGACGGCGCTCAGCGCGCGGTCGTCCAGAAACGACTGGTCGATACCGGCGACGAGCGGCGGCGAGTCGGCGTCGGCGTCAAGCGCCGTCGCCTCCGCGA
+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@r52215988:bw
CGGCCGTCTCGCTCGCGTCCCCCGAGACGGAGCACCAGACGCTCGCGGAGGCGACGGCGCTTGACGCCGACGCCGACTCGCCGCCGCTCGTCGCCGGTATCGACCAGTCGTTTCTGGACGACCGC
+

The input data is simulated, as one can see from the quality values, which are all identical.

loic-couderc commented 6 years ago

Hi @rotoscan,

The Overlap-graph building step failed because not enough reads passed the first step of MATAM (i.e., only 2 reads in your FASTQ file come from 16S rRNA):

INFO - Identified as marker: 2 / 1885668 reads (0.00%)
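
A side note on the `-8` itself, as a rough sketch rather than a statement about MATAM's internals: assuming MATAM launches its external tools through Python's `subprocess` module (an assumption, not confirmed in this thread), a negative return code `-N` on POSIX systems means the child process was terminated by signal `N`. On Linux, signal 8 is SIGFPE, which is typically raised by an arithmetic error such as an integer division by zero. That would be consistent with a graph builder receiving almost no input, though this is only a guess. The helper name `describe_return_code` below is hypothetical.

```python
import signal


def describe_return_code(code):
    """Translate a subprocess-style return code into a readable message.

    Hypothetical helper for illustration; it is not part of MATAM.
    """
    if code == 0:
        return "success"
    if code < 0:
        # subprocess reports "killed by signal N" as the return code -N.
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {-code} ({name})"
    return f"exited with status {code}"


# The code reported in the CRITICAL line of matam.log above.
print(describe_return_code(-8))
```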

With metagenomic data, we expect about 1% of all reads to be 16S rRNA. If you want to test MATAM with a simulated dataset, you need to take this into account.
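
One way to spot this situation early is to check the "Identified as marker" line in `matam.log` before looking at later steps. The snippet below is a minimal sketch that only relies on the log line format shown in this issue. The function names and the `min_fraction` cutoff of 0.1% are hypothetical choices for illustration, not MATAM defaults.

```python
import re

# Matches the summary line written to matam.log, e.g.
# "INFO - Identified as marker: 2 / 1885668 reads (0.00%)"
MARKER_LINE = re.compile(r"Identified as marker:\s*(\d+)\s*/\s*(\d+)\s*reads")


def marker_fraction(log_text):
    """Return (marker_reads, total_reads, fraction), or None if the line is absent."""
    match = MARKER_LINE.search(log_text)
    if match is None:
        return None
    marker, total = int(match.group(1)), int(match.group(2))
    fraction = marker / total if total else 0.0
    return marker, total, fraction


def looks_too_sparse(log_text, min_fraction=0.001):
    """Flag runs whose marker-read fraction is far below the ~1% expected.

    min_fraction is an arbitrary illustrative cutoff, not a MATAM setting.
    """
    result = marker_fraction(log_text)
    return result is not None and result[2] < min_fraction


# Example using the line from the log in this issue.
sample = "INFO - Identified as marker: 2 / 1885668 reads (0.00%)"
marker, total, fraction = marker_fraction(sample)
print(f"{marker} of {total} reads mapped to the reference database")
print("Too few marker reads:", looks_too_sparse(sample))
```

In practice, `log_text` would be the contents of the `matam.log` file in the output directory.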

rotoscan commented 6 years ago

Hi @loic-couderc, thank you very much for your explanation. It makes sense. I will take this into account!

loic-couderc commented 6 years ago

Glad to be of help!