bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

Error in matam_assembly on Docker when running example file #63

rachelleLim opened this issue 6 years ago

rachelleLim commented 6 years ago

Hi, thanks so much for this program, it seems really cool! :) My work computer is a Mac, so I've been using Docker to run MATAM. I start an interactive session (docker run -it bonsaiteam/matam) and then run the following command:

matam_assembly.py -i examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq

I then get the following error:

INFO - === MATAM assembly ===
INFO - CMD: /matam/scripts/matam_assembly.py --cpu 1 --max_memory 10000 --best 10 --evalue 1.00e-05 --score_threshold 0.90 --coverage_threshold 0 --min_identity 1.00 --min_overlap_length 50 --min_read_node 1 --min_overlap_edge 1 --quorum 0.51 --read_correction auto --contig_coverage_threshold 20 --min_scaffold_length 500 --out_dir /matam/matam_assembly --ref_db /matam/db/SILVA_128_SSURef_NR95 --input_fastx /matam/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq
INFO - === Input ===
INFO - Input file: /matam/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq
INFO - Input file reads nb: 11650 reads
INFO - === Reads mapping against ref db ===
CRITICAL - The last command returns a non-zero return code: 1
Non-zero return code

Would it be possible to get a fix for this? We have no Linux computers in my lab and this seems really useful! Thank you!

Additional info:
Docker version: Docker version 18.03.1-ce, build 9ee9f40
MATAM image: bonsaiteam/matam latest 75143b82cd20 5 months ago 4.02GB

loic-couderc commented 6 years ago

Hi @rachelleLim,

Sorry for the poor error message. In general, we advise running MATAM with the verbose option: -v.

The error you encountered arises because the SSU rRNA reference database is missing. You need to retrieve and index the reference database before running MATAM, using the following commands:

DBDIR=/matam/db
# retrieve & index the database
index_default_ssu_rrna_db.py -d $DBDIR --max_memory 10000
# run MATAM on the default db
matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v

Thank you for your interest.
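The two steps above can be chained so that assembly only starts once indexing has exited cleanly. A minimal sketch in Python, assuming both scripts are on the PATH inside the container (as when they are invoked by name above) and using only the flags shown in those commands; the helper names build_index_cmd, build_assembly_cmd and run_pipeline are hypothetical and not part of MATAM:

```python
import subprocess

# Paths taken from the commands above (paths inside the bonsaiteam/matam container).
DBDIR = "/matam/db"
DB_NAME = "SILVA_128_SSURef_NR95"
READS = "examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq"


def build_index_cmd(dbdir, max_memory=10000):
    """Command that retrieves and indexes the default SSU rRNA database."""
    return ["index_default_ssu_rrna_db.py", "-d", dbdir,
            "--max_memory", str(max_memory)]


def build_assembly_cmd(db_prefix, reads, cpu=4, max_memory=10000, verbose=True):
    """Command that runs MATAM assembly against an indexed database prefix."""
    cmd = ["matam_assembly.py", "-d", db_prefix, "-i", reads,
           "--cpu", str(cpu), "--max_memory", str(max_memory)]
    if verbose:
        cmd.append("-v")  # verbose logging makes failed steps easier to diagnose
    return cmd


def run_pipeline(dbdir=DBDIR, reads=READS, cpu=4, max_memory=10000):
    """Index the default database, then assemble; stop at the first failure."""
    steps = [
        build_index_cmd(dbdir, max_memory),
        build_assembly_cmd(dbdir + "/" + DB_NAME, reads, cpu, max_memory),
    ]
    for cmd in steps:
        print("Running:", " ".join(cmd), flush=True)
        # check=True raises subprocess.CalledProcessError on a non-zero exit
        # code, so assembly is never attempted after a failed indexing step.
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() from a script inside the container runs both steps in order. It only catches failures that are reported through the exit code, so it would not help if indexing exits cleanly without producing all of its files.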

rachelleLim commented 6 years ago

Dang, that makes sense haha!! Thank you so much for the quick response and clarification!! :D

rachelleLim commented 6 years ago

Hi Loic, sorry to respond again, but I ran the commands as you suggested and index_default_ssu_rrna_db.py completed successfully:

2018-07-11 16:21:56,408 - INFO - -- Completed default SSU rRNA DB indexing --
2018-07-11 16:21:56,457 - DEBUG - Indexing completed in 5192.87 seconds
2018-07-11 16:21:56,459 - INFO - Indexing went well. Default SSU rRNA DB and its indexes can be found in: /matam/db/SILVA_128_SSURef_NR95*

However, I've now run into a second error when running the example dataset (matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v):

ERROR: The index '/matam/db/SILVA_128_SSURef_NR95.complete.stats' does not exist. Make sure you have constructed your index using the command `indexdb'. See `indexdb -h' for help.

The command indexdb doesn't seem to exist... any insights? Sorry for the hassle, I really appreciate your prompt responses!

loic-couderc commented 6 years ago

Hi @rachelleLim,

Currently, I'm not able to reproduce your error: no error shows up for me. I suspect the indexing step failed, even though MATAM reports otherwise. Could you paste the output of the indexing command so we can see what happened?

At the end of this step, the following files must be present in your $DBDIR:

root@cbe763ba4bfd:/matam# ls -ot $DBDIR
total 8748548
-rw-r--r-- 1 root   24072998 Jul 12 11:06 SILVA_128_SSURef_NR95.complete.fasta.fai
-rw-r--r-- 1 root    1808652 Jul 12 10:02 SILVA_128_SSURef_NR95.clustered.stats
-rw-r--r-- 1 root  936894096 Jul 12 10:02 SILVA_128_SSURef_NR95.clustered.pos_0.dat
-rw-r--r-- 1 root  432201520 Jul 12 10:02 SILVA_128_SSURef_NR95.clustered.bursttrie_0.dat
-rw-r--r-- 1 root    1048576 Jul 12 10:02 SILVA_128_SSURef_NR95.clustered.kmer_0.dat
-rw-r--r-- 1 root   15120477 Jul 12 09:51 SILVA_128_SSURef_NR95.complete.stats
-rw-r--r-- 1 root 5295589644 Jul 12 09:50 SILVA_128_SSURef_NR95.complete.pos_0.dat
-rw-r--r-- 1 root  895089300 Jul 12 09:48 SILVA_128_SSURef_NR95.complete.bursttrie_0.dat
-rw-r--r-- 1 root    1048576 Jul 12 09:48 SILVA_128_SSURef_NR95.complete.kmer_0.dat
-rw-r--r-- 1 root  140567671 Jan 18  2017 SILVA_128_SSURef_NR95.tar.bz2
-rw-r--r-- 1 1000   79032798 Jan 18  2017 SILVA_128_SSURef_NR95.complete.taxo.tab
-rw-r--r-- 1 1000 1019378193 Jan 18  2017 SILVA_128_SSURef_NR95.complete.fasta
-rw-r--r-- 1 1000  116607096 Jan 18  2017 SILVA_128_SSURef_NR95.clustered.fasta
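Based on the listing above, a quick way to spot an indexing run that silently failed is to check which of these files are missing or empty. A minimal sketch in Python, assuming the default database prefix SILVA_128_SSURef_NR95; the suffix list is copied from this listing and may differ for other MATAM or database versions, and the function name missing_index_files is hypothetical, not part of MATAM:

```python
from pathlib import Path

# Suffixes observed in the directory listing above (assumption: same MATAM/SILVA version).
# The downloaded .tar.bz2 archive is left out, on the assumption that only the
# extracted and indexed files are read at assembly time.
EXPECTED_SUFFIXES = [
    ".complete.fasta", ".complete.fasta.fai", ".complete.taxo.tab",
    ".complete.stats", ".complete.pos_0.dat",
    ".complete.bursttrie_0.dat", ".complete.kmer_0.dat",
    ".clustered.fasta", ".clustered.stats", ".clustered.pos_0.dat",
    ".clustered.bursttrie_0.dat", ".clustered.kmer_0.dat",
]


def missing_index_files(dbdir, prefix="SILVA_128_SSURef_NR95"):
    """Return the names of expected database/index files that are absent or empty."""
    missing = []
    for suffix in EXPECTED_SUFFIXES:
        path = Path(dbdir) / (prefix + suffix)
        # An empty file usually means a step was interrupted, so treat it as missing.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing
```

For example, print(missing_index_files("/matam/db")) inside the container lists anything that still needs to be (re)built; the error reported above would correspond to SILVA_128_SSURef_NR95.complete.stats appearing in that list.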

triplem90manas commented 2 years ago

I always get this error whenever I run matam_assembly.py:

./matam_assembly.py -d DBDIR/SILVA_128_SSURef_NR95 -i /home/nada/matam-master/nohost1.fastq --cpu 4 --max_memory 10000 -v --perform_taxonomic_assignment

"No valid binary found for componentsearch"

Please help.
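The message indicates that MATAM could not locate the componentsearch program it relies on. As a first diagnostic, one can check which of the programs mentioned in this thread resolve to an executable. A minimal sketch in Python using the standard library's shutil.which; the tool list and the function name find_missing_binaries are illustrative assumptions, and MATAM may look for componentsearch in its own installation directory rather than on the PATH, so a result here is only a hint:

```python
import shutil

# Programs mentioned in this thread (illustrative, not MATAM's official dependency list).
TOOLS = ["matam_assembly.py", "index_default_ssu_rrna_db.py", "indexdb", "componentsearch"]


def find_missing_binaries(names, path=None):
    """Return the names that do not resolve to an executable file.

    When `path` is None, shutil.which searches the PATH environment variable;
    otherwise it searches the given os.pathsep-separated directory list.
    """
    return [name for name in names if shutil.which(name, path=path) is None]
```

For example, print(find_missing_binaries(TOOLS)) run in the same shell as matam_assembly.py shows which of these names are not found on the current PATH.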