bonsai-team / matam

Mapping-Assisted Targeted-Assembly for Metagenomics
GNU Affero General Public License v3.0

Failed to install Matam from Conda package #65

Closed pgdurand closed 5 years ago

pgdurand commented 5 years ago

Following your documentation, I installed Matam from the conda repository, using "conda install matam" with the appropriate channels.

Then I tested Matam with your sample files (section "Running example datasets" in the README.md). First comment: these sample files ($MATAMDIR/bin/matam_assembly.py -i $MATAMDIR/examples/16sp_simulated_dataset/) are not part of the Conda package, so I had to find and download them myself. Could you please add these files to the Conda recipe, or mention in the documentation that we have to fetch them and from where?

Now, still using your sample command:

$MATAMDIR/bin/matam_assembly.py -i $MATAMDIR/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v

I'm stuck with this error:

ERROR: the file /appli/conda-env/bioinfo/matam-1.5.2/opt/matam-v1.5.2/db/SILVA_128_SSURef_NR95.clustered.fasta could not be opened: No such file or directory.

It seems that Matam requires SILVA (the path is inside the Matam conda package; as the message above shows, the "db" sub-folder does not exist), but I could not find the "SILVA_128_SSURef_NR95.clustered.fasta" file on the SILVA web site.

What can I do to solve this problem?

Thanks for your help, Patrick

loic-couderc commented 5 years ago

Hi @pgdurand,

You are right, the 16sp_simulated_dataset is not part of the conda package. I will open an issue to address it.

Because MATAM can run on a custom database, the SILVA_128_SSURef database is not part of the conda package either.

To retrieve and index the SILVA_128_SSURef database, you need to execute the following command before running MATAM:

index_default_ssu_rrna_db.py -d $DBDIR --max_memory 10000

Then you can run MATAM with the -d option:

matam_assembly.py -d $DBDIR/SILVA_128_SSURef_NR95 -i $MATAMDIR/examples/16sp_simulated_dataset/16sp.art_HS25_pe_100bp_50x.fq --cpu 4 --max_memory 10000 -v
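
A quick pre-flight check can catch the missing-database error before a run is launched. The sketch below is not part of MATAM: `missing_db_files` is a hypothetical helper, and it assumes only what the error message earlier in this thread shows, namely that MATAM opens `<prefix>.clustered.fasta` for the prefix given to -d. Other files produced by indexing are not checked.

```python
import sys
from pathlib import Path


def missing_db_files(db_prefix, suffixes=(".clustered.fasta",)):
    """Return the expected database files that do not exist.

    db_prefix is the value that would be passed to -d, for example
    "/some/dir/SILVA_128_SSURef_NR95". The ".clustered.fasta" suffix is
    taken from the error message in this thread; it is the only file
    this sketch knows about (assumption).
    """
    candidates = [Path(str(db_prefix) + suffix) for suffix in suffixes]
    return [path for path in candidates if not path.is_file()]


if __name__ == "__main__":
    # Usage: python check_db.py /path/to/SILVA_128_SSURef_NR95
    if len(sys.argv) != 2:
        sys.exit("usage: check_db.py DB_PREFIX")
    missing = missing_db_files(sys.argv[1])
    for path in missing:
        print(f"missing: {path}")
    sys.exit(1 if missing else 0)
```

If the script reports a missing file, the database has probably not been indexed yet in that directory.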

pgdurand commented 5 years ago

OK, I managed to install SILVA as you described.

Then, I ran that command:

matam_assembly.py -d $MTM_DATA/SILVA_128_SSURef_NR95 -i $MTM_DATA/sample/16sp.art_HS25_pe_100bp_50x.fq --cpu 14 --max_memory 10000 -v

The log generated by MATAM ended with this error (these are the last 25 lines of the log file):

INFO - === Overlap-graph building ===

PARAM: References:          /appli/bioinfo/matam/data/silva/SILVA_128_SSURef_NR95.clustered.fasta
PARAM: Sam file:            /home1/datahome/galaxy/matam_assembly/workdir/16sp.art_HS25_pe_100bp_50x.sortmerna_vs_SILVA_128_SSURef_NR95_b10_m10.scr_filt_geo_90pct.sam
PARAM: Output basename:     /home1/datahome/galaxy/matam_assembly/workdir/16sp.art_HS25_pe_100bp_50x.sortmerna_vs_SILVA_128_SSURef_NR95_b10_m10.scr_filt_geo_90pct.ovgb_i100_o50
PARAM: ASQG output:         0
PARAM: CSV output:          1
PARAM: Min Overlap:         50
PARAM: Id Threshold:        1
PARAM: NoIndel:             0
PARAM: Debug:               0
PARAM: Verbose:             1
PARAM: Test:                0

TIME: Reference fasta file read in 0.777243 seconds.
INFO: 76956 reference sequences were loaded

TIME: References names loaded from the SAM file in 0.000102 seconds.
INFO: 0 references are present in the SAM file

TIME: SAM file reading finished in 4.9e-05 seconds.
INFO: 0 bam records were read, representing 0 reads
INFO: 0 bam record were mapped on a reference, representing 0 mapped reads

CRITICAL - The last command returns a non-zero return code: -11
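
A note on reading the return code in the log above: if MATAM reports the value from Python's `subprocess` module unchanged (an assumption, not verified against the MATAM source), a negative code -N means the child process was terminated by signal N. On Linux, signal 11 is SIGSEGV, so -11 would indicate that the overlap-graph step crashed rather than exited with an ordinary error, which is consistent with the log showing 0 records read from the SAM file. A minimal sketch for decoding such codes (`describe_returncode` is a hypothetical helper):

```python
import signal


def describe_returncode(returncode):
    """Describe a subprocess return code in words (POSIX semantics)."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return f"exited with status {returncode}"
    # Negative values: the child was terminated by signal -returncode.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = "unknown signal"
    return f"killed by signal {-returncode} ({name})"


if __name__ == "__main__":
    print(describe_returncode(-11))
```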

loic-couderc commented 5 years ago

To be able to help you, I need more information. Could you please run MATAM in debug mode and attach the log file?

matam_assembly.py -d $MTM_DATA/SILVA_128_SSURef_NR95 -i $MTM_DATA/sample/16sp.art_HS25_pe_100bp_50x.fq --cpu 14 --max_memory 10000 -v --debug

pgdurand commented 5 years ago

here is the full log file:

matam.log

loic-couderc commented 5 years ago

Thank you.

There is a problem with your input file:

Input file: /appli/bioinfo/matam/data/16sp.art_HS25_pe_100bp_50x.fq

SortMeRNA is not able to read the file:

The input reads file or reference file is empty, or the reads file is not in FASTA or FASTQ format, no analysis could be made.

Could you check your file? Maybe an error occurred while downloading it?
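
One way to check the reads file before handing it to MATAM is a small sanity test on its first records. The sketch below is independent of MATAM and SortMeRNA; `check_fastq_head` is a hypothetical helper, and it assumes plain-text, four-line FASTQ records (no wrapped sequences, no compression). It only looks for obvious damage such as an empty file, a truncated record, or content that is not FASTQ at all.

```python
from pathlib import Path


def check_fastq_head(path, max_records=1000):
    """Return a list of problems found in the first records of a FASTQ file."""
    p = Path(path)
    if not p.is_file():
        return [f"{p}: file not found"]
    if p.stat().st_size == 0:
        return [f"{p}: file is empty"]

    problems = []
    records = 0
    with p.open("r", errors="replace") as handle:
        for n in range(1, max_records + 1):
            # Assumes exactly four lines per record.
            record = [handle.readline().rstrip("\r\n") for _ in range(4)]
            if not any(record):
                break  # end of file (or only blank lines left)
            records += 1
            header, seq, plus, qual = record
            if not header.startswith("@"):
                problems.append(f"record {n}: header does not start with '@'")
            if not plus.startswith("+"):
                problems.append(f"record {n}: separator line does not start with '+'")
            if len(seq) != len(qual):
                problems.append(f"record {n}: sequence and quality lengths differ")
            if problems:
                break  # stop at the first damaged record
    if records == 0:
        problems.append(f"{p}: no FASTQ records found")
    return problems
```

An empty list means the inspected records look structurally sound; it does not prove that the whole file is intact.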

pgdurand commented 5 years ago

OK, fixed.

The sample files were indeed corrupted; I hadn't noticed.

Thanks for your help.