biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

db_dna #6

ganiatgithub closed this issue 4 years ago

ganiatgithub commented 4 years ago

Hi,

I'm running phylophlan3 on a set of MAGs together with reference genomes. Could you help me troubleshoot this error: "[e] both db_dna and db_aa are None!"

My command is:

phylophlan \
    --input_folder ./fna \
    -o ./out \
    --nproc 48 \
    --diversity high \
    -d phylophlan \
    -f /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/default_nt.cfg \
    --configs_folder /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/ \
    --submat_folder /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices \
    --maas /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/phylophlan.tsv \
    -i reducedtree \
    --force_nucleotides

The config file is:

[db_dna]
program_name = makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#

[map_dna]
program_name = blastn
params = -outfmt 6 -max_target_seqs 1000000
input = -query
database = -db
output = -out
version = -version
command_line = #program_name# #params# #input# #database# #output#

[msa]
program_name = muscle
params = -quiet -maxiters 2
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#

[tree1]
program_name = iqtree
params = -quiet -nt AUTO -m GTR
input = -s
output = -pre
version = -version
command_line = #program_name# #params# #input# #output#

I have defined db_dna in the config, so why can't the software find it? PS: I installed via conda and want to use the phylophlan 400 proteins.

Many thanks

fasnicar commented 4 years ago

Hello Gani,

The problem is that you're specifying a config file for a nucleotide database (-f /home/Staff/uqgni1/miniconda2/envs/pp3/lib/python3.7/site-packages/phylophlan/phylophlan_configs/default_nt.cfg, which you can shorten to -f default_nt.cfg, since that is an internal PhyloPhlAn folder) while using a protein database (-d phylophlan).

When installing PhyloPhlAn you should have 4 config files available; try using -f supermatrix_aa.cfg.

Many thanks, Francesco
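
For readers who hit the same error, the mismatch described above can be checked before launching a run. The following is a minimal sketch, not PhyloPhlAn code: the helper names `database_sections` and `config_matches_database` are hypothetical, and it assumes that a nucleotide database needs a `[db_dna]` section and a protein database needs a `[db_aa]` section in the config file. PhyloPhlAn's own internal check may differ.

```python
import configparser

# Hypothetical helpers for illustration only; they are not part of PhyloPhlAn.

def database_sections(config_text):
    """Return which of the db_dna / db_aa sections a config defines."""
    # interpolation=None so that values are never treated as templates
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {name for name in ("db_dna", "db_aa") if parser.has_section(name)}

def config_matches_database(config_text, db_type):
    """Check a config against a database type.

    db_type is "n" (nucleotide) or "a" (amino acid). The mapping below is an
    assumption based on the explanation in this thread.
    """
    required = {"n": "db_dna", "a": "db_aa"}[db_type]
    return required in database_sections(config_text)

# The [db_dna] section from the config posted in this issue.
EXAMPLE_NT_CONFIG = """
[db_dna]
program_name = makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#
"""

if __name__ == "__main__":
    print(database_sections(EXAMPLE_NT_CONFIG))
    # A protein database such as "-d phylophlan" corresponds to db_type "a".
    print(config_matches_database(EXAMPLE_NT_CONFIG, "a"))
```

Under these assumptions, a config that only defines `[db_dna]` does not pass the check for a protein database, which matches the situation in the original command (`-d phylophlan` with `default_nt.cfg`).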

cosicamar commented 4 years ago

Hi Fasnicar, I am jumping in here just to report the same problem: "[e] both db_dna and db_aa are None!"

Here are a couple of notes that may help. My command is: phylophlan -i FASTA --diversity low -d amphora2 -f supertree_nt.cfg, where "FASTA" is the folder that contains all my genomes.

I am using the general default configuration file with one small modification: I am using astral version 5.7.3 (I had to change that in the cfg file).

fasnicar commented 4 years ago

Hi cosicamar,

As per my answer above, you're using the Amphora2 set of universal proteins with a configuration file tailored for a nucleotide database. Try using supertree_aa.cfg, updated with the modifications you need for astral.

Many thanks, Francesco

ganiatgithub commented 4 years ago

Hi Francesco,

This is now solved with the supermatrix config file. Thank you, and sorry it took a while.

All the best! Gaofeng