biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

[e] both db_dna and db_aa are None! #44

Closed Rob-murphys closed 3 years ago

Rob-murphys commented 4 years ago

I am getting the error:

[e] both db_dna and db_aa are None!

My config file looks like:

[db_dna]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#

[map_dna]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/blastn
params = -outfmt 6 -max_target_seqs 1000000
input = -query
database = -db
output = -out
version = -version
command_line = #program_name# #params# #input# #database# #output#

[msa]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#
environment = TMPDIR=/tmp

[tree1]
program_name = /home/lamma/miniconda3/envs/phylogeny/bin/raxmlHPC-PTHREADS-SSE3
params = -p 1989 -m GTRCAT
input = -s
output_path = -w
output = -n
version = -v
command_line = #program_name# #params# #threads# #output_path# #input# #output#
threads = -T

The script calling phylophlan looks like:

phylophlan -i $path -d phylophlan --diversity low -f phylophlan_custom_config.cfg

My config file defines db_dna, so I am unsure why this error occurs.

fasnicar commented 4 years ago

Hi, the problem is that your config file specifies the [db_dna] section, which is used for building databases made of genes, while your command line specifies the PhyloPhlAn database (-d phylophlan), which is a collection of 400 universal proteins. So, in this case, you need the [db_aa] section in your config file. The config should also be updated to use a tool like Diamond, which can do translated searches to map the proteins in the database against your input genomes.

If your inputs are only genomes and you would like to build the phylogeny using nucleotides instead of converting them to amino acids (given that you are using a protein database), you should specify the --force_nucleotides param both when generating the config file and when running PhyloPhlAn.
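To see quickly which database sections a config file defines before running PhyloPhlAn, a small check along these lines may help. This is only a sketch using Python's standard configparser; the function names and the message text are made up for illustration and are not part of PhyloPhlAn.

```python
import configparser


def database_sections(cfg_text):
    """Report whether [db_dna] and [db_aa] are present in the config text."""
    # interpolation=None keeps values such as "#program_name#" untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {name: parser.has_section(name) for name in ("db_dna", "db_aa")}


def check_for_protein_database(cfg_text):
    """Hypothetical helper: a protein database (e.g. -d phylophlan) needs [db_aa]."""
    found = database_sections(cfg_text)
    if not found["db_aa"]:
        return "missing [db_aa]: a protein database such as phylophlan needs it"
    return "ok"


# Shortened version of the config from the first post (only [db_dna] is defined)
example = """
[db_dna]
program_name = makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
command_line = #program_name# #params# #input# #output#
"""

print(database_sections(example))
print(check_for_protein_database(example))
```

For a file on disk, the text can be read first with open("phylophlan_custom_config.cfg").read() and passed to the same functions.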

Many thanks, Francesco

sigallev commented 3 years ago

I am getting the same error of "[e] both db_dna and db_aa are None" and --force_nucleotides is not solving it:

phylophlan -i inp -d phylophlan -f supermatrix_nt.cfg --diversity high --fast -o tmp_output --nproc 16 --verbose --genome_extension .fa --force_nucleotides

on

[db_dna]
program_name = ncbi-blast-2.11.0+/bin/makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#

[map_dna]
program_name = ncbi-blast-2.11.0+/bin/blastn
params = -outfmt 6 -evalue 0.1 -max_target_seqs 1000000 -perc_identity 75
input = -query
database = -db
output = -out
version = -version
command_line = #program_name# #params# #input# #database# #output#

[msa]
program_name = mafft/mafftdir/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#
environment = TMPDIR=/tmp

[trim]
program_name = trimal-trimAl/source/trimal
params = -gappyout
input = -in
output = -out
version = --version
command_line = #program_name# #params# #input# #output#

[tree1]
program_name = FastTreeMP
params = -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -gtr -nt
output = -out
command_line = #program_name# #params# #output# #input#
environment = OMP_NUM_THREADS=3

fasnicar commented 3 years ago

Hi @sigallev, in your case you're using the phylophlan database (which is a set of 400 universal proteins) with a configuration file tailored for a database of genes. I guess you also have the supermatrix_aa.cfg configuration file, which should work fine with the command you provided above.

Many thanks, Francesco

sigallev commented 3 years ago

So use this config instead? I will update my diamond version and try.

[db_aa]
program_name = /bin/diamond
params = makedb
threads = --threads
input = --in
output = --db
version = version
command_line = #program_name# #params# #threads# #input# #output#

[map_dna]
program_name = /bin/diamond
params = blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
input = --query
database = --db
output = --out
version = version
command_line = #program_name# #params# #input# #database# #output#

[map_aa]
program_name = /bin/diamond
params = blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
input = --query
database = --db
output = --out
version = version
command_line = #program_name# #params# #input# #database# #output#

[msa]
program_name = mafft/mafftdir/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#
environment = TMPDIR=/tmp

[trim]
program_name = trimal-trimAl/source/trimal
params = -gappyout
input = -in
output = -out
version = --version
command_line = #program_name# #params# #input# #output#

[tree1]
program_name = FastTreeMP
params = -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -lg
output = -out
command_line = #program_name# #params# #output# #input#
environment = OMP_NUM_THREADS=3
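
For anyone unsure how the #...# placeholders in command_line relate to the other keys in a section, the sketch below fills a template the way the notation suggests: each placeholder becomes the option from the config, followed by a runtime value where one applies. It only illustrates the notation. PhyloPhlAn's own substitution logic may differ, and the function name and the file names are hypothetical.

```python
def expand_command(section, values):
    """Fill a '#key#' command_line template (illustrative only).

    section: option -> string, as read from one config section
    values:  runtime values such as file paths (hypothetical here)
    """
    cmd = section["command_line"]
    keys = ("program_name", "params", "threads", "input",
            "database", "output_path", "output")
    for key in keys:
        placeholder = "#" + key + "#"
        if placeholder not in cmd:
            continue
        flag = section.get(key, "")
        if key in ("program_name", "params"):
            text = flag  # used verbatim, no runtime value attached
        else:
            # option from the config followed by the runtime value, if any
            parts = (flag, str(values.get(key, "")))
            text = " ".join(p for p in parts if p)
        cmd = cmd.replace(placeholder, text)
    return " ".join(cmd.split())  # collapse any leftover double spaces


# Abbreviated [map_aa] section from the config above
map_aa = {
    "program_name": "/bin/diamond",
    "params": "blastp --quiet --threads 1 --outfmt 6",
    "input": "--query",
    "database": "--db",
    "output": "--out",
    "command_line": "#program_name# #params# #input# #database# #output#",
}
# Hypothetical file names, for illustration only
files = {"input": "genome.faa", "database": "phylophlan.dmnd", "output": "hits.tsv"}

print(expand_command(map_aa, files))
```

A section without an input or output option, such as [msa], simply gets the bare file names, so its redirect-style template ("#input# > #output#") still expands sensibly.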