Closed Rob-murphys closed 3 years ago
Hi, the problem is that your config file specifies the [db_dna] section, which is used for building databases made of genes, while your command line specifies the PhyloPhlAn database (-d phylophlan), which is a collection of 400 universal proteins. So, in this case, you need the [db_aa] section in your config file. Also, the config should be updated to use a tool like Diamond, which can do translated searches to map the proteins in the database against your input genomes.
If your inputs are only genomes and you would like to build the phylogeny using nucleotides instead of converting them to amino acids (given that you are using a protein database), you should specify the --force_nucleotides param both when generating the config file and when running PhyloPhlAn.
Many thanks, Francesco
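The mismatch described above can be checked before launching a run. This is a minimal sketch, assuming the config file is plain INI that Python's configparser can read. The helper names `required_db_section` and `check_config` are hypothetical and not part of PhyloPhlAn, and the rule they encode comes only from this thread: a protein database such as phylophlan needs [db_aa], a database made of genes needs [db_dna].

```python
import configparser


def required_db_section(database_is_protein):
    # Rule taken from this thread only: protein databases (such as
    # "phylophlan") need [db_aa]; databases made of genes need [db_dna].
    return "db_aa" if database_is_protein else "db_dna"


def check_config(config_text, database_is_protein):
    """Return None if the needed section is present, otherwise a hint."""
    # interpolation=None keeps values such as "#program_name#" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    needed = required_db_section(database_is_protein)
    if parser.has_section(needed):
        return None
    return "config has no [{}] section; found: {}".format(
        needed, ", ".join(parser.sections())
    )


# Shortened example config that only defines [db_dna].
example = """
[db_dna]
program_name = makeblastdb
params = -parse_seqids -dbtype nucl
command_line = #program_name# #params# #input# #output#
"""

# The phylophlan database is made of proteins, so [db_aa] is expected.
print(check_config(example, database_is_protein=True))
```

A check like this only reports which section is missing; it does not validate the tools or parameters inside the section.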
I am getting the same error of "[e] both db_dna and db_aa are None" and --force_nucleotides is not solving it:
phylophlan -i inp -d phylophlan -f supermatrix_nt.cfg --diversity high --fast -o tmp_output --nproc 16 --verbose --genome_extension .fa --force_nucleotides
with the following config file:
[db_dna]
program_name = ncbi-blast-2.11.0+/bin/makeblastdb
params = -parse_seqids -dbtype nucl
input = -in
output = -out
version = -version
command_line = #program_name# #params# #input# #output#
[map_dna]
program_name = ncbi-blast-2.11.0+/bin/blastn
params = -outfmt 6 -evalue 0.1 -max_target_seqs 1000000 -perc_identity 75
input = -query
database = -db
output = -out
version = -version
command_line = #program_name# #params# #input# #database# #output#
[msa]
program_name = mafft/mafftdir/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#
environment = TMPDIR=/tmp
[trim]
program_name = trimal-trimAl/source/trimal
params = -gappyout
input = -in
output = -out
version = --version
command_line = #program_name# #params# #input# #output#
[tree1]
program_name = FastTreeMP
params = -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -gtr -nt
output = -out
command_line = #program_name# #params# #output# #input#
environment = OMP_NUM_THREADS=3
Hi @sigallev, in your case you're using the phylophlan database (which is a set of 400 universal proteins) with a configuration file tailored for a database of genes. I guess you also have the supermatrix_aa.cfg configuration file, which should work fine with the command you provided above.
Many thanks, Francesco
So use this config instead? I will update my diamond version and try.
[db_aa]
program_name = /bin/diamond
params = makedb
threads = --threads
input = --in
output = --db
version = version
command_line = #program_name# #params# #threads# #input# #output#
[map_dna]
program_name = /bin/diamond
params = blastx --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
input = --query
database = --db
output = --out
version = version
command_line = #program_name# #params# #input# #database# #output#
[map_aa]
program_name = /bin/diamond
params = blastp --quiet --threads 1 --outfmt 6 --more-sensitive --id 50 --max-hsps 35 -k 0
input = --query
database = --db
output = --out
version = version
command_line = #program_name# #params# #input# #database# #output#
[msa]
program_name = mafft/mafftdir/bin/mafft
params = --quiet --anysymbol --thread 1 --auto
version = --version
command_line = #program_name# #params# #input# > #output#
environment = TMPDIR=/tmp
[trim]
program_name = trimal-trimAl/source/trimal
params = -gappyout
input = -in
output = -out
version = --version
command_line = #program_name# #params# #input# #output#
[tree1]
program_name = FastTreeMP
params = -quiet -pseudo -spr 4 -mlacc 2 -slownni -fastest -no2nd -mlnni 4 -lg
output = -out
command_line = #program_name# #params# #output# #input#
environment = OMP_NUM_THREADS=3
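To make the command_line templates in a config like the one above easier to read, here is a rough sketch of how the #name# placeholders could be filled in. It is not PhyloPhlAn's actual implementation. It assumes each placeholder is replaced by the section's value for that name followed by a run-time value. The function `expand_command` and the file names are hypothetical, and the params value is shortened.

```python
import re


def expand_command(section, values):
    """Fill #name# placeholders in a section's command_line.

    Assumption: each placeholder becomes the section's entry for that
    name followed by the run-time value, e.g. "--query genome.fa".
    """
    def fill(match):
        name = match.group(1)
        parts = [section.get(name, ""), values.get(name, "")]
        return " ".join(p for p in parts if p)

    return re.sub(r"#(\w+)#", fill, section["command_line"])


# A shortened copy of a Diamond-based [map_dna] section.
map_dna = {
    "program_name": "/bin/diamond",
    "params": "blastx --quiet --threads 1 --outfmt 6",
    "input": "--query",
    "database": "--db",
    "output": "--out",
    "command_line": "#program_name# #params# #input# #database# #output#",
}

# Hypothetical run-time file names, for illustration only.
run_values = {
    "input": "genome.fa",
    "database": "phylophlan.dmnd",
    "output": "hits.tsv",
}

print(expand_command(map_dna, run_values))
```

Read this way, the [map_dna] section describes a translated search (blastx) of the input genome against the protein database, which fits the advice earlier in the thread to use a tool such as Diamond.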
I am getting the error:
[e] both db_dna and db_aa are None!
My config file looks like:
The script calling phylophlan looks like:
phylophlan -i $path -d phylophlan --diversity low -f phylophlan_custom_config.cfg
My config file is defining db_dna, so I am unsure why this error is occurring.