Closed cyklee closed 3 years ago
Hi, thank you very much for reporting this. It was indeed a mistake from an old commit. It is now fixed with commit f1999e6672f9b4b216a96c15e93f21b136070afe.
I also added a couple of validity checks because I noticed from your command that you are specifying --gene_tree1, --gene_tree2, --tree1, and --tree2, all set to raxml, which does not make much sense.
If you want a gene tree pipeline, then --tree1 has to be a consensus approach.
So, please check the wiki if you're in doubt about what you should specify.
Many thanks, Francesco
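The rule stated above (a gene tree pipeline needs a consensus approach for --tree1) can be sketched as a small check. This is only an illustration and not the code from the commit: the function name `check_tree_options`, the dictionary layout, and the set of consensus tools (`astral`, `astrid`) are assumptions made for the example.

```python
# Hypothetical sketch of a gene-tree option check; not PhyloPhlAn's actual code.
CONSENSUS_TOOLS = {"astral", "astrid"}  # assumed set of consensus approaches


def check_tree_options(opts):
    """Return a list of error messages for an options dict.

    `opts` maps option names (without leading dashes) to tool names;
    options that were not given are absent or set to None.
    """
    errors = []
    gene_tree_mode = opts.get("gene_tree1") is not None

    # In a gene tree pipeline, --tree1 must combine the gene trees,
    # so it has to be a consensus approach rather than e.g. raxml.
    if gene_tree_mode and opts.get("tree1") not in CONSENSUS_TOOLS:
        errors.append("--gene_tree1 requires --tree1 to be a consensus approach")

    return errors
```

With the options from the original report (everything set to raxml), this returns one error message. A supermatrix setup without --gene_tree1 passes unchanged.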
Hi @fasnicar, Thank you for pointing that out and the speedy fix! I'm actually trying to run the supermatrix pipeline using nucleotide input but ran into a few issues:
I'm testing out running:
phylophlan_write_config_file --db_type a --db_aa diamond --map_aa diamond --map_dna diamond --msa muscle --trim trimal --tree1 fasttree --tree2 raxml --force_nucleotides -o test5.cfg
Kind regards Kevin
Hi Kevin,
So, at the moment phylophlan_setup_database can only download species-specific UniRef90 proteins. So, for a genus-level phylogeny, I would recommend you use the phylophlan database. Alternatively, if you identify a set of proteins conserved at the genus level, you can provide them to phylophlan_setup_database to build your custom db.
The --force_nucleotides param forces PhyloPhlAn to use the nucleotides after mapping the proteins in the db to the input genomes. You also need to specify it when you generate the configuration file, because the parameters for the phylogeny inference tools must reflect that they will use a nucleotide MSA.
The command for the config file looks fine.
Many thanks, Francesco
Hi Francesco, Thank you for your excellent explanations and sorry for the late response. I've successfully run inferences with both protein and nucleotide data.
With gratitude, Kevin
Running phylophlan_write_config_file.py version 3.0.19 (3 November 2020) with the following input results in an error:
phylophlan_write_config_file --db_type n --db_dna makeblastdb --map_dna blastn --msa muscle --trim trimal --gene_tree1 raxml --gene_tree2 raxml --tree1 raxml --tree2 raxml --force_nucleotides -o test
I believe this is due to the trailing comma on line 349:
gene_tree1['params'] += ' -m GTRCAT',
The same issue can also be found on line 364.
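For context, a minimal reproduction of why the trailing comma matters. In Python, a trailing comma after an expression turns it into a one-element tuple, so the `+=` tries to add a tuple to a string and raises a `TypeError`. The dictionary and its starting value below are placeholders for illustration, not the script's actual contents.

```python
# Minimal reproduction of the trailing-comma problem (placeholder values).
gene_tree1 = {"params": "-p 1989"}

try:
    # The trailing comma makes the right-hand side a 1-tuple, not a string,
    # so str += tuple fails before anything is assigned.
    gene_tree1["params"] += " -m GTRCAT",
except TypeError as exc:
    error_message = str(exc)

# Without the comma, the string is appended as intended.
gene_tree1["params"] += " -m GTRCAT"
fixed_params = gene_tree1["params"]
```

Removing the comma is enough to restore the intended string concatenation.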