biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

Tree of life at nucleotide level #26

Closed: gunturus closed this issue 4 years ago

gunturus commented 4 years ago

Is there a way to make a tree of life at the nucleotide level using the phylophlan database? From my understanding, the default phylophlan database only contains proteins.

fasnicar commented 4 years ago

Hello, you can use the --force_nucleotides param to force PhyloPhlAn to build a phylogeny based on nucleotides instead of proteins, as detailed in the wiki. You also have to re-generate the configuration file, specifying the same --force_nucleotides param, so that it is appropriate for this analysis. Note that (1) this works only if all your inputs are genomes, as proteomes (if present among your inputs) will not be included in the analysis, and (2) whether the database is made of genes or proteins does not matter in this case, so you can still use the phylophlan database.
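Because of note (1), it can help to check which inputs look like proteomes before launching a --force_nucleotides run. The sketch below is a hypothetical pre-flight helper, not part of PhyloPhlAn: the function names, the simple alphabet-based heuristic and the 0.9 threshold are all assumptions for illustration.

```python
# Hypothetical pre-flight check, NOT part of PhyloPhlAn: report which inputs
# look like proteomes, since those are not included in a
# --force_nucleotides run. The 0.9 threshold is an arbitrary assumption.

NUCLEOTIDE_CHARS = set("ACGTUN")
IGNORED_CHARS = set("-.* \n")  # gaps, stop codons, whitespace

def looks_like_nucleotides(seq: str, min_fraction: float = 0.9) -> bool:
    """Treat a sequence as DNA/RNA if most non-gap characters are A/C/G/T/U/N."""
    residues = [c for c in seq.upper() if c not in IGNORED_CHARS]
    if not residues:
        return False
    n_nucleotides = sum(c in NUCLEOTIDE_CHARS for c in residues)
    return n_nucleotides / len(residues) >= min_fraction

def inputs_dropped_with_force_nucleotides(inputs: dict[str, str]) -> list[str]:
    """inputs maps file name -> (a sample of) its sequence; returns the
    names that look like proteomes and would therefore be left out."""
    return sorted(name for name, seq in inputs.items()
                  if not looks_like_nucleotides(seq))

example_inputs = {
    "bin_01.fna": "ATGAAACGCATTAGCACCACCATTACCACC",
    "bin_02.fna": "ATGGCTAGCTAGGATCCNNNATGC",
    "isolate_03.faa": "MKRISTTITTTITAGWLLPQE",
}
print(inputs_dropped_with_force_nucleotides(example_inputs))  # ['isolate_03.faa']
```

A heuristic like this can misjudge very short or low-complexity sequences, so it is only a quick sanity check before the real run.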

If I may, my personal advice is not to build a tree of life from nucleotides: the alignment will have three times as many positions as with proteins, and this affects both the MSA and the phylogeny inference steps, which will take longer, most likely much more than three times as long as with proteins.
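The "three times" figure comes from each amino-acid column corresponding to one codon, that is, three nucleotide columns. A minimal sketch of that arithmetic, assuming hypothetical marker lengths and genome count (not values from the phylophlan database) and ignoring gaps and trimming:

```python
# Illustrative arithmetic only: each amino-acid column of a protein
# alignment corresponds to one codon, i.e. 3 nucleotide columns.
# Marker lengths and genome count below are hypothetical.

def concatenated_length(marker_lengths_aa: list[int], nucleotides: bool) -> int:
    """Columns in a concatenated marker alignment (ignoring gaps/trimming)."""
    total_aa = sum(marker_lengths_aa)
    return 3 * total_aa if nucleotides else total_aa

def matrix_cells(n_genomes: int, n_columns: int) -> int:
    """Size of the genomes x columns matrix passed to tree inference."""
    return n_genomes * n_columns

markers = [300, 250, 450]  # hypothetical marker lengths in amino acids
n_genomes = 30             # hypothetical number of input genomes

aa_cols = concatenated_length(markers, nucleotides=False)
nt_cols = concatenated_length(markers, nucleotides=True)
print(aa_cols, nt_cols)                                            # 1000 3000
print(matrix_cells(n_genomes, aa_cols), matrix_cells(n_genomes, nt_cols))  # 30000 90000
```

Only the input size scales exactly by three; the actual run time depends on the MSA and tree-inference tools, which is why the comment expects a larger slowdown.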

gunturus commented 4 years ago

I understand that it will make the alignment three times longer. I have been using the tree of life tutorial to build a tree for only the Arthrobacter genus in my bins (about 30); I can't confirm they are Arthrobacter, but the closest match in the database is Arthrobacter. I would like more resolution. I have looked at the S. aureus tutorial, but I was looking for a de novo method to obtain the core genes instead of using UniRef90. I will try both methods to see how they differ.
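For reference, one common generic way to define core genes de novo is by prevalence: keep the gene families found in (nearly) all genomes. The sketch below illustrates that generic definition only; it is not PhyloPhlAn's own procedure, and the presence table, family IDs and function name are hypothetical (in practice the table might come from clustering the predicted genes of all bins).

```python
# A generic prevalence-based "core gene" definition, shown for illustration;
# it is NOT PhyloPhlAn's own procedure. The presence table is hypothetical.
import math
from collections import Counter

def core_gene_families(presence: dict[str, set[str]],
                       min_prevalence: float = 1.0) -> list[str]:
    """presence maps genome name -> set of gene-family IDs found in it.
    Returns the families present in at least min_prevalence of the genomes."""
    if not presence:
        return []
    required = math.ceil(min_prevalence * len(presence))
    counts = Counter(fam for families in presence.values() for fam in families)
    return sorted(fam for fam, n in counts.items() if n >= required)

presence = {
    "bin_01": {"famA", "famB", "famC"},
    "bin_02": {"famA", "famB"},
    "bin_03": {"famA", "famB", "famD"},
    "bin_04": {"famA", "famC"},
}
print(core_gene_families(presence))        # ['famA']
print(core_gene_families(presence, 0.75))  # ['famA', 'famB']
```

Relaxing min_prevalence below 1.0 tolerates incomplete bins, where a genuinely core gene may be missing from a few assemblies.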

fasnicar commented 4 years ago

I see, and now I understand better why you want to do that. Have you checked out the phylophlan_metagenomic tutorial? Does the Arthrobacter assignment come from phylophlan_metagenomic? If not, you can use phylophlan_metagenomic to find the closest SGB to each of your bins, and you can also add those SGBs to your analysis, as in the Proteobacteria tutorial.
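The idea of picking the closest SGB per bin, then adding those references to the tree, can be sketched as a nearest-reference lookup over a distance table. This is a hypothetical illustration: the SGB IDs and distance values are made up, and it assumes you already have bin-to-reference genomic distances (for example Mash distances); it does not reproduce phylophlan_metagenomic's output format.

```python
# Hypothetical illustration: for each bin, pick the reference (e.g. an SGB)
# with the smallest genomic distance, then collect the unique references
# to add to the phylogenetic analysis. All IDs and values are made up.

def closest_references(distances: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """distances[bin][reference] -> distance; returns bin -> (reference, distance)."""
    return {
        bin_id: min(refs.items(), key=lambda item: item[1])
        for bin_id, refs in distances.items()
        if refs  # skip bins with no distances
    }

def references_to_add(closest: dict[str, tuple[str, float]]) -> list[str]:
    """Unique closest references, sorted, e.g. to add as extra tree inputs."""
    return sorted({ref for ref, _ in closest.values()})

distances = {
    "bin_01": {"SGB_X": 0.04, "SGB_Y": 0.12},
    "bin_02": {"SGB_X": 0.20, "SGB_Y": 0.09},
    "bin_03": {"SGB_X": 0.03, "SGB_Y": 0.15},
}
closest = closest_references(distances)
print(closest)  # {'bin_01': ('SGB_X', 0.04), 'bin_02': ('SGB_Y', 0.09), 'bin_03': ('SGB_X', 0.03)}
print(references_to_add(closest))  # ['SGB_X', 'SGB_Y']
```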

I hope this helps.