biobakery / phylophlan

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes
https://huttenhower.sph.harvard.edu/phylophlan
MIT License

Where are the bootstrap values in RAxML trees? #70

Open liuyuwen1 opened 3 years ago

liuyuwen1 commented 3 years ago

Dear developers,

In another issue I saw that you said: "The .tre output is the one generated by FastTree as is the first tree built by PhyloPhlAn (I'm assuming your config file contains both sections [tree1] and [tree2], with FastTree specified in [tree1] and RAxML in [tree2]). So the final phylogeny you should consider is the RAxMLbestTree...."

Here is the command I used to write my configuration file:

phylophlan_write_config_file \
-o bartonella2_config.cfg \
-d a \
--force_nucleotides \
--db_aa diamond \
--map_aa diamond \
--map_dna diamond \
--msa mafft \
--trim trimal \
--tree1 fasttree \
--tree2 raxml

In my output folder, I can find bartonella.tre, bartonella_resolved.tre, RAxML_bestTree.bartonella_refined.tre, RAxML_info.bartonella_refined.tre, RAxML_log.bartonella_refined.tre, and RAxML_result.bartonella_refined.tree. I understand that the final phylogeny I should consider is RAxML_bestTree.bartonella_refined.tre.

However, I have noticed that RAxML_bestTree.bartonella_refined.tre contains no bootstrap values, while bartonella.tre and bartonella_resolved.tre do. I need to show bootstrap values in my final tree. Does the RAxML tree have bootstrap values, and if so, where can I find them?

Many thanks for your help, Best regards, Liu yuwen

Sidduppal commented 3 years ago

PhyloPhlAn does not generate bootstrap values by itself; you'll have to use the alignment file generated by PhyloPhlAn to build your tree with bootstrap values. An example command might be:

raxmlHPC-PTHREADS-SSE3 -s <your alignment file> -n <your output file> -f a -m PROTCATLG -# 100 -p 1989 -x 1989 -T 20

You can also specify these parameters in the config file.

acvill commented 1 year ago

I came across this thread while trying to add bootstrap annotations to the RAxML_bestTree file generated as part of the StrainPhlAn workflow. After running RAxML with -f a to perform a rapid bootstrap analysis on the existing alignment, you then need to run RAxML again with -f b to draw the bipartition information onto the tree. The first run writes its replicates to RAxML_bootstrap.<run name>, which the second run reads through -z. An example workflow:

raxmlHPC-PTHREADS-SSE3 -s ${strain}.StrainPhlAn4_concatenated.aln -n bs100 -f a -m PROTCATLG -# 100 -p 1989 -x 1989 -T 20
raxmlHPC-PTHREADS-SSE3 -f b -t RAxML_bestTree.${strain}.StrainPhlAn4.tre -z RAxML_bootstrap.bs100 -m PROTCATLG -n BestBoot

The tree annotated with bootstrap values can then be read and visualized in R:

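library(treeio)  # provides read.raxml()
library(ggtree)  # tree plotting on top of ggplot2

# Parse the RAxML output; support values end up in the `bootstrap` column
treeb <- read.raxml(file = "RAxML_bipartitionsBranchLabels.BestBoot")

ggtree(treeb, layout = 'rectangular') +
  geom_tiplab(mapping = aes(label = label)) +     # tip names
  geom_nodelab(mapping = aes(label = bootstrap))  # bootstrap support at internal nodes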
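
For a quick check outside R, a minimal shell sketch like the one below could confirm that support values were actually written to the tree file. It assumes the RAxML_bipartitionsBranchLabels format, in which support appears as a bracketed integer after an internal branch length (for example `:0.0123[87]`). The function name `support_summary` is hypothetical and is not part of RAxML, PhyloPhlAn, or StrainPhlAn.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize bootstrap support values found in a
# RAxML_bipartitionsBranchLabels.* file.
# Assumption: support values are bracketed integers such as "[87]".

support_summary() {
  local tree_file="$1"
  # Pull out every "[<integer>]" token, strip the brackets, sort numerically,
  # then report how many values were found and their range.
  grep -oE '\[[0-9]+\]' "$tree_file" | tr -d '[]' | sort -n |
    awk '{ v[NR] = $1 }
         END {
           if (NR == 0) { print "no support values found"; exit 1 }
           printf "n=%d min=%d max=%d\n", NR, v[1], v[NR]
         }'
}
```

It could be called as `support_summary RAxML_bipartitionsBranchLabels.BestBoot`. If it reports that no support values were found, the file is probably a plain best tree rather than the output of the -f b step.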

I realize that there's probably a way to generate a tree with bootstrap annotations as part of the strainphlan run by modifying the default configs and specifying --phylophlan_configuration when strainphlan is called. But I couldn't edit the configs without creating errors. It would be great if the maintainers could provide a working config file to use as part of phylophlan/strainphlan workflows to create bootstrapped RAxML trees.