emmanuelparadis / ape

analysis of phylogenetics and evolution
http://ape-package.ird.fr/
GNU General Public License v2.0

Short question: how to interpret NAs in boot.phylo() bootstrapping? #125

Open philoel opened 1 month ago

philoel commented 1 month ago

Hi! Thanks for the great package, I use parts of it routinely without issue.

I have a question about how to interpret the NAs that boot.phylo() sometimes returns after using nj(dist(data_matrix)) as the method to estimate the tree. I usually convert the bootstrap values to a percentage of bootstraps (so 800 out of 1000 bootstraps becomes 80%), and then colour-code each node in my trees: black dots mark nodes with better than 80% support, red dots mark nodes with lower than 50%, and orange dots fall somewhere in the middle. I attach an example figure using this. Note that the nodes with no dot on them are the ones that received an NA in the output of boot.phylo().

[Figure: unrooted_tree_B=10000_Bootstrap]

Sometimes, I get NA values and I'm not sure how to interpret them. These typically occur at branch points that I would predict are recovered in 100% of bootstraps, because they separate clades that are very different from each other. So for myself, I am uncertain of how to interpret the support for these nodes. I know it can't represent 100% or 0% support, because I get 0s and max values in the bootstrap values as well.

How shall I interpret these NAs? If I need to explain what it means that there is no information about bootstrap support for a certain node, is there some accepted explanation or wording for it? I don't see anything in the manual after a few Ctrl-F searches, and other searching around the net hasn't turned up much.

If it helps, this is made from a matrix of transcript counts per gene for celltypes, averaged from single cell RNAseq data, and I simply use midpoint.root() to root the tree before running boot.phylo(). I'm using APE version 5.7-1 on R version 4.2.2, but I've had this phenomenon in my trees for years and across many versions.

Thanks for your time! I'd be grateful for any tips or advice. I'm happy to provide any additional info you might like.

emmanuelparadis commented 1 month ago

Hi, Thanks for the appreciation :)

boot.phylo() can be used assuming that the trees are either rooted or unrooted. The support values are not calculated in the same way in each case: with rooted trees the clades are counted, whereas with unrooted trees the bipartitions (aka splits) are counted. Your case (NJ) is, of course, the second one.

A binary tree with n tips has n - 1 clades (= number of nodes) and n - 2 splits (= number of internal branches).
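
To make the counting concrete, here is a minimal sketch in base R, so it runs without ape. The four-tip tree ((A,B),(C,D)) and the variable names are made up for illustration; the edge matrix simply follows the parent/child layout that ape uses for the `edge` element of a "phylo" object (tips numbered 1 to n, internal nodes from n + 1, root first).

```r
# Rooted binary tree ((A,B),(C,D)) as a two-column edge matrix: parent, child.
# Tips are numbered 1..n_tips; internal nodes start at n_tips + 1 (the root).
n_tips <- 4
edge <- rbind(c(5, 6), c(6, 1), c(6, 2),
              c(5, 7), c(7, 3), c(7, 4))

# Every node that appears as a parent is an internal node (one clade each).
n_nodes <- length(unique(edge[, 1]))

# An internal branch is an edge whose child is itself an internal node.
n_internal_edges <- sum(edge[, 2] > n_tips)

n_nodes            # 3, i.e. n - 1
n_internal_edges   # 2, i.e. n - 2
```

With one more node than internal branches, a vector indexed by node necessarily has one slot with no branch to go with it.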

boot.phylo() always returns the support values indexed to the nodes, so with unrooted trees there is an extra value which is set to NA. It is common to display the support values on the nodes but they should be on the internal branches: see the function ?drawSupportOnEdges and its examples.
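
As a rough sketch of what this looks like in practice (not taken from the original question): the example below uses the woodmouse alignment shipped with ape instead of an expression matrix, and the choice of dist.dna(), B = 100 and the seed are arbitrary assumptions made only so that the example is reproducible.

```r
library(ape)

data(woodmouse)                     # example DNA alignment shipped with ape
f <- function(x) nj(dist.dna(x))    # tree-building function; NJ trees are unrooted
tr <- f(woodmouse)

set.seed(42)                        # arbitrary seed, for reproducibility only
bp <- boot.phylo(tr, woodmouse, f, B = 100, quiet = TRUE)

length(bp) == Nnode(tr)             # the values are indexed to the nodes
which(is.na(bp))                    # position of the extra value set to NA

plot(tr)
drawSupportOnEdges(bp)              # puts the support values on the internal branches
```

The NA is then best read as "no split is associated with this node", not as a missing or failed estimate, so it does not need a support value of its own.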

See also the help page ?root and the option edgelabel of this function. The help page gives this reference for more details:

Czech et al. (2017) A critical review on the use of support values in tree viewers and bioinformatics toolkits. Molecular Biology and Evolution 10.1093/molbev/msx055

Best,

Emmanuel