YuLab-SMU / ggtree

Visualization and annotation of phylogenetic trees
https://yulab-smu.top/contribution-tree-data/

Re-rooting tree with correct branch support labels #89

Open toddknutson opened 7 years ago

toddknutson commented 7 years ago

(Update: 2016-11-11 8:35 AM CST): I added images of the trees I was talking about.

Hi,

I found this recent article from the authors of RAxML that reviews many tree-viewing programs: "A Critical Review on the Use of Support Values in Tree Viewers and Bioinformatics Toolkits". In the review they evaluate the root() function in APE, and the developers of APE have updated the software to correctly display support values (when originally stored as node labels) in version 3.5-0.10 (and likely in a future version 3.6).

My question is, how do I re-root a tree using ggtree methods? Do I use the "ape" package root() function to re-root a tree, then plot with ggtree? Currently, CRAN only has ape v3.5 available. How exactly can this be done if I imported the "RaxML_bipartitionsBranchLabels_example.nex" file using the read.raxml() function?

I tried directly re-assigning the phylo object within the raxml tree object: raxml_tree@phylo <- root(raxml_tree@phylo, node = 10). But that creates a tree with incorrect support values.

Here is my code:

library(ggplot2)
library(ggtree)
library(ape)
# Download and Import an example RAXML tree (same tree as in above publication)
download.file("https://gist.githubusercontent.com/toddknutson/0403a5be461c9560f177307dacbb8f39/raw/59cdd7de7ded28d9421af141c93c1089282552b3/RaxML_bipartitionsBranchLabels_examle.nex",destfile="RaxML_bipartitionsBranchLabels_examle.nex",method="libcurl")
raxml_tree <- read.raxml("RaxML_bipartitionsBranchLabels_examle.nex")

ggtree(raxml_tree) + geom_tiplab() + geom_text2(aes(label = node)) + geom_rootpoint()
ggtree(raxml_tree) + geom_tiplab() + geom_label2(aes(label = bootstrap)) + geom_rootpoint()

# Create a copy of the original raxml tree
raxml_tree_rooted <- raxml_tree
# re-root the new tree using the "X" labeled branch. 
# root() function from ape package
raxml_tree_rooted@phylo <- root(raxml_tree_rooted@phylo, outgroup = "X")

# Plot tree
ggtree(raxml_tree_rooted) + geom_tiplab() + geom_label2(aes(label = bootstrap)) + geom_rootpoint()
# This tree has the wrong support values shown at the nodes.

Original tree, before any rerooting (with support values): [image]

Re-rooted tree, with incorrect support labels: [image]

I would appreciate any advice on how to re-root phylogenetic trees so that they show accurate support values after re-rooting with ggtree. FYI, according to the manuscript listed above, iTOL will correctly label the branches with support values after re-rooting. The tree I used in iTOL is: "((C,D)1,(A,(B,X)3)2,E)R;".

iTOL tree, re-rooted, displaying correct support labels: [image]

Thanks so much for a great R package. I really like it!

toddknutson commented 7 years ago

A shorter version of my question: How can you re-root a tree after importing it with the read.raxml() function?

slhogle commented 7 years ago

I have the same question... It would be nice to have an option built into the ggtree framework that will quickly reroot (or midpoint root) a tree.

toddknutson commented 7 years ago

I think I have a partial solution. The recently updated R package ape ver 4.0 included a new option in their root() function: edgelabel = TRUE. If you have bootstrapping support values included in the node labels, after re-rooting a tree with root(edgelabel = TRUE), the bootstrapping support values will be assigned to the correct branches (edges) of the tree. This is a major improvement.

Therefore, if you simply use the RaxML_bipartitions.nex file, which lists the bootstrap support values as node labels, and read that tree with the standard read.tree() function, you should be able to re-root it with ape ver 4 using root(tree, node = X, edgelabel = TRUE). This method avoids the read.raxml() function completely.

If you import a tree using tree <- read.tree(), root it wherever you like with tree2 <- root(tree, node = X, edgelabel = TRUE), and plot the tree with p <- ggtree(tree2), then you can add additional metadata using the %<+% method, and the metadata will be applied correctly: p2 <- p %<+% meta.data.frame + geom_tiplab().
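The workflow described above can be sketched roughly as follows. This is a minimal illustration, not a tested recipe for real RAxML output: it uses the toy tree from this thread (without the root label) in place of a bipartitions file, roots on the tip "X" via outgroup = rather than node =, and the geom_text2() layer for showing the support labels is an assumption about how one might display them.

```r
library(ape)  # root(edgelabel = TRUE) needs ape >= 4.0

# Toy tree from this thread; support values are stored as node labels.
tree <- read.tree(text = "((C,D)1,(A,(B,X)3)2,E);")

# Re-root on tip "X" and let ape move each support value along with its branch.
tree2 <- root(tree, outgroup = "X", edgelabel = TRUE)

# Plot only if ggtree is installed; internal node labels carry the support values.
if (requireNamespace("ggtree", quietly = TRUE)) {
  library(ggtree)
  p <- ggtree(tree2) + geom_tiplab() +
    geom_text2(aes(subset = !isTip, label = label), hjust = -0.3)
  print(p)
}
```

Comparing the plot against the same call with edgelabel = FALSE is a quick way to check whether the support values moved with their branches.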

However, there is still one problem. How can I root the tree at the midpoint? I don't think that can be done using the ape root() function, but I might be wrong. The other common midpoint functions (midpoint() and midpoint.root()) fail to assign the bootstrap values to the correct edges after midpoint rooting.

toddknutson commented 7 years ago

I'm sorry, but I was wrong about the midpoint() function in the phangorn R package. This function works great. By default, it uses the option node.labels = "support", which is exactly what I am supplying with the standard tree (i.e. node labels with support values). Thus, using the updated root(edgelabel = TRUE) function in ape ver 4 or the midpoint() function in the phangorn package, provides accurate support values assigned to the branches when plotted with ggtree.

Therefore, my workflow is the same as in my above comment. (ignore the bold statement, which is wrong).

Thanks again, you can close this issue.

GuangchuangYu commented 7 years ago

The information attached to a node may not make sense after re-rooting. This is why I didn't implement a midpoint method for the S4 tree object.

see also the discussion in https://github.com/KlausVigo/phangorn/issues/5.
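A small sketch of why node-attached data can go stale, assuming only ape and the toy tree from this thread. The helper clade_table() is hypothetical (not part of ape or ggtree); it lists, for each internal node number, the node label and the tips below that node, so the tables before and after rooting can be compared by eye.

```r
library(ape)

# Hypothetical helper: one row per internal node, with its label and descendant tips.
clade_table <- function(phy) {
  nodes <- Ntip(phy) + seq_len(Nnode(phy))
  data.frame(
    node  = nodes,
    label = phy$node.label,
    tips  = vapply(nodes, function(n) {
      paste(sort(extract.clade(phy, n)$tip.label), collapse = ",")
    }, character(1)),
    stringsAsFactors = FALSE
  )
}

tree     <- read.tree(text = "((C,D)1,(A,(B,X)3)2,E);")
rerooted <- root(tree, outgroup = "X")  # default edgelabel = FALSE

before <- clade_table(tree)
after  <- clade_table(rerooted)
print(before)
print(after)
```

Any data frame keyed by the node column of the first table (such as the data slot of an S4 tree object) describes clades that may no longer sit under the same node numbers, or may no longer exist, in the second table.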

toddknutson commented 7 years ago

Hi Guangchuang,

Thanks for the link to your discussion with Klaus. That discussion and your comment above clear things up for me. I think it works if I take a tree with support values as node labels, root or midpoint-root the tree using root() from ape v4 or midpoint() from phangorn, plot it using ggtree(), and then add related tip information to the tree (via %<+%). My "additional information" is primarily associated with the tips, which gets added faithfully to the plot. I could see how others might want to add information associated with the nodes, but you're correct: after rooting, that info is hard to map to the correct nodes.

Hopefully, in the future there could be some way of re-rooting the S4 tree object (via read.raxml()), but I think the current workflow is fine for me. Thanks a lot for ggtree!

GuangchuangYu commented 7 years ago

I am trying to split ggtree into two packages: treeio, and of course ggtree, which will focus on visualization.

Maybe I will implement a midroot() function for the S4 tree object in the future.

michaelgruenstaeudl commented 7 years ago

Hi Todd, hi Guangchuang, I, too, have come across the issue that parsing a tree via treeio::read.raxml and then re-rooting it via ape::root messes up the node-to-nodevalue association. This is particularly unfortunate as re-rooting phylogenetic trees constitutes a standard procedure in bioinformatics, and a correct handling of re-rooted trees would make treeio even more useful.

While I don't have a perfect solution to this issue, I would like to offer a (half-baked) workaround. It appears that the column bootstrap of the internal tree data (i.e., raxml_tree_rooted@data$bootstrap in Todd's example) is shifted by one row. Shifting these values back by one row can be done as follows:

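# Copy the bootstrap column of the re-rooted tree's data slot
copy_BSvals = raxml_tree_rooted@data$bootstrap
# Shift by one row: NA first, then the last value, then values 2 to (n - 1)
new_BSvals = c(NA, copy_BSvals[length(copy_BSvals)], copy_BSvals[2:(length(copy_BSvals)-1)])
# Write the shifted values back and re-plot
raxml_tree_rooted@data$bootstrap = new_BSvals
ggtree(raxml_tree_rooted) + geom_tiplab() + geom_label2(aes(label = bootstrap)) + geom_rootpoint()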

I hope this helps in finding a more permanent solution!

michaelgruenstaeudl commented 7 years ago

If function treeio::read.raxml is currently incompatible with ape's function root (as appears to be the case), then why don't you simply load the bipartitions output of RAxML? This file contains the same information as file bipartitionsBranchLabels, just that it can be parsed seamlessly by ape's function read.tree and, by extension, is compatible with function root. Best, Michael

toddknutson commented 7 years ago

Hi Michael,

Thanks for this tip (it took me a long time to figure that out!). This is what I am doing now. I import the bipartitions file from RAxML using the standard tree <- read.tree(RAxML_bipartitions) function. Then I midpoint root the tree using library(phangorn) and tree_midroot <- midpoint(tree, node.labels = "support"). Finally, I add my metadata with library(ggtree) and:

p1 <- ggtree(tree_midroot)
p2 <- p1 %<+% metadata

Presumably, library(ape) would also work for re-rooting the tree. I think my major problem was that I liked the idea of importing the bipartitionsBranchLabels file directly with ggtree/treeio (which seems like a more accurate way to represent the data -- where the support values are in their own location). But the above method works well for me now. You just need to make sure the first column of your metadata sheet matches the taxa names in your tree for the metadata to be added correctly.
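For anyone who wants to try this end to end, here is a rough sketch of the same steps. The tree string (with made-up branch lengths and support values) stands in for a real RAxML bipartitions file, and the metadata data frame with its taxon and group columns is hypothetical.

```r
library(ape)
library(phangorn)

# Stand-in for a bipartitions file: support values are stored as node labels.
tree <- read.tree(text = "((C:1,D:1)90:0.5,(A:1,(B:1,X:3)80:0.5)70:0.5,E:1);")

# Midpoint rooting; node.labels = "support" treats the node labels as support values.
tree_midroot <- midpoint(tree, node.labels = "support")

# Hypothetical metadata: the first column must hold the tip labels of the tree.
metadata <- data.frame(
  taxon = tree$tip.label,
  group = c("g1", "g1", "g2", "g2", "g3", "g3"),
  stringsAsFactors = FALSE
)

# Plot only if ggtree is installed.
if (requireNamespace("ggtree", quietly = TRUE)) {
  library(ggtree)
  p1 <- ggtree(tree_midroot)
  p2 <- p1 %<+% metadata + geom_tiplab(aes(color = group))
  print(p2)
}
```

Because %<+% joins on the first column, a quick all(metadata$taxon %in% tree_midroot$tip.label) check before plotting can catch mismatched names early.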

slhogle commented 6 years ago

It's been a long time since this thread was active, but I figured I'd post another potentially useful solution...

I've been lazily rooting trees and doing various other manipulations using Archaeopteryx. You can then save the modified tree in Newick format as long as you select the option "Use brackets for confidence values" in the 'options' menu. After saving the re-rooted/modified tree, you can read it using read.raxml() from treeio/ggtree.

This is often the easiest method for me if I need to just quickly plot some metadata along with my tree in ggtree

brj1 commented 5 years ago

I have implemented the reroot method for the treedata class in ggtree #211 which may address this issue.

MdUmar-tech commented 6 months ago

Hi, I am still getting the same issue. When I used Python I got the proper result, but the rest of my work is in R only. My file is an ML tree with the .nhx extension. Kindly suggest a solution. Thanks and regards.