YuLab-SMU / ggtree

:christmas_tree:Visualization and annotation of phylogenetic trees
https://yulab-smu.top/contribution-tree-data/

viewClade(): Issues when applied to phenogram and "fan" layout #299

Open AEgit opened 4 years ago

AEgit commented 4 years ago

Hi,

I would like to plot a subclade of a large phylogenetic tree using ggtree. The viewClade() function is supposed to do this, but it gives unexpected results when applied to a phenogram-style ggtree object, and it does not work at all with the "fan" layout.

When I plot a phenogram and select a subclade with viewClade(), other clades are plotted as well if they fall into the same trait space. Is there a way to hide these clades other than using the collapse() function?

When plotting a tree with the "fan" layout viewClade() does not seem to work at all.

tree_subset() is not equivalent, because it actually prunes the tree, so the ggtree plot has to be regenerated from scratch. I would like to re-use the original plot and just focus on a subclade. Is that possible? The gzoom() function is deprecated, so it is not an option either.

A minimal code example follows (taken almost directly from https://yulab-smu.github.io/treedata-book/chapter4.html):

library(phytools)
library(ggplot2)
library(ggtree)
library(treeio)

# viewClade gives unexpected results when plotting a phenogram
anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre")
svl <- read.csv("http://www.phytools.org/eqg2015/data/svl.csv",
                row.names=1)
svl <- as.matrix(svl)[,1]
fit <- phytools::fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)

td <- data.frame(node = nodeid(anole.tree, names(svl)),
                 trait = svl)
nd <- data.frame(node = names(fit$ace), trait = fit$ace)

d <- rbind(td, nd)
d$node <- as.numeric(d$node)
tree <- full_join(anole.tree, d, by = 'node')

p <- ggtree(tree, aes(color=trait), continuous = TRUE, yscale = "trait") + 
  scale_color_viridis_c() + theme_minimal()
p

# Note that other clades are plotted as well if they fall into a similar trait space
viewClade(p, node = MRCA(p, "confusus", "ahli"))

# viewClade does not work with "fan" layout at the moment
p2 <- ggtree(tree, aes(color=trait), layout="fan") + 
  scale_color_viridis_c()# + theme_minimal()
p2

viewClade(p2, node = MRCA(p, "confusus", "ahli"))

AEgit commented 4 years ago

Here is an additional minimal example, including the plots, to visualise the problem:

library(phytools)
library(ggplot2)
library(ggtree)
library(treeio)

# viewClade gives unexpected results when plotting a phenogram
anole.tree<-read.tree("http://www.phytools.org/eqg2015/data/anole.tre")
svl <- read.csv("http://www.phytools.org/eqg2015/data/svl.csv",
                row.names=1)
svl <- as.matrix(svl)[,1]
fit <- phytools::fastAnc(anole.tree,svl,vars=TRUE,CI=TRUE)

td <- data.frame(node = nodeid(anole.tree, names(svl)),
                 trait = svl)
nd <- data.frame(node = names(fit$ace), trait = fit$ace)

d <- rbind(td, nd)
d$node <- as.numeric(d$node)
tree <- groupClade(anole.tree, .node = MRCA(anole.tree, "confusus", "ahli"))
tree <- full_join(tree, d, by = 'node')

p <- ggtree(tree, aes(color=group), yscale = "trait") + 
  geom_tiplab(size = 1) +
  theme_minimal()
p

# Note that other clades are plotted as well if they fall into a similar trait space
viewClade(p, node = MRCA(p, "confusus", "ahli"))

# viewClade does not work with "fan" layout at the moment
p2 <- ggtree(tree, aes(color=trait), layout="fan") + 
  scale_color_viridis_c()# + theme_minimal()
p2

viewClade(p2, node = MRCA(p, "confusus", "ahli"))
# The following error message is shown:
# Coordinate system already present. Adding new coordinate system, which will replace the existing one.
# Error in if (all(is.finite(continuous_range_coord)) && diff(continuous_range_coord) <  : 
#              missing value where TRUE/FALSE needed

The plot p looks like this: [image: Rplot]

If I use viewClade() on p, I get the following. Note that not only the specified clade is shown (cyan), but also other clades that fall into the same trait range (red): [image: Rplot01]

I would like to re-use the original plot p and just focus on a subclade. Is that possible? Basically, all red branches should be hidden and only the cyan branches shown.

The plot p2 looks like this: [image: Rplot02]

If I use viewClade() on p2, the above-mentioned error message is displayed and a blank figure is generated (in older versions of ggtree, a different figure was generated in which all branches lay on top of each other).
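
A note on the phenogram case: the behaviour is consistent with viewClade() zooming by setting coordinate limits around the clade rather than removing other branches from the plot data (the "Coordinate system already present" message above points the same way). That reading is an assumption, not something confirmed in this thread. The sketch below uses plain ggplot2 with made-up segment data (`branches` and `plot_branches()` are hypothetical names, not part of ggtree) to show the difference between zooming and filtering.

```r
library(ggplot2)

# Made-up "branches": two clades whose segments overlap in y (trait space).
branches <- data.frame(
  clade = c("A", "A", "B", "B"),
  x     = c(0, 1, 0, 1),
  xend  = c(1, 2, 1, 2),
  y     = c(1.0, 1.5, 1.2, 1.4),
  yend  = c(1.5, 2.0, 1.4, 3.0)
)

# Hypothetical helper: draw each row as one branch segment.
plot_branches <- function(d) {
  ggplot(d, aes(x = x, y = y, xend = xend, yend = yend, colour = clade)) +
    geom_segment()
}

clade_a <- branches[branches$clade == "A", ]

# Zooming: limits come from clade A only, but all rows stay in the layer,
# so clade B is still drawn wherever it crosses the zoomed window.
zoomed <- plot_branches(branches) +
  coord_cartesian(
    xlim = range(c(clade_a$x, clade_a$xend)),
    ylim = range(c(clade_a$y, clade_a$yend))
  )

# Filtering: only clade A's rows reach the layer, so nothing else can appear.
filtered <- plot_branches(clade_a)

nrow(layer_data(zoomed))    # number of segments the zoomed plot still holds
nrow(layer_data(filtered))  # number of segments after filtering
```

In other words, hiding the unwanted branches requires filtering the data that reaches the plot; changing the visible window alone cannot do it when clades overlap in trait space.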

brj1 commented 4 years ago

You can start from fortify() and filter for the desired clade.

subtree <- fortify(tree, aes(color=group), yscale = "trait") %>% dplyr::filter(group == 1)

p <- ggtree(subtree,  layout = "slanted") + 
  geom_tiplab(size = 1) +
  theme_minimal()
p

Also, viewClade does not work with fan, circular, or unrooted tree layouts.

AEgit commented 4 years ago

Thank you very much. Yes, this is a nice workaround, and probably better than what I was doing, which was basically to recreate the whole object, whereas you use dplyr::filter() to select the group of interest. I reckon that to colour the branches by trait value (as in the original post), you would need something like this:

subtree <- fortify(tree, aes(color=trait), continuous = TRUE, yscale = "trait") %>% dplyr::filter(group == 1)

p <- ggtree(subtree, layout = "slanted", aes(color=trait), yscale = "trait") + 
  geom_tiplab(size = 1) +
  scale_color_viridis_c() +
  theme_minimal()
p

Ultimately, it is still just a workaround: the user still needs to generate a new ggtree object. The current behaviour of viewClade() for phenograms does not seem intended. If it is intended, then I reckon there would be demand for a new function in ggtree that plots ONLY the subclade of interest from an existing ggtree object (i.e., one that hides all other clades).

As you mention, viewClade() does not currently work for other layouts. I reckon people would be interested in seeing this implemented in ggtree.

Do you think we can expect this functionality to be added to ggtree at some point?
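
For comparison, the pruning route mentioned at the start of the thread can be sketched as follows. This is a minimal example on a made-up four-tip Newick tree (not the anole data), assuming that treeio's tree_subset() with levels_back = 0 returns just the clade below the given node; the object names are illustrative only.

```r
library(ape)
library(treeio)

# Made-up tree: two cherries, (a, b) and (c, d).
full <- read.tree(text = "((a:1,b:1):1,(c:1,d:1):1);")

# Node number of the clade of interest.
clade_node <- getMRCA(full, c("a", "b"))

# Prune to that clade; levels_back = 0 keeps no ancestors above clade_node.
sub <- tree_subset(full, node = clade_node, levels_back = 0)

sub$tip.label
```

The result is a new, smaller tree object, so any plot has to be rebuilt from it (for example with ggtree(sub)) rather than derived from the existing plot, which is the limitation discussed above.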