YuLab-SMU / ggtree

:christmas_tree: Visualization and annotation of phylogenetic trees
https://yulab-smu.top/contribution-tree-data/

FEATURE REQUEST: greater control over parameters for node values #13

Closed rmkepler closed 9 years ago

rmkepler commented 9 years ago

Hello,

I like the flexibility of the package and the ability to work with phylogenetic trees in a graphics-oriented way. I think it could be made even better if you could control the display parameters for node labels the way ggtree allows for tip labels. This could include trees passed from other packages. I pruned taxa from a RAxML tree with drop.tip {ape} and labeled clades with ggtree. I would love to exclude values for clades receiving bootstrap values less than 70 percent (or any other threshold) and move the remaining values to the middle of their branches, leaving the taxa at the tips. I am not an expert in R, and I couldn't find an easy way to hack a solution to any of these issues. Still, nice package. I hope it continues to be developed.

Ryan

GuangchuangYu commented 9 years ago

So you want to display bootstrap values larger than 0.7 in one layer and another layer of bootstrap values less than 0.7 at the middle of branches, right?

rmkepler commented 9 years ago

I would like to be able to display only bootstrap values (or posterior probabilities from a MrBayes run) above a certain threshold (70 percent, or 0.95 for MrBayes analyses), and control parameters like color, size, placement, etc. Some of these manipulations are available in ape; maybe they could be ported over.

GuangchuangYu commented 9 years ago

Subsetting values can be done via, for example, geom_text(aes(label=bootstrap), subset=.(bootstrap > 0.7)), but this feature will be removed in the next release of ggplot2; see https://github.com/hadley/ggplot2/issues/1295.

In the current devel version of ggtree, I have implemented geom_text2, which supports subsetting, e.g., geom_text2(aes(label=bootstrap, subset=bootstrap>0.7)). Please note that the syntax differs from the current ggplot2 release.
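As a minimal sketch of the aes-level subsetting described above, the snippet below labels only well-supported nodes. It assumes a recent installation in which read.raxml() and the RAxML example file are provided by the treeio package (an assumption; at the time of this thread they lived in ggtree itself). The threshold of 70 matches the 0 to 100 scale used in RAxML output; use a 0 to 1 threshold if your support values are proportions.

```r
library(treeio)  # assumed home of read.raxml() in current releases
library(ggtree)

# Assumption: the RAxML example file ships with treeio
raxml_file <- system.file("extdata/RAxML",
                          "RAxML_bipartitionsBranchLabels.H3",
                          package = "treeio")
raxml <- read.raxml(raxml_file)

# 'subset' goes inside aes(); nodes without a bootstrap value (NA) are excluded
p <- ggtree(raxml) +
  geom_text2(aes(label = bootstrap,
                 subset = !is.na(bootstrap) & bootstrap > 70),
             hjust = -0.2, size = 3)
p
```

Other text arguments such as color and size can be passed to geom_text2 in the same way as for geom_text.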

If the feature is already available in ape, please post example code for me to see the effect.

rmkepler commented 9 years ago

Oh, great. I didn't see that in the documentation, but I will check it out now. This sounds far easier and more powerful than what is available in ape. Thanks.

rmkepler commented 9 years ago

Guangchuang,

I know you closed this thread, but after trying the code for ggplot2 I get the following error:

ggtree(tree) + geom_text(aes(label=bootstrap), subset=.(bootstrap>70))
Error in eval(expr, envir, enclos) : object 'bootstrap' not found

GuangchuangYu commented 9 years ago

Your tree doesn't contain any bootstrap information, so how can you expect ggtree to visualize a feature that doesn't exist?!

Try the following example, which uses an output file of a RAxML bootstrap analysis.

library(ggtree)
# example RAxML bootstrap output shipped with the package
raxml_file <- system.file("extdata/RAxML", "RAxML_bipartitionsBranchLabels.H3", package="ggtree")
raxml <- read.raxml(raxml_file)
# label only the nodes with bootstrap support above 70
ggtree(raxml) + geom_text(aes(label=bootstrap), subset=.(bootstrap>70))

rmkepler commented 9 years ago

This gets at the issue for me. I have already edited the tree, changing the name information and pruning some tips for a cleaner publication image, so the RAxML_bipartitionsBranchLabels.H3 information is not going to match the tree file. The tree has values associated with the internal nodes (so not tip labels, but the ones where data.frame$isTip == FALSE). I have tried to find a workaround, but nothing yet.

Maybe I just need to adjust my pipeline and do the tip pruning and name changing last. But I had already done the other edits when I found your program, so I thought I would give it a try.

GuangchuangYu commented 9 years ago

> so the RAxML_bipartitionsBranchLabels.H3 information is not going to match the tree file.

RAxML_bipartitionsBranchLabels.H3 is an example file in ggtree and has nothing to do with your own tree file. I don't get your point.

If you edit the tree properly without losing your bootstrap information, you can visualize your tree with bootstrap value annotated by ggtree.

rmkepler commented 9 years ago

Sorry, I must not be explaining this well. Think about it this way. Let's say you import a tree created in a program that ggtree doesn't have a native importer for (for example a parsimony tree from Mega or Paup) in newick format. Newick format allows for values associated with nodes without storing the data in another file (the information to the left of the colon).

example: ((A:2,B:2)95:2,(C:2,D:2)100:2);

When you bring a newick tree into ggtree with bootstrap values included (95 and 100, as above), these values appear in the data.frame$label column. I would like to be able to control the behavior of these values, in a way similar to how geom_tiplab() allows you to have control over just the tip labels, something like "geom_nodelab()". I would at least like to be able to control the position of where they appear on the branch. It seems like it may be difficult to sort these since they are character values, not numerical.

But I understand the fact that you can import node values for your tree as a separate element that then becomes available for manipulation. Like I said, I haven't tried it this way yet. Thanks for taking the time to hear me out on this.

GuangchuangYu commented 9 years ago

Refer to the definition of the Newick format: the information to the left of the colon is the label, and indeed it was parsed as the label.

In your case, node labels were used to store bootstrap values. It's also easy to display such information using ggtree. You can use subset, as I mentioned before, to separate tip labels from node labels (the bootstrap values here).

library(ggtree)
tree <- read.tree(text="((A:2,B:2)95:2,(C:2,D:2)100:2);")
## show the labels of internal nodes only, nudged to the right of each node
ggtree(tree) + geom_text(aes(label=label), subset=.(!isTip), hjust=-.2)
## if you want to place the value in the middle of the branch
ggtree(tree) + geom_text(aes(x=branch, label=label), subset=.(!isTip), vjust=-.5)

rmkepler commented 9 years ago

That does it. Perfect. Thanks for the help.

GuangchuangYu commented 9 years ago

Please remember that the syntax will change in a future release, as I mentioned before. Also see https://guangchuangyu.github.io/2015/09/subsetting-data-in-ggtree/.
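For readers on later releases, here is a hedged sketch of how the Newick example from this thread might look with the aes-level subset syntax of geom_text2. It assumes that read.tree() comes from ape and that the plot data still exposes the isTip and branch columns, as the earlier snippets rely on. Later ggtree releases also provide geom_nodelab(), which covers the node-label case requested at the start of this issue.

```r
library(ape)     # read.tree()
library(ggtree)

tree <- read.tree(text = "((A:2,B:2)95:2,(C:2,D:2)100:2);")

# Internal-node labels only, nudged to the right of each node
p1 <- ggtree(tree) +
  geom_text2(aes(label = label, subset = !isTip), hjust = -0.2)

# Same labels, placed at the midpoint of each branch
p2 <- ggtree(tree) +
  geom_text2(aes(x = branch, label = label, subset = !isTip), vjust = -0.5)

p1
p2
```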

oscarvargash commented 7 years ago

I am using ggtree and am still not able to show the posterior only for nodes with a value above 0.8.

cptree <- read.beast("cp_mb_beast_format.tre")

p <- ggtree(cptree, layout = "fan", open.angle = 180, branch.length = 'none') +
    geom_tiplab2(size = 3, color = "black") +
    geom_point2(aes(size = posterior, subset = posterior > 0.8), color = "black", show.legend = TRUE) +
    xlim(NA, 35)
print(rotate_tree(p, 90))

This code generates the following error:

Error in scale_apply(layer_data, x_vars, "train", SCALE_X, panel$x_scales) :

RaSieb commented 7 years ago

Hi Oscar, I just bumped into something similar with bootstrap values from Newick format yesterday. The solution is in this post: https://guangchuangyu.github.io/ggtree/faq/#bootstrap-values-from-newick-format
Use aes(label=label, subset = !is.na(as.numeric(label)) & as.numeric(label) > 0.8) with the posterior stored as the label. The important part is the subset=... Applied to the example above:

tree <- read.tree(text="((A:2,B:2)95:2,(C:2,D:2)100:2);")
ggtree(tree) + geom_point2(aes(label=label, subset=!is.na(as.numeric(label)) & as.numeric(label) > 90))

This produces a warning because as.numeric is also applied to the tip labels, but it works. If the tip labels happen to be numeric, add !isTip & to the subset.
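An alternative that avoids the coercion warning is to filter the node labels on the tree object before plotting. This is a different technique from the aes-level subset above, sketched here with ape only: labels below the threshold are blanked so that nothing is drawn for them. The threshold of 96 is an arbitrary value chosen so that one of the two example labels is dropped.

```r
library(ape)
library(ggtree)

tree <- read.tree(text = "((A:2,B:2)95:2,(C:2,D:2)100:2);")

# Node labels are stored as character; convert once, outside the plot call
support <- suppressWarnings(as.numeric(tree$node.label))

# Keep only labels that are numeric and above the (arbitrary) threshold
keep <- !is.na(support) & support > 96
tree$node.label[!keep] <- ""

# Plot the remaining internal-node labels plus the tip labels
p <- ggtree(tree) +
  geom_text2(aes(label = label, subset = !isTip), hjust = -0.2) +
  geom_tiplab()
p
```

Because the conversion only touches tree$node.label, tip labels are never passed to as.numeric, so numeric-looking tip names need no special handling.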

oscarvargash commented 7 years ago

Thank you, that worked!

ferroao commented 7 years ago

It can be done with geom_label also.

library(ape)
library(ggtree)
# make a random example tree
set.seed(2016-12-31)
newrtree <- rtree(12)
# PUT YOUR BOOTSTRAP VALUES HERE (one per internal node)
a <- 1:newrtree$Nnode # just an example
# REMOVE UNWANTED BOOTSTRAPS by setting them to NA
a[a < 3] <- NA # example criterion: drop values < 3
# show the remaining bootstraps with geom_label
tree2treedata <- treeio::as.treedata(newrtree, a)
tree7 <- ggtree(tree2treedata) + geom_label(aes(label=bootstrap)) + geom_tiplab()
tree7

Abdubidopsis commented 6 years ago

Hi GuangchuangYu and everybody. I have protein sequences from 17 species and I want to make a round (circular) tree, but I don't understand which format ggtree accepts or which script I should follow. Please help me out with this.

GuangchuangYu commented 6 years ago

You need to construct a phylogenetic tree based on your sequences using, e.g., RAxML, IQ-TREE, etc.

Then, once you have a tree, you can visualize it using ggtree.
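As a rough illustration of the second step, the sketch below reads a Newick tree and draws it with a circular layout. The Newick string and species names are made up for the example (a hypothetical stand-in for the tree file written by the inference program); with a real file you would pass its path to read.tree() instead of using text=.

```r
library(ape)     # read.tree()
library(ggtree)

# Hypothetical five-species tree standing in for real RAxML/IQ-TREE output
newick <- "((sp1:1,sp2:1):1,(sp3:1,(sp4:1,sp5:1):1):1);"
tree <- read.tree(text = newick)

# Circular ("round") layout with tip labels
p <- ggtree(tree, layout = "circular") +
  geom_tiplab(size = 3)
p
```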

hkaspersen commented 4 years ago

Hello! I am having some issues with the methods listed here. I want to visualize specific nodes of interest while excluding the rest. Therefore, I tried the following:

library(ape)
library(ggtree)

set.seed(500)
tree <- rtree(10)

ggtree(tree, layout = "circular") + 
    geom_text(aes(label = node))

This shows me the node labels: [image: phylo1]

Say that I want to annotate the bootstrap labels on nodes 12, 15, 16, and 18, leaving the rest blank. I tried the following:

nodelist <- c("12", "15", "16", "18")

ggtree(tree, layout = "circular") + 
    geom_text2(aes(label=label, subset=label %in% nodelist))

But this does not work, as all nodes are blank. Any ideas on how to do this?

brj1 commented 4 years ago

@hkaspersen I think you want to subset by node, not label. Note that node is numeric, not character.

nodelist <- c(12, 15, 16, 18)

ggtree(tree, layout = "circular") + 
    geom_text2(aes(label=node, subset=node %in% nodelist))

If your tree has node labels use the following instead:

ggtree(tree, layout = "circular") + 
    geom_text2(aes(label=label, subset=node %in% nodelist))
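To see why subsetting on label gave blank output, it can help to inspect the data frame behind the plot. The sketch below assumes, as in the question, a random tree from ape's rtree(), which carries no node labels: node should be a numeric column, while label is expected to hold tip names only and be missing for internal nodes.

```r
library(ape)
library(ggtree)

set.seed(500)
tree <- rtree(10)

# The data frame ggtree builds for plotting; 'node' and 'label' are separate columns
d <- ggtree(tree)$data
print(d[, c("node", "label", "isTip")])

# Subsetting on 'node' selects rows by node number, independent of any labels
nodelist <- c(12, 15, 16, 18)
selected <- d[d$node %in% nodelist, c("node", "label", "isTip")]
print(selected)
```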