YuLab-SMU / treeio

:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
https://yulab-smu.top/treedata-book/

root function loses tip.label and node.label #120

Open jayoung opened 9 months ago

jayoung commented 9 months ago

hi there,

I am trying to reroot a treedata object, and I find that the tip.label and node.label values are getting lost from the phylogeny (they're getting replaced with c("1","2","3", etc.)).

The example code below should show you what I mean. I get the same problem if I use the trda treedata object defined in the treedata book.

Am I doing something wrong, or is this a bug?

thanks very much!

Janet Young

## load package, get data
library(treeio)
data(bird.orders, package="ape")
# add fake node labels
bird.orders$node.label <- paste("node",1:Nnode(bird.orders),sep="")
# convert to treedata
bird.orders_treedata <- as.treedata(bird.orders)

# confirm that there are meaningful tip and node labels
bird.orders_treedata@phylo$tip.label
#  [1] "Struthioniformes" "Tinamiformes"     "Craciformes"      "Galliformes"      "Anseriformes"    
#  [6] "Turniciformes"    "Piciformes"       "Galbuliformes"    "Bucerotiformes"   "Upupiformes"     
# [11] "Trogoniformes"    "Coraciiformes"    "Coliiformes"      "Cuculiformes"     "Psittaciformes"  
# [16] "Apodiformes"      "Trochiliformes"   "Musophagiformes"  "Strigiformes"     "Columbiformes"   
# [21] "Gruiformes"       "Ciconiiformes"    "Passeriformes"   
bird.orders_treedata@phylo$node.label
#  [1] "node1"  "node2"  "node3"  "node4"  "node5"  "node6"  "node7"  "node8"  "node9"  "node10" "node11"
# [12] "node12" "node13" "node14" "node15" "node16" "node17" "node18" "node19" "node20" "node21" "node22"

##### reroot
bird.orders_treedata_rerooted <- root(bird.orders_treedata, "Galliformes")

# we no longer have those meaningful tip and node labels
bird.orders_treedata_rerooted@phylo$tip.label
#  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20"
# [21] "21" "22" "23"

bird.orders_treedata_rerooted@phylo$node.label
#  [1] "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43"
# [21] "44"

Here's sessionInfo() output:

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] treeio_1.26.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.4       nlme_3.1-163      cli_3.6.1         rlang_1.1.2       purrr_1.0.2      
 [6] generics_0.1.3    jsonlite_1.8.7    glue_1.6.2        fansi_1.0.5       grid_4.3.2       
[11] tibble_3.2.1      fastmap_1.1.1     ape_5.7-1         lifecycle_1.0.4   memoise_2.0.1    
[16] compiler_4.3.2    dplyr_1.1.4       fs_1.6.3          Rcpp_1.0.11       pkgconfig_2.0.3  
[21] tidytree_0.4.5    tidyr_1.3.0       lattice_0.21-9    digest_0.6.33     R6_2.5.1         
[26] tidyselect_1.2.0  utf8_1.2.4        pillar_1.9.0      parallel_4.3.2    magrittr_2.0.3   
[31] withr_2.5.2       tools_4.3.2       lazyeval_0.2.2    cachem_1.0.8      yulab.utils_0.1.0

jayoung commented 9 months ago

I also tried an older version, treeio_1.20.2 (on a different computer), and this time the tip.label and node.label values are retained. So something went wrong between versions 1.20.2 and 1.26.0.

brj1 commented 9 months ago

I can reproduce the error on treeio 1.25.4

jayoung commented 9 months ago

Actually, I'm no longer sure that treeio_1.20.2 works as I said on Friday. Now that I'm back on the first computer, and I've installed treeio_1.20.2 here, I still have the problem.

It might be some interaction with one of the other packages I have installed. I'm a little unclear on what's going on under the hood, and which package(s) are most relevant.

I guess my first question is: should I expect root on a treedata object to return updated tree@phylo$tip.label and tree@phylo$node.label (some reordering of the original character vectors, I would think)? Or are these new arbitrary "1", "2" type labels expected?

jayoung commented 9 months ago

I figured out a workaround for this issue in my code: I go back to the tree as a plain phylo object, reroot that, and regenerate the treedata object by adding the metadata.
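A minimal sketch of that workaround, using the bird.orders example from the first comment. The `tip_metadata` data frame and its `trait` column are hypothetical stand-ins for real metadata, and the sketch assumes that `full_join()` accepts a treedata object and a data frame keyed by `label`, as described in the treedata book:

```r
library(ape)
library(treeio)

data(bird.orders, package = "ape")
# add fake node labels, as in the original example
bird.orders$node.label <- paste0("node", seq_len(ape::Nnode(bird.orders)))

# reroot the plain phylo object with ape rather than the treedata object
rerooted_phylo <- ape::root(bird.orders, outgroup = "Galliformes")

# convert the rerooted tree to treedata afterwards
rerooted_treedata <- as.treedata(rerooted_phylo)

# hypothetical metadata, keyed by tip label (illustration only)
tip_metadata <- data.frame(
  label = bird.orders$tip.label,
  trait = seq_along(bird.orders$tip.label)
)
rerooted_with_data <- full_join(rerooted_treedata, tip_metadata, by = "label")

# the tip labels should still be the original order names
rerooted_treedata@phylo$tip.label
```

Because the rerooting happens before the conversion, the treedata `root` method is never called in this version.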

Interestingly, when I plot the new rerooted trees (using root(phylo)), the node labels are placed differently than they were when I rerooted using old package versions and root(treedata). I think the new version (re-rooting the plain phylo object) is the correct one: the old one seemed to be entirely dropping the node label that was associated with the most basal node (and shifting the positions of nearby node labels), whereas the new method keeps that node label.

Was there an error before in the 'root' method for treedata objects with node labels? Is that why node.labels are dropped now?

I can't figure out how to make a reproducible example for the possibly wrongly placed node labels now that I've updated my R packages (because root(treedata) doesn't even keep node labels any more). But if this issue gets fixed so that root no longer drops node.labels on treedata objects, I'll test to make sure that it gives the same result as rerooting the plain phylo object.
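One possible way to run that kind of check is sketched below. `labels_preserved()` is a hypothetical helper, not part of treeio or ape; it only tests whether the set of tip labels survives rerooting and says nothing about where the node labels end up. For a treedata result, the `@phylo` slot would be passed as `after`.

```r
library(ape)

# hypothetical helper: TRUE if both trees carry the same set of tip labels
labels_preserved <- function(before, after) {
  setequal(before$tip.label, after$tip.label)
}

data(bird.orders, package = "ape")
rerooted <- ape::root(bird.orders, outgroup = "Galliformes")

# rerooting a plain phylo object keeps the original tip labels
kept <- labels_preserved(bird.orders, rerooted)

# simulate the reported symptom: labels replaced by "1", "2", "3", ...
broken <- rerooted
broken$tip.label <- as.character(seq_along(broken$tip.label))
lost <- labels_preserved(bird.orders, broken)

kept  # TRUE
lost  # FALSE
```

For a stricter comparison of two rerooted trees, ape's `all.equal()` method for phylo objects may be worth trying, although I have not checked how it treats node labels.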

yiyian-Lee commented 9 months ago

Hello, I can offer more detail. I read the code of root.treedata and ran it line by line. I found that the tip labels change at this line: phy@phylo <- build_new_tree(tree = re_tree, node2old_new_lab = node2oldnewlab)

Here is the code of treeio:::build_new_tree:

function (tree, node2old_new_lab) 
{
    treeda <- tree %>% as_tibble()
    treeda1 <- treeda %>% dplyr::filter(.data$label %in% node2old_new_lab$new)
    treeda2 <- treeda %>% dplyr::filter(!(.data$label %in% node2old_new_lab$new))
    treeda1$label <- node2old_new_lab[match(treeda1$label, node2old_new_lab$new), 
        "old"] %>% unlist(use.names = FALSE)
    treeda <- rbind(treeda1, treeda2)
    tree <- treeda[order(treeda$node), ] %>% as.phylo()
    return(tree)
}
<bytecode: 0x000001386e341da0>
<environment: namespace:treeio>
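To make the relabeling step in the function above easier to follow, here is a small base R sketch of the same match-based lookup. The mapping table and label vector are made up for illustration (only the column names `old` and `new` come from the function), and the sketch does not reproduce the bug; it only shows what the lookup is meant to do when the temporary labels match the `new` column:

```r
# hypothetical mapping from temporary labels (new) back to original labels (old)
node2old_new_lab <- data.frame(
  old = c("Galliformes", "Anseriformes", "node1"),
  new = c("t1", "t2", "n1"),
  stringsAsFactors = FALSE
)

# labels as they might appear in the rerooted tree, plus one with no mapping
labels <- c("t2", "n1", "t1", "unmapped")

# replace only the labels that occur in the `new` column
hit <- labels %in% node2old_new_lab$new
restored <- labels
restored[hit] <- node2old_new_lab$old[match(labels[hit], node2old_new_lab$new)]

restored
# [1] "Anseriformes" "node1"        "Galliformes"  "unmapped"
```

If the labels in the tree passed to the function did not match the `new` column, or the `old` column did not hold the original names, a lookup like this would not restore them. That is only a guess at where to look, not a confirmed cause.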

I used the same data as @jayoung. Here is the content of re_tree before it is passed in:

>re_tree

Phylogenetic tree with 23 tips and 21 internal nodes.

Tip labels:
  t1, t2, t3, t4, t5, t6, ...
Node labels:
  n1, n2, n3, n4, n5, n6, ...

Unrooted; no branch lengths.

But after running tree <- treeda[order(treeda$node), ] %>% as.phylo():

> tree

Phylogenetic tree with 23 tips and 21 internal nodes.

Tip labels:
  1, 2, 3, 4, 5, 6, ...
Node labels:
  24, 25, 26, 27, 28, 29, ...

Unrooted; no branch lengths.

I don't know whether this is the correct class type for as.phylo here, and I'm not able to resolve it any further. Thanks for reading 👍🏽

Edit: I found that the change does not happen in the line tree <- treeda[order(treeda$node), ] %>% as.phylo()