YuLab-SMU / treeio

:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
https://yulab-smu.top/treedata-book/

support for LSD2 dating file import #111

Open sihellem opened 11 months ago

sihellem commented 11 months ago

Hi,

IQ-TREE added the least-squares dating (LSD2) method for building a time tree (http://www.iqtree.org/doc/Dating). The resulting dated tree (.timetree.nex file) looks a lot like a BEAST time tree.

Using treeio::read.beast() does not throw any error during the import, but the object is not processed properly: trying to export the tree immediately afterwards results in an error. Additionally, most node information (CI_date, CI_height, ...) is lost during the initial import.

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
...
> library(treeio)
treeio v1.15.7  For help: https://yulab-smu.top/treedata-book/

If you use treeio in published research, please cite:

LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution 2020, 37(2):599-603. doi: 10.1093/molbev/msz240
> tree <- read.beast("input.tree")
> write.beast(tree, file="output.tree")
Error in cp(nl) : object 'nl' not found

Could you please add support for LSD2 timetree import?

Bests

brj1 commented 11 months ago

It works as expected for me. If I read in a time tree inferred by IQ-TREE2, I can get the CI_date and CI_height with treeio's read.beast function. Note that CI_date and CI_height don't appear on every node of the tree (usually tips won't have confidence intervals on their date), while the other nodes have two values. As a result, each of these annotations becomes a list column in which every entry is either an NA vector or a vector of length 2.
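To make the list-column layout concrete, here is a minimal sketch in base R. The data frame `ann` and its values are made up for illustration (they are not taken from any real tree), and the column names `CI_date_lower` and `CI_date_upper` are hypothetical. It only shows how such a column can be inspected and flattened, assuming the structure described above.

```r
# Toy stand-in for a node annotation table; all values are invented.
ann <- data.frame(node = 1:4)
# Nodes 1-2 play the role of tips (no interval), nodes 3-4 carry an interval.
ann$CI_date <- list(NA, NA, c(-130.2, -118.7), c(-40.1, -35.9))

# Number of values per entry: 1 for the NA placeholder, 2 for an interval.
n_values <- lengths(ann$CI_date)

# Pull one element out of each entry, returning NA where no interval exists.
pick <- function(x, i) if (length(x) == 2) x[i] else NA_real_

# Flatten the list column into two plain numeric columns.
ann$CI_date_lower <- vapply(ann$CI_date, pick, numeric(1), i = 1)
ann$CI_date_upper <- vapply(ann$CI_date, pick, numeric(1), i = 2)

print(ann[, c("node", "CI_date_lower", "CI_date_upper")])
```

Flattening like this is only a convenience for inspection; whether the first value is the lower or the upper bound depends on how the file was written, so that ordering is an assumption here.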

What does your timetree.nex file look like and what is the dput(tree) of your tree after you read it in?

You also might want to try updating R and treeio to the latest version.

sihellem commented 11 months ago

Thanks for the help! Unfortunately, I can't upgrade R and treeio, as this would require upgrading from macOS Catalina, which is not an option for me. The output of dput(tree) is too long to paste here (the tree has 2799 terminals); it actually hits the printout limit of the console:

dput(tree)
[...cannot read due to printout limit...]
        date4696 = -19.0455, date4697 = 0, date4698 = -22.9515, 
        date4699 = -38.4493, date4700 = -39.9287, date4701 = 0, 
        date4702 = 0, date4703 = -0.615593, date4704 = 0, date4705 = 0, 
[...]
       date5517 = 0, date5518 = -1.73491, date5519 = -135.531
        ), node = c("1", "2", "2800", "3", "2802", "4", "5", 
        "2821", "6", "7", "2820", "8", "9", "2802", "2802", "2802", 
        "10", "11", "2833", "12", "2832", "13", "2802", "14", 
[...]
        "2802", "2802", "2801", "2798", "2799", "2802", "2800"
        )), row.names = c(NA, -5519L), class = c("tbl_df", "tbl", 
    "data.frame")), extraInfo = structure(list(), .Names = character(0), row.names = integer(0), class = c("tbl_df", 
    "tbl", "data.frame")), tip_seq = structure(raw(0), class = "DNAbin"), 
    anc_seq = structure(raw(0), class = "DNAbin"), seq_type = character(0), 
    tipseq_file = character(0), ancseq_file = character(0), info = list())
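When dput() output overruns the console, one option is to write it to a file instead. The sketch below uses a made-up list `big` as a stand-in for the tree object; `dput(x, file = ...)` and `dget()` are base R, while the object and its contents are purely illustrative.

```r
# Stand-in for a large object; contents are invented for illustration.
big <- list(node = as.character(1:5000), date = numeric(5000))

# Write the dput() representation to a file rather than the console.
dump_file <- tempfile(fileext = ".txt")
dput(big, file = dump_file)

# The file can be attached to an issue, or read back with dget().
restored <- dget(dump_file)

# For a quick look in the console, str() gives a truncated overview.
str(big, max.level = 1)
```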

Everything seems fine, but when displaying the tree, some old nodes (aged ~120 Mya) suddenly show values like -1.2..., while other old nodes keep their "correct" values.

However, I noticed that if I first open the LSD-produced tree in FigTree and re-save it through File > Export Trees, ticking the options "Include FigTree block" and "Include Annotations", then reading the tree with read.beast and exporting it back with write.beast works without issue, and all node information is kept (and is seemingly correct).
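A rough way to check what survives the round trip is sketched below. The helper name `roundtrip_lost_columns` and the file name in the usage comment are hypothetical. The sketch relies only on read.beast(), write.beast() and the `data` slot of the treedata object, and assumes treeio is installed.

```r
# Returns the annotation columns that disappear after a
# read.beast -> write.beast -> read.beast round trip.
roundtrip_lost_columns <- function(infile,
                                   outfile = tempfile(fileext = ".tree")) {
  # Fail early if treeio is missing or the input file does not exist.
  stopifnot(requireNamespace("treeio", quietly = TRUE), file.exists(infile))

  tree <- treeio::read.beast(infile)
  treeio::write.beast(tree, file = outfile)
  tree_back <- treeio::read.beast(outfile)

  # Columns present after the first import but absent after re-import.
  setdiff(names(tree@data), names(tree_back@data))
}

# Usage (hypothetical file name for the FigTree-exported tree):
# roundtrip_lost_columns("figtree_export.tree")
```

An empty result would suggest that no annotation columns were dropped. It does not check that the values themselves are unchanged.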