YuLab-SMU / treeio

:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
https://yulab-smu.top/treedata-book/

update read.beast #48

Closed: xiangpin closed this 3 years ago

xiangpin commented 3 years ago

Description

Addresses a problem with reading a NEXUS-format tree file.

Related Issue

#47

Example

This file differs from the BEAST output file used in the example.

> library(treeio)
Registered S3 method overwritten by 'treeio':
  method     from
  root.phylo ape
treeio v1.15.4  For help: https://yulab-smu.top/treedata-book/

If you use treeio in published research, please cite:

LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu. treeio: an R package for phylogenetic tree input and output with richly annotated and associated data. Molecular Biology and Evolution 2020, 37(2):599-603. doi: 10.1093/molbev/msz240

> example(read.beast)

rd.bst> file <- system.file("extdata/BEAST", "beast_mcc.tree", package="treeio")

rd.bst> read.beast(file)
'treedata' S4 object that stored information of
        '/mnt/d/UbuntuApps/R/4.0.4/lib/R/library/treeio/extdata/BEAST/beast_mcc.tree'.

...@ phylo:
Phylogenetic tree with 15 tips and 14 internal nodes.

Tip labels:
  A_1995, B_1996, C_1995, D_1987, E_1996, F_1997, ...

Rooted; includes branch lengths.

with the following features available:
        'height',       'height_0.95_HPD',      'height_median',        'height_range', 'length',
        'length_0.95_HPD',      'length_median',        'length_range', 'posterior',    'rate',
        'rate_0.95_HPD',        'rate_median',  'rate_range'.
> tr <- read.beast("./treefile.nexus.txt")
> tr
'treedata' S4 object that stored information of
        './treefile.nexus.txt'.

...@ phylo:
Phylogenetic tree with 936 tips and 197 internal nodes.

Tip labels:
  Pingxiang/JX5/2020//EPI_ISL_421252/_000, Jiangxi/IVDC-JX-002/2020//EPI_ISL_4_001, 20200110302_2020-1-10_Wuhan_L, Wuhan/HB-WH4-197/2020//EPI_ISL_4549_002, YB20200116082_2020-1-16_Wuhan_L, 20200110425_2020-1-10_Wuhan_L, ...
Node labels:
  NODE_0000000, NODE_0000001, NODE_0000005, NODE_0000016, NODE_0000018, NODE_0000029, ...

Rooted; includes branch lengths.

with the following features available:
        'mutations'.
> tr@data
# A tibble: 674 x 2
   mutations  node
   <list>     <chr>
 1 <chr [1]>  938
 2 <chr [2]>  3
 3 <chr [1]>  4
 4 <chr [10]> 5
 5 <chr [1]>  6
 6 <chr [2]>  7
 7 <chr [1]>  939
 8 <chr [2]>  8
 9 <chr [2]>  9
10 <chr [2]>  10
# … with 664 more rows
>
GuangchuangYu commented 3 years ago

I obtained the tree file through IQTree and exported the file in nexus format by figtree

As mentioned by @wook2014, the file was exported by FigTree. This is weird, as FigTree is expected to export a BEAST-compatible NEXUS file. Please confirm.

PS: both tools were written by Andrew Rambaut.

xiangpin commented 3 years ago

I reopened this file in FigTree, which loads it fine, and then re-exported it as a NEXUS file. Oddly, the statistical annotation comes after the labels rather than after the edge lengths, and multiple values are not wrapped in {}, e.g. t1[&mutation="test1","test2","test3"]:0.008

I then re-exported FigTree's example file: when the multiple values are numeric, they are wrapped in {}.
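A minimal base-R sketch of the two annotation placements described above follows. It is not treeio's code: the helper `split_node` and the sample node strings are hypothetical, and it assumes one `[&...]` block per node and labels that contain no `:`.

```r
# Hypothetical helper (not part of treeio): split one NEXUS node token into
# label, annotation and branch length. The [&...] block is accepted either
# right after the label or after the branch length.
split_node <- function(token) {
  # Grab the first [&...] block, wherever it appears in the token
  ann <- regmatches(token, regexpr("\\[&[^\\]]*\\]", token, perl = TRUE))
  if (length(ann) == 0) ann <- NA_character_
  # Drop the annotation, leaving "label:length"
  rest <- sub("\\[&[^\\]]*\\]", "", token, perl = TRUE)
  parts <- strsplit(rest, ":", fixed = TRUE)[[1]]
  list(label = parts[1],
       annotation = ann,
       length = as.numeric(parts[2]))
}

# Annotation after the label, unbraced quoted values
split_node('t1[&mutation="test1,test2,test3"]:0.008')
# Annotation after the branch length, numeric values in {}
split_node('t1:0.008[&rate={0.1,0.2}]')
```

Both tokens give the same label and branch length; only the position and the bracing of the annotation differ.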

test9.nexus.txt

brj1 commented 3 years ago

@xiangpin It looks like the 'multiple' values of the metadata in the file are contained in a single set of quotation marks, not multiple sets as you show, i.e. t1[&mutation="test1,test2,test3"]:0.008 instead of t1[&mutation="test1","test2","test3"]:0.008

xiangpin commented 3 years ago

Yes, you are right. But the values are not wrapped in {}, so the original read.beast does not parse them well, since it extracts the annotation according to the marks ={,}. Moreover, both forms (t1[&mutation="test1,test2,test3"]:0.008 and t1[&mutation="test1","test2","test3"]:0.008) are parsed correctly, because the quotation marks are removed during processing.
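The quote-removal idea can be sketched in base R as below. This is an illustration under stated assumptions, not the code merged in this PR: `parse_annotation` is a hypothetical helper, values are kept as character vectors, and a comma is treated as a field separator only when it is followed by `key=`, so values that themselves contain `=` are not handled.

```r
# Hypothetical sketch (not treeio's implementation): turn a "[&...]" block
# into a named list. Values may be wrapped in {} (e.g. rate={0.1,0.2}) or be
# quoted, either as one string ("a,b,c") or as several ("a","b","c").
parse_annotation <- function(ann) {
  # Strip the leading "[&" and the trailing "]"
  body <- sub("\\]$", "", sub("^\\[&", "", ann))
  # A comma starts a new field only if it is followed by "key=";
  # commas inside {...} or inside quoted values are left alone
  fields <- strsplit(body, ',(?=[^,={}"]+=)', perl = TRUE)[[1]]
  keys <- sub("=.*$", "", fields)
  vals <- sub("^[^=]*=", "", fields)
  parsed <- lapply(vals, function(v) {
    v <- gsub('"', "", v, fixed = TRUE)           # drop all quotation marks
    v <- gsub("^\\{|\\}$", "", v, perl = TRUE)    # drop surrounding braces
    strsplit(v, ",", fixed = TRUE)[[1]]           # multiple values -> vector
  })
  names(parsed) <- keys
  parsed
}

parse_annotation('[&mutation="test1,test2,test3"]')
parse_annotation('[&mutation="test1","test2","test3"]')
parse_annotation('[&rate={0.1,0.2},posterior=0.95]')
```

Because the quotation marks are stripped before the value is split on commas, the single-quoted and multi-quoted forms yield the same character vector; numeric fields could be converted afterwards with `as.numeric()`.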