YuLab-SMU / ggtree

🎄 Visualization and annotation of phylogenetic trees
https://yulab-smu.top/treedata-book/

Cannot read the correct tip.label & node.label in Newick format #193

Closed: chxp closed this issue 6 years ago

chxp commented 6 years ago

Issues

read.tree() cannot read the correct tip.label and node.label from the Newick file provided by "The All-Species Living Tree" Project.

The Newick file was downloaded from https://www.arb-silva.de/no_cache/download/archive/living_tree/LTP_release_132/

The file is LTPs132_SSU_tree.newick; it is read correctly by MEGA-X.

Error message

The code I am using:

> a <- read.tree('LTPs132_SSU_tree.newick')
> str(a)
List of 5
 $ edge       : int [1:27804, 1:2] 13904 13905 13906 13907 13908 13909 13910 13911 13912 13913 ...
 $ edge.length: num [1:27804] 0.13811 0.01918 0.01123 0.00584 0.00457 ...
 $ Nnode      : int 13902
 $ node.label : chr [1:13902] "" "Bacteria" "" "" ...
 $ tip.label  : chr [1:13903] "@_1_@" "@_1_@" "@_1_@" "@_1_@" ...
 - attr(*, "class")= chr "phylo"
 - attr(*, "order")= chr "cladewise"
> head(a$tip.label)
[1] "@_1_@" "@_1_@" "@_1_@" "@_1_@" "@_1_@" "@_1_@"
> a$node.label[c(15:17,4460:4463,13888:13894)]
 [1] "@_1_@"         ""              "@_1_@"         "Kiloniellales"
 [5] ""              "@_1_@"         ""              "@_1_@"        
 [9] ""              ""              "Acidilobales"  "Acidilobaceae"
[13] "@_1_@"         ""             
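A quick way to tell whether a parsed tree is affected is to count labels of the form `@_<n>_@`, like the ones in the output above. The helper below is a hypothetical sketch in base R, not an ape or ggtree function; it only inspects the `tip.label` and `node.label` components shown by `str(a)`.

```r
# Hypothetical helper (not part of ape or ggtree): count tip and node labels
# that match the "@_<n>_@" pattern seen in the read.tree() output above.
count_placeholder_labels <- function(tree) {
  # node.label may be NULL for trees without internal labels; c() drops it
  labs <- c(tree$tip.label, tree$node.label)
  sum(grepl("^@_[0-9]+_@$", labs))
}
```

For the tree `a` above, `count_placeholder_labels(a)` would be greater than zero; a result of 0 means no label has this form.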

SessionInfo

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=zh_CN.UTF-8        LC_COLLATE=zh_CN.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8   
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2.2       ggtree_1.12.7        ggplot2_3.0.0       
[4] BiocInstaller_1.30.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     bindr_0.1.1      magrittr_1.5     tidyselect_0.2.4
 [5] munsell_0.5.0    lattice_0.20-35  colorspace_1.3-2 ape_5.1         
 [9] R6_2.2.2         rlang_0.2.2      plyr_1.8.4       dplyr_0.7.6     
[13] tcltk_3.5.1      tools_3.5.1      parallel_3.5.1   grid_3.5.1      
[17] nlme_3.1-137     gtable_0.2.0     withr_2.1.2      lazyeval_0.2.1  
[21] assertthat_0.2.0 tibble_1.4.2     crayon_1.3.4     treeio_1.4.3    
[25] tidyr_0.8.1      purrr_0.2.5      tidytree_0.1.9   glue_1.3.0      
[29] labeling_0.3     compiler_3.5.1   pillar_1.3.0     rvcheck_0.1.0   
[33] scales_1.0.0     jsonlite_1.5     pkgconfig_2.0.2 

brj1 commented 6 years ago

I was able to reproduce this error and found that it can be fixed by opening the Newick file in a text editor, deleting the header, and saving it; i.e., delete the following lines from the Newick file:

[All-Species Living Tree. 16S rRNA. June 2018

tree_LTPs132_SSU:
New sequences in this tree were added to the previous
release (LTPs128) using ARB parsimony.

LTP 30% maximum frequency filter was used.

Groups are displayed accordingly to monophyletism and
valid taxonomic affiliations when possible. The
phylogeny of each species is capitulated in the
field "phyl_ltp".

Fields displayed: fullname_ltp, acc, hi_tax_ltp, name
]
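The same edit can be scripted instead of done by hand. Below is a minimal sketch in R, assuming the header is a single bracketed `[ ... ]` comment at the very start of the file, as in LTPs132_SSU_tree.newick. `strip_newick_header()` and `read_tree_without_header()` are hypothetical helpers, not part of ape or ggtree; only `ape::read.tree()` is a real library call.

```r
# Hypothetical helpers (not part of ape or ggtree): drop a leading
# [ ... ] comment block from a Newick file, then parse the rest with ape.

strip_newick_header <- function(lines) {
  # Join the lines so a header spanning several lines can be matched at once
  txt <- paste(lines, collapse = "\n")
  # Remove optional leading whitespace, one [ ... ] block ([^]] also matches
  # newlines), and any whitespace that follows the closing bracket
  sub("^\\s*\\[[^]]*\\]\\s*", "", txt, perl = TRUE)
}

read_tree_without_header <- function(file) {
  cleaned <- strip_newick_header(readLines(file, warn = FALSE))
  # Write the cleaned text to a temporary file, mirroring the manual edit,
  # so read.tree() sees the same input it would after deleting the header
  tmp <- tempfile(fileext = ".newick")
  on.exit(unlink(tmp))
  writeLines(cleaned, tmp)
  ape::read.tree(tmp)
}
```

Usage would be `a <- read_tree_without_header("LTPs132_SSU_tree.newick")`. Only one leading bracketed block is removed; any comments elsewhere in the tree are left for read.tree() to handle.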
chxp commented 6 years ago

Thanks, brj1, your answer solved my problem.

Perhaps read.tree could add support for an annotation header in Newick files.

brj1 commented 6 years ago

The read.tree function comes from the ape package (http://ape-package.ird.fr/), so this issue would have to be fixed in that package.