YuLab-SMU / treeio

:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
https://yulab-smu.top/treedata-book/

fix a bug in read.tree() when a special newick file is used #41

Closed gaospecial closed 3 years ago

gaospecial commented 3 years ago

When I use read.tree() to read a Newick file provided by 'The All-Species Living Tree' Project, it fails to get the right tip.label.

file <- "https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_123/LSU_release_02_2017/LTPs123_LSU_tree.newick"
read.tree(file, skip = 11)

This Newick file spans multiple lines and uses quoted tip labels, and I found that these two conditions together lead to the error.

Here is a simple example.

quote_tree <- "(('t2':0.04,
't1':0.34
):0.89,
('t5':0.37,
('t4':0.03,
't3':0.67):
0.9):
0.59)
; "
writeLines(quote_tree, "quote_tree.newick")
read.tree(file = "quote_tree.newick")
## 
## Phylogenetic tree with 5 tips and 4 internal nodes.
## 
## Tip labels:
##   t2, t2, t2, t2, t2
## 
## Rooted; includes branch lengths.

After some digging, I found that the error may come from the internal function single_quotes() in ape::read.tree(). However, read.tree() works fine with one-line Newick files. Therefore, I fixed this issue directly by collapsing the file content into a single line.

Maybe it would be more reasonable to create a PR in ape, but this is a quick fix for now.
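
The collapse workaround described above can be sketched roughly as follows. This is only an illustration, not the actual patch submitted to treeio: the helper name read_tree_collapsed() and its skip argument are hypothetical, and the sketch assumes the file fits in memory and that read.tree(text = ...) handles the quoted labels once they are on a single line, as reported above.

```r
library(ape)

# Hypothetical helper (not part of ape or treeio): read a multi-line Newick
# file, drop `skip` leading lines, and join the rest into one string
# before handing it to ape::read.tree().
read_tree_collapsed <- function(file, skip = 0) {
  lines <- readLines(file, warn = FALSE)
  if (skip > 0) {
    lines <- lines[-seq_len(skip)]
  }
  # Trim only the edges of each line, then concatenate without separators.
  newick <- paste(trimws(lines), collapse = "")
  read.tree(text = newick)
}

# Recreate the small multi-line example in a temporary file.
quote_tree <- "(('t2':0.04,
't1':0.34
):0.89,
('t5':0.37,
('t4':0.03,
't3':0.67):
0.9):
0.59)
; "
tmp <- tempfile(fileext = ".newick")
writeLines(quote_tree, tmp)

tree <- read_tree_collapsed(tmp)
# Each tip should now carry its own label.
length(unique(tree$tip.label)) == length(tree$tip.label)
```

Whether the quotes themselves are kept in the labels may depend on the installed ape version, so the check above only compares the number of distinct labels with the number of tips.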

GuangchuangYu commented 3 years ago

I think you'd better report this to ape, since it can't ignore comments that span multiple lines.

gaospecial commented 3 years ago

> I think you'd better report this to ape, since it can't ignore comments that span multiple lines.

The problem is I couldn't find the ape repo in GitHub.

brj1 commented 3 years ago

Your sample tree and the 'All-Species Living Tree' both read for me with all tip labels. Maybe try updating your version of ape.

ape doesn't have a GitHub repo. If you have issues with ape you should contact Emmanuel Paradis: Emmanuel.Paradis@ird.fr

gaospecial commented 3 years ago

> Your sample tree and the 'All-Species Living Tree' both read for me with all tip labels. Maybe try updating your version of ape.
>
> ape doesn't have a GitHub repo. If you have issues with ape you should contact Emmanuel Paradis: Emmanuel.Paradis@ird.fr

Thanks for your reply.

I just installed the latest version (5.4-1) from CRAN, but ape::read.tree() still couldn't parse my quote_tree and the LTPs132_SSU (16S rRNA) tree correctly.

library(ape)
LSU_file <- "https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_123/LSU_release_02_2017/LTPs123_LSU_tree.newick"
SSU_file <- "https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_132/LTPs132_SSU_tree.newick"
read.tree(LSU_file, skip = 11)
## 
## Phylogenetic tree with 1614 tips and 1613 internal nodes.
## 
## Tip labels:
##   Aeromonas_hydrophila_subsp._anaerogenes__AF508058__Aeromonadaceae, Aeromonas_hydrophila_subsp._hydrophila__CP000462__Aeromonadaceae, Aeromonas_media__AF508059__Aeromonadaceae, Aeromonas_salmonicida_subsp._salmonicida__AY987630__Aeromonadaceae, Aeromonas_salmonicida_subsp._pectinolytica__ARYZ01000149__Aeromonadaceae, Aeromonas_jandaei__AY138850__Aeromonadaceae, ...
## Node labels:
##   , Bacteria, , , , , ...
## 
## Rooted; includes branch lengths.

The result for the LSU file is correct. (In fact, I provided the wrong URL by accident in my previous post.)

read.tree(SSU_file, skip = 15)
## 
## Phylogenetic tree with 13903 tips and 13902 internal nodes.
## 
## Tip labels:
##   FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, ...
## Node labels:
##   , Bacteria, , , , , ...
## 
## Rooted; includes branch lengths.

However, the tip labels are all the same for the SSU file.
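
A quick way to detect this symptom without inspecting the printed labels by eye is to test for duplicated tip labels. The sketch below is illustrative only: has_duplicated_tips() is a hypothetical helper, and the "broken" tree is simulated by overwriting the labels by hand rather than by reproducing the parsing error.

```r
library(ape)

# Hypothetical helper: TRUE if any tip label occurs more than once.
has_duplicated_tips <- function(tree) {
  anyDuplicated(tree$tip.label) > 0
}

# A small tree parsed from a one-line Newick string.
ok <- read.tree(text = "((t1:1,t2:1):1,t3:2);")
has_duplicated_tips(ok)

# Simulate the reported symptom: every tip ends up with the same label.
broken <- ok
broken$tip.label <- rep("t2", length(broken$tip.label))
has_duplicated_tips(broken)
```

Duplicated labels can also be legitimate in some data sets, so a positive result is a hint to look closer rather than proof of a parsing problem.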

My session info is:

sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ape_5.4-1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5      lattice_0.20-41 digest_0.6.27   grid_4.0.3     
##  [5] nlme_3.1-150    magrittr_1.5    evaluate_0.14   rlang_0.4.8    
##  [9] stringi_1.5.3   rmarkdown_2.5   tools_4.0.3     stringr_1.4.0  
## [13] xfun_0.19       yaml_2.2.1      parallel_4.0.3  compiler_4.0.3 
## [17] htmltools_0.5.0 knitr_1.30

gaospecial commented 3 years ago

I wrote two emails to the author of ape and got no reply, so it looks like this bug won't be fixed any time soon.

emmanuelparadis commented 3 years ago

I had to come back to this issue: the fix in ape 5.5 (the version on CRAN) does not work correctly with the small example in example(read.tree). I pushed a fix on GH (version 5.5-1): the examples given by @gaospecial seem to work fine, but the spaces are now kept. Note that the quotes are also kept (this can be changed).