:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
fix a bug in read.tree() when a special newick file is used #41

gaospecial closed this 3 years ago

gaospecial commented 3 years ago

When I use read.tree() to read a Newick file provided by 'The All-Species Living Tree' Project, it fails to get the right tip.label.

file <- "https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_123/LSU_release_02_2017/LTPs123_LSU_tree.newick"
read.tree(file, skip = 11)

This Newick file spans multiple lines and uses quoted tip labels, and I found that these two conditions together lead to the error.

Here is a simple example.

quote_tree <- "(('t2':0.04,
; "
writeLines(quote_tree, "quote_tree.newick")
read.tree(file = "quote_tree.newick")
## Phylogenetic tree with 5 tips and 4 internal nodes.
## Tip labels:
##   t2, t2, t2, t2, t2
## Rooted; includes branch lengths.

After some digging, I found that the error may come from the internal function single_quotes() used by ape::read.tree(). However, read.tree() works fine with single-line Newick files. Therefore, I fixed the issue directly by collapsing the file content before parsing.

It may be more reasonable to open a PR against ape, but this is a quick fix for now.
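The idea of collapsing the content can be sketched roughly as follows. This is only an illustration, not the exact code in this PR: the helper name read_tree_collapsed() is hypothetical, and it assumes that joining all lines into one string and passing it through the text argument of ape::read.tree() is enough to avoid the multi-line problem.

```r
library(ape)

# Hypothetical helper (not part of ape or treeio): read a Newick file whose
# tree is spread over several lines by joining the lines before parsing.
read_tree_collapsed <- function(file, skip = 0) {
  lines <- readLines(file, warn = FALSE)
  # Drop leading header/comment lines, mirroring read.tree(file, skip = ...)
  if (skip > 0) lines <- lines[-seq_len(skip)]
  # Join everything into one string so the parser sees a single-line tree
  read.tree(text = paste(lines, collapse = ""))
}
```

It would be called like read_tree_collapsed("quote_tree.newick"), or with skip set to the number of header lines for the LTP files.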

GuangchuangYu commented 3 years ago

I think you had better report this to ape, since it can't ignore comments that span multiple lines.

brj1 commented 3 years ago

Your sample tree and the 'All-Species Living Tree' file both read correctly for me, with all tip labels. Maybe try updating your version of ape.

ape doesn't have a GitHub repo. If you have issues with ape you should contact Emmanuel Paradis: Emmanuel.Paradis@ird.fr

gaospecial commented 3 years ago

Thanks for your reply.

I just installed the latest version, 5.4.1, from CRAN, but ape::read.tree() still can't parse my quote_tree or the LTPs132_SSU (16S rRNA) tree correctly.

LSU_file <- "https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_123/LSU_release_02_2017/LTPs123_LSU_tree.newick"
SSU_file <- "https://www.arb-silva.de/fileadmin/silva_databases/living_tree/LTP_release_132/LTPs132_SSU_tree.newick"
read.tree(LSU_file, skip = 11)
## Phylogenetic tree with 1614 tips and 1613 internal nodes.
## Tip labels:
##   Aeromonas_hydrophila_subsp._anaerogenes__AF508058__Aeromonadaceae, Aeromonas_hydrophila_subsp._hydrophila__CP000462__Aeromonadaceae, Aeromonas_media__AF508059__Aeromonadaceae, Aeromonas_salmonicida_subsp._salmonicida__AY987630__Aeromonadaceae, Aeromonas_salmonicida_subsp._pectinolytica__ARYZ01000149__Aeromonadaceae, Aeromonas_jandaei__AY138850__Aeromonadaceae, ...
## Node labels:
##   , Bacteria, , , , , ...
## Rooted; includes branch lengths.

The result for the LSU file is correct. (In fact, I provided the wrong URL by accident in my earlier post.)

read.tree(SSU_file, skip = 15)
## Phylogenetic tree with 13903 tips and 13902 internal nodes.
## Tip labels:
##   FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, FJ611848_Erwiniagerundensis_Erwiniaceae_ErwGerun, ...
## Node labels:
##   , Bacteria, , , , , ...
## Rooted; includes branch lengths.

However, the tip labels are all the same for the SSU file.
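A quick way to detect this symptom without reading the printed summary is to test the tip labels for duplicates. The helper name has_duplicated_tips() is hypothetical; it only assumes the tree object has a tip.label element, as "phylo" objects do.

```r
# Hypothetical helper: TRUE if any tip label occurs more than once,
# which is the symptom shown above for the SSU file.
has_duplicated_tips <- function(tree) {
  anyDuplicated(tree$tip.label) > 0
}
```

For example, has_duplicated_tips(read.tree(SSU_file, skip = 15)) should return TRUE when the labels are mis-parsed as above.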

My session info is

gaospecial commented 3 years ago

I wrote to the author of ape twice and got no reply, so it looks like this bug won't be fixed any time soon.

emmanuelparadis commented 3 years ago

I had to come back to this issue: the fix in ape 5.5 (version on CRAN) does not work correctly with the small example in example(read.tree). I pushed a fix on GH (version 5.5-1): the examples given by @gaospecial seem to work fine but the spaces are now kept. Note that the quotes are also kept (this can be changed).