fmichonneau / phylobase

An R package that provides a base S4 class for comparative methods, incorporating one or more trees and trait data

readNewick leads to unrooted tree - why? #9

Closed codingbutstillalive closed 5 years ago

codingbutstillalive commented 7 years ago

I am processing a Newick string that looks as follows:

((((Golf_V:25,Golf_II:25)Golf:25,(Polo_III:25)Polo:25)Volkswagen:25,((Quadro:25)Audi_A8:25,(Quadro_XY:25)Audi_A7:25)Audi:25)Auto:25,(((ICE_9:25)ICE:25,(RE8:25,RE5:25)RE:25,(IC_99:25)IC:25)DB:25)Bahn:25,(((Line_12:25)Flixbus_Shuttle:25,(Line_3P:25)Flixbus_Premium:25)Flixbus:25)Omnibus:25)Verkehrsmittel;

This corresponds to the following dummy taxonomy of transportation systems (in German: "Verkehrsmittel"):

Golf V, Golf, Volkswagen, Auto, Verkehrsmittel
Golf II, Golf, Volkswagen, Auto, Verkehrsmittel
Polo III, Polo, Volkswagen, Auto, Verkehrsmittel
Quadro, Audi A8, Audi, Auto, Verkehrsmittel
Quadro XY, Audi A7, Audi, Auto, Verkehrsmittel
ICE 9, ICE, DB, Bahn, Verkehrsmittel
RE8, RE, DB, Bahn, Verkehrsmittel
RE5, RE, DB, Bahn, Verkehrsmittel
IC 99, IC, DB, Bahn, Verkehrsmittel
Line 12, Flixbus Shuttle, Flixbus, Omnibus, Verkehrsmittel
Line 3P, Flixbus Premium, Flixbus, Omnibus, Verkehrsmittel

As a data.tree structure, the tree looks like this:

                     levelName
1  Verkehrsmittel             
2   ¦--Auto                   
3   ¦   ¦--Volkswagen         
4   ¦   ¦   ¦--Golf           
5   ¦   ¦   ¦   ¦--Golf V     
6   ¦   ¦   ¦   °--Golf II    
7   ¦   ¦   °--Polo           
8   ¦   ¦       °--Polo III   
9   ¦   °--Audi               
10  ¦       ¦--Audi A8        
11  ¦       ¦   °--Quadro     
12  ¦       °--Audi A7        
13  ¦           °--Quadro XY  
14  ¦--Bahn                   
15  ¦   °--DB                 
16  ¦       ¦--ICE            
17  ¦       ¦   °--ICE 9      
18  ¦       ¦--RE             
19  ¦       ¦   ¦--RE8        
20  ¦       ¦   °--RE5        
21  ¦       °--IC             
22  ¦           °--IC 99      
23  °--Omnibus                
24      °--Flixbus            
25          ¦--Flixbus Shuttle
26          ¦   °--Line 12    
27          °--Flixbus Premium
28              °--Line 3P    

My final goal is to process that newick string (stored in a file) like this:

phylo <- readNewick(file="newick.txt", simplify=FALSE, spacesAsUnderscores=FALSE)

MRCA(phylo, c("Golf", "Polo"))

Error: Error in orderIndex(x, order): Tree must be rooted to reorder

So, for some reason, the resulting tree is not rooted. How can I fix this?

fmichonneau commented 7 years ago

Your current tree representation in your newick string is unrooted.

I have rearranged it to root it (arbitrarily):

(((((ICE_9:25.0)ICE:25.0,(RE8:25.0,RE5:25.0)RE:25.0,(IC_99:25.0)IC:25.0)DB:25.0)Bahn:25.0,(((Line_12:25.0)Flixbus_Shuttle:25.0,(Line_3P:25.0)Flixbus_Premium:25.0)Flixbus:25.0)Omnibus:25.0)Verkehrsmittel:13.636364,(((Golf_V:25.0,Golf_II:25.0)Golf:25.0,(Polo_III:25.0)Polo:25.0)Volkswagen:25.0,((Quadro:25.0)Audi_A8:25.0,(Quadro_XY:25.0)Audi_A7:25.0)Audi:25.0)Auto:11.363636);
> tr = readNewick("/tmp/test.tree", simplify=FALSE, spacesAsUnderscores=F)
Warning message:
In checkTree(object) : Tree contains singleton nodes. 
> rootNode(tr)
<NA> 
  12 
> MRCA(tr, c("Golf", "Polo"))
Volkswagen 
        24 

codingbutstillalive commented 7 years ago

Thanks a lot for your help. Could you please formally describe the difference between a rooted and an unrooted tree in Newick format? Unfortunately, I have difficulty seeing the difference from the string representation.

Moreover, I do not understand why the rooted tree (as displayed by data.tree) becomes unrooted when I export it to the Newick format. The current workflow is as follows: read.csv -> data.table -> data.tree -> ToNewick -> readNewick. Thus, the data.tree displayed here is rooted, but its Newick export suddenly is not. That's odd. But I might be missing something important, as I am no expert on the Newick format.

EDIT: I have now checked the rules for the Newick format here: http://evolution.genetics.washington.edu/phylip/newicktree.html

According to this, my tree is properly formatted and rooted. To illustrate this, I have laid it out a little differently to highlight its structure:

(
(((Golf_V:25,Golf_II:25)Golf:25,(Polo_III:25)Polo:25)Volkswagen:25,((Quadro:25)Audi_A8:25,(Quadro_XY:25)Audi_A7:25)Audi:25)Auto:25,
(((ICE_9:25)ICE:25,(RE8:25,RE5:25)RE:25,(IC_99:25)IC:25)DB:25)Bahn:25,
(((Line_12:25)Flixbus_Shuttle:25,(Line_3P:25)Flixbus_Premium:25)Flixbus:25)Omnibus:25
)Verkehrsmittel;

So, I really don't see why it is not recognized as a rooted tree!
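
A possible explanation, offered as an assumption rather than something confirmed in this thread: a common convention in phylogenetics software (for example, `is.rooted` in the ape package) treats a tree as rooted when exactly two branches meet at the outermost node or a root edge is present, and as unrooted when three or more branches meet there. The string above has three children at the top level (Auto, Bahn, Omnibus). Whether phylobase applies the same rule is not stated here. The base-R sketch below counts those top-level children; `count_root_children` is a hypothetical helper, not part of phylobase, and it assumes labels without quotes or bracketed comments.

```r
# Hypothetical helper (not a phylobase function): count the direct children
# of the outermost node in a Newick string.
# Assumes no quoted labels and no [comments] in the string.
count_root_children <- function(newick) {
  chars <- strsplit(newick, "")[[1]]
  depth <- 0L
  children <- 0L
  for (ch in chars) {
    if (ch == "(") {
      depth <- depth + 1L
      if (depth == 1L) children <- 1L  # outermost list opened: at least one child
    } else if (ch == ")") {
      depth <- depth - 1L
    } else if (ch == "," && depth == 1L) {
      children <- children + 1L        # a comma at depth 1 separates root children
    }
  }
  children
}

# Same shape as the tree in this issue: three clades under one labelled node.
count_root_children("((A:1,B:1)X:1,C:1,D:1)R;")      # 3
# Same shape as the rearranged tree: two clades at the top level.
count_root_children("((A:1,B:1)X:1,(C:1,D:1)Y:1);")  # 2
```

Applied to the full string from this issue, the function returns 3, while the rearranged string posted earlier in the thread yields 2.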

codingbutstillalive commented 7 years ago

Okay, I have now fixed it myself. It seems that one more level was required, i.e. an extra pair of surrounding parentheses plus a branch length on the root label, like so:

((
(((Golf_V:20,Golf_II:20)Golf:20,(Polo_III:20)Polo:20)Volkswagen:20,((Quadro:20)Audi_A8:20,(Quadro_XY:20)Audi_A7:20)Audi:20)Auto:20,
(((ICE_9:20)ICE:20,(RE8:20,RE5:20)RE:20,(IC_99:20)IC:20)DB:20)Bahn:20,
(((Line_12:20)Flixbus_Shuttle:20,(Line_3P:20)Flixbus_Premium:20)Flixbus:20)Omnibus:20)
Verkehrsmittel:20);

Okay, at least I solved it.
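
For anyone who wants to apply the same workaround programmatically, here is a minimal base-R sketch. It assumes the exported string ends in a single `;` and that the outermost node carries a label but no branch length, as in the export above. `add_root_level` is a hypothetical helper name, and the default length of 20 only mirrors the value used in the fixed string.

```r
# Hypothetical helper: wrap a Newick string in one extra pair of parentheses
# and give the former outermost node a branch length.
# Assumes the input ends with ";" and its outermost node has no branch length yet.
add_root_level <- function(newick, root_length = 20) {
  body <- sub(";\\s*$", "", trimws(newick))  # strip the terminating semicolon
  paste0("(", body, ":", root_length, ");")  # re-wrap and terminate again
}

add_root_level("((A:1,B:1)X:1,C:1,D:1)R;")
# "(((A:1,B:1)X:1,C:1,D:1)R:20);"
```

The result can be written to a file with `writeLines()` before calling `readNewick()`. The wrapper adds one more single-child node, so the "Tree contains singleton nodes" warning seen earlier in the thread may still appear.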