etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

Rerooting branch support value problem #689

Closed: metehaansever closed this issue 1 month ago

metehaansever commented 1 year ago

Hello, I am trying to reroot a Newick tree, but each format I try has a problem.

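import ete3

# format=1 reads the internal labels (1, 2, 3, R) as node names
t = ete3.Tree("((C,D)1,(A,(B,X)3)2,E)R;", format=1)
left_most = t.search_nodes(name='X')[0]
right_most = t.search_nodes(name='X')[0]
new_root = t.get_common_ancestor(left_most, right_most)  # X itself, since both ends are X
t.set_outgroup(new_root)
# with outfile given, write() saves the Newick string to disk and returns None
t.write(format=1, format_root_node=True, outfile="newick_tree.txt")

print(t)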

When I read the tree with format=1 and reroot on X, I get the correct topology, but the branch support values end up on the wrong branches.

(X:0.5,(B:1,(A:1,((C:1,D:1)1:1,E:1):1)2:1)3:0.5)R:0;

When I read it with format=0 and reroot on X, I get the correct topology with the support values on the right branches, but format=0 also assigns a support value of 1 to every branch that had none.

(X:0.5,(B:1,(A:1,((C:1,D:1)1:1,E:1)2:1)3:1)1:0.5)1:0;

Am I doing something wrong, or is there a workaround for rerooting a tree read with format=1?
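A support value describes a branch, i.e. a bipartition (split) of the leaves, and a split does not change when the tree is rerooted. That suggests one way to check the two outputs above: map every labelled internal node to the split it defines and compare with the input tree. The sketch below is a hypothetical plain-Python helper (`split_labels` is not part of ete3) built on a tiny Newick reader that only handles simple strings like the ones in this issue (no quoted labels, no comments); each split is represented by the side that does not contain a fixed reference leaf.

```python
def parse(newick):
    """Parse a simple Newick string into nested (children, label) tuples.

    Branch lengths are skipped; quoted labels and comments are not supported.
    """
    s = newick.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        label = s[start:pos]
        if pos < len(s) and s[pos] == ":":  # skip the branch length
            pos += 1
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
        return label

    def read_node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(read_node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(read_node())
            pos += 1  # consume ")"
        return (children, read_label())

    return read_node()


def leaf_names(node):
    """All leaf labels below (and including) a parsed node."""
    children, label = node
    if not children:
        return frozenset([label])
    return frozenset().union(*(leaf_names(c) for c in children))


def split_labels(newick):
    """Map each split (the side without the reference leaf) to its label."""
    root = parse(newick)
    everything = leaf_names(root)
    ref = min(everything)  # any fixed leaf works as the reference
    result = {}

    def walk(node, is_root):
        children, label = node
        if children and not is_root and label:  # labelled, non-root internal node
            below = leaf_names(node)
            side = below if ref not in below else everything - below
            result[side] = label
        for child in children:
            walk(child, False)

    walk(root, True)
    return result


original = "((C,D)1,(A,(B,X)3)2,E)R;"
fmt1 = "(X:0.5,(B:1,(A:1,((C:1,D:1)1:1,E:1):1)2:1)3:0.5)R:0;"
fmt0 = "(X:0.5,(B:1,(A:1,((C:1,D:1)1:1,E:1)2:1)3:1)1:0.5)1:0;"

expected = split_labels(original)
for name, nw in [("format=1", fmt1), ("format=0", fmt0)]:
    got = split_labels(nw)
    ok = all(got.get(split) == label for split, label in expected.items())
    print(name, "keeps every split's value:", ok)
# prints: format=1 keeps every split's value: False
# prints: format=0 keeps every split's value: True
```

In the format=1 output, label 3 lands on the trivial {X} branch, label 2 on the {B,X} branch, and the {C,D,E} branch loses its value, while the format=0 output keeps all three non-trivial splits. Because a name is attached to a node, and a node's label is read as describing the branch above it, reversing the parent/child direction along the path from the old root to X shifts each name by one branch.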

emilhaegglund commented 1 year ago

Hi, I think this is the expected behaviour. With format=1 the internal labels are interpreted as node names rather than support values, so they stay attached to their nodes and are moved when the tree is rerooted; this also explains why all support values are set to 1. I still think this is a problem, though, because phylogenies with multiple support values per branch, such as Ultrafast Bootstrap and SH-aLRT from IQ-Tree, can only be read with format=1, and their values are moved when the tree is rerooted. One workaround I have seen is to loop over the tree first, before rerooting, and copy each internal node's name into its support attribute:

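for n in t.iter_descendants():  # skip the root, which has no branch above it
    if not n.is_leaf() and n.name:  # leaf names are taxa, not support values
        n.support = float(n.name)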

For trees with multiple support values per branch, you can split the label string and keep one of the values:

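for n in t.iter_descendants():  # skip the root
    if not n.is_leaf() and n.name:  # keep the first value of labels such as "85.2/97"
        n.support = float(n.name.split("/")[0])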
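IQ-TREE runs that compute both SH-aLRT and UFBoot typically write internal labels such as `85.2/97`, with the SH-aLRT value first. The helper below is a hypothetical sketch (not part of ete3) that returns every value as a float, so the caller can choose which index goes into `support` instead of hard-coding `[0]`.

```python
def parse_support_label(label, sep="/"):
    """Split a multi-value support label such as '85.2/97' into floats.

    An empty label gives an empty tuple; a non-numeric part (e.g. a clade
    name) raises ValueError, just as float() does.
    """
    if not label:
        return ()
    return tuple(float(part) for part in label.split(sep))


# Example label in SH-aLRT/UFBoot order (illustrative values only).
sh_alrt, ufboot = parse_support_label("85.2/97")
```

Because `support` holds a single number, keeping both values through a reroot needs something extra; one option, not tested in this thread, is to run the copy-and-reroot step on two copies of the tree, once per index.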

Alternatively, you can try this quick-fix script, which I hope works.