biopython / biopython

Official git repository for Biopython (originally converted from CVS)
http://biopython.org/

is_terminal bug in newick trees #889

Open twrightsman opened 7 years ago

twrightsman commented 7 years ago

Migrated from https://redmine.open-bio.org/issues/3401

@matklad said:

Consider this weird Newick tree

(((B,C),D))A;

Here 'A' is both a root node and a terminal node (since it has only one child, the subtree ((B,C),D)). However, is_terminal for 'A' is False:

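from io import StringIO

from Bio import Phylo

# 'A' labels the root, which has a single child
bad_tree = '(((B,C),D))A'

t = Phylo.read(StringIO(bad_tree), 'newick')

# Print every clade that Phylo reports as terminal
for c in t.find_clades(terminal=True):
    print(c, end=' ')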

Gives B C D

twrightsman commented 7 years ago

@etal replied:

The is_terminal() method simply checks whether the node in question has any descendants (child nodes). Here, 'A' looks like an internal node (for a rooted tree) because it does have descendants.

>>> print t
Tree(weight=1.0, rooted=False)
    Clade(branch_length=1.0, name='A')
        Clade(branch_length=1.0)
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='B')
                Clade(branch_length=1.0, name='C')
            Clade(branch_length=1.0, name='D')

>>> Phylo.draw_ascii(t)
                                                           __________________ B
                                        __________________|
____________________ __________________|                  |__________________ C
                                       |
                                       |__________________ D

Do you consider 'A' to be terminal because the tree could be rerooted at a different internal node to make 'A' external? What rule would you use to determine whether a node is terminal, other than the presence/absence of child nodes?
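The child-based rule described above can be sketched in a few lines. This is a minimal illustration, not Biopython's source: the `Clade` class here is a hypothetical stand-in that only mimics the `clades` attribute and the `is_terminal()` name, and `find_terminals()` is an illustrative helper.

```python
class Clade:
    """Hypothetical stand-in for a tree node (not Biopython's class)."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = list(clades or [])

    def is_terminal(self):
        # Child-based rule: terminal means "has no child nodes".
        return not self.clades

    def find_terminals(self):
        # Depth-first walk that yields every node without children.
        if self.is_terminal():
            yield self
        for child in self.clades:
            yield from child.find_terminals()


# The tree (((B,C),D))A built by hand: 'A' is the root with one child.
root = Clade("A", [Clade(clades=[Clade(clades=[Clade("B"), Clade("C")]),
                                 Clade("D")])])

terminal_names = [c.name for c in root.find_terminals()]
print(terminal_names)
```

Under this rule 'A' is never reported, because it has a child, regardless of how many neighbours it has in total.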

twrightsman commented 7 years ago

@matklad replied:

I consider a node to be terminal if it has a degree of one. So I think that a root can also be a terminal vertex if it has exactly one child. It's a slippery terminological issue, but at least Wikipedia says that "additionally, the root, if not a leaf itself, is a terminal vertex if it has precisely one child." ( http://en.wikipedia.org/wiki/Tree_(graph_theory) )

PS. Sorry for the late reply, I had completely forgotten about this issue.
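For comparison, the degree-based definition proposed above could be sketched as follows. This is only an illustration under stated assumptions: the tree is encoded as a plain dict of child lists rather than a Biopython object, the labels `n1` and `n2` are made-up names for the two unnamed internal nodes, and `degree_one_nodes` is a hypothetical helper, not part of any library.

```python
def degree_one_nodes(tree):
    """Return the labels of all nodes with exactly one neighbour.

    `tree` maps each node label to the list of its child labels.
    A node's degree is its number of children, plus one if it has a parent.
    """
    # Start with the number of children of each node.
    degree = {node: len(children) for node, children in tree.items()}
    # Every child also has one edge to its parent.
    for children in tree.values():
        for child in children:
            degree[child] = degree.get(child, 0) + 1
    return sorted(node for node, d in degree.items() if d == 1)


# (((B,C),D))A with hypothetical labels n1, n2 for the unnamed nodes.
tree = {
    "A": ["n1"],
    "n1": ["n2", "D"],
    "n2": ["B", "C"],
    "B": [],
    "C": [],
    "D": [],
}

terminals = degree_one_nodes(tree)
print(terminals)
```

With this definition the root 'A' is counted alongside B, C and D, since its single child is its only neighbour.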