etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0
773 stars 216 forks source link

Add method to generate uniform random bifurcating tree #691

Closed harryrichman closed 8 months ago

harryrichman commented 1 year ago

There are many situations in phylogenetics research where it would be useful to have a quick and easy way to generate a uniformly random bifurcating tree. There does not seem to be a way to do so using currently implemented methods in ete.

To generate a random tree on n leaves, the current documentation suggests running

t = Tree()
t.populate(n)

However, the resulting tree is not uniformly random among all possible rooted, bifurcating trees on n leaves. For example, there are 105 rooted, bifurcating trees on 5 leaves, but the populate.() method only generates 4 of them. (It seems that the documentation for the populate() method is not so clear about explaining what sort of tree is generated, from the user's perspective.)

As another approach, to generate a random tree on n leaves using existing methods, one could construct star_tree as a multifurcating tree with n children and then run

all_trees = star_tree.expand_polytomies()
t = random.choice(all_trees)

However, this approach is not practical for moderately large n because the number of bifurcating trees grows super-exponentially, as warned in the documentation for this method.

jordibc commented 8 months ago

I'm closing this now that we merged https://github.com/etetoolkit/ete/pull/714 and added this functionality to ete4 too with commit https://github.com/etetoolkit/ete/commit/1fff0621ae332d40e01d034c4de69370675e62d8 . Thanks @harryrichman !