etetoolkit / ete

Python package for building, comparing, annotating, manipulating and visualising trees. It provides a comprehensive API and a collection of command line tools, including utilities to work with the NCBI taxonomy tree.
http://etetoolkit.org
GNU General Public License v3.0

fix get_distance() for topology_only=True #742

Open lenacoll opened 5 months ago

lenacoll commented 5 months ago

In tree.get_distance(target, target2, topology_only=True), the number of nodes between target and target2 is calculated incorrectly when target is the common ancestor of the two nodes.

In the while loop starting at line 1024, we add 1 for the parent of current at each step. This means that when we reach the children of ancestor, we count ancestor twice, once from each side. The only exception is when one of the two nodes, say target, is itself the ancestor. In that case we still count ancestor while going through the while loop for target2, even though ancestor is target and should therefore not be counted.

So in both cases, we need to subtract one from the count of nodes between target and target2. The previous version did this with the if condition in line 1026, which simply skips counting the parent of target. That does not work if target is the ancestor, because the while loop for target is then never entered and nothing gets skipped.

I therefore deleted this if condition and instead added an if after the while loop, in line 1030, that subtracts one from the distance computed in the loop. The subtraction is only applied when target != target2: if both are the same node, that node is also the ancestor, so the while loop is never entered and there is nothing to subtract.
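To make the difference concrete, here is a minimal, self-contained sketch of the two counting strategies. It does not use ETE itself: `Node`, `common_ancestor`, `nodes_between_old` and `nodes_between_fixed` are hypothetical names made up for this example, and the loops only approximate the logic described above rather than reproduce the actual code in the repository.

```python
class Node:
    """Hypothetical stand-in for a tree node; only the parent link matters here."""

    def __init__(self, name, up=None):
        self.name = name
        self.up = up


def ancestors_inclusive(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.up
    return path


def common_ancestor(a, b):
    """Return the first node on b's path to the root that is also on a's path."""
    on_a_path = {id(n) for n in ancestors_inclusive(a)}
    for n in ancestors_inclusive(b):
        if id(n) in on_a_path:
            return n
    raise ValueError("nodes are not in the same tree")


def nodes_between_old(target, target2):
    """Previous strategy (assumed): skip the step that starts at target."""
    ancestor = common_ancestor(target, target2)
    dist = 0
    for start in (target2, target):
        current = start
        while current is not ancestor:
            if current is not target:
                dist += 1
            current = current.up
    return dist


def nodes_between_fixed(target, target2):
    """Proposed strategy: count every step, then subtract one afterwards."""
    ancestor = common_ancestor(target, target2)
    dist = 0
    for start in (target2, target):
        current = start
        while current is not ancestor:
            dist += 1
            current = current.up
    # If both nodes are identical, the loops never ran and nothing is subtracted.
    if target is not target2:
        dist -= 1
    return dist


# Tiny tree: root -> P -> (A, B)
root = Node("root")
p = Node("P", up=root)
a = Node("A", up=p)
b = Node("B", up=p)

for label, x, y in [("siblings A, B", a, b), ("ancestor P, child B", p, b), ("same node A, A", a, a)]:
    print(label, "old:", nodes_between_old(x, y), "fixed:", nodes_between_fixed(x, y))
```

In this sketch the two versions agree for siblings (only P lies between A and B) and for identical nodes, but differ when target is the ancestor: no node lies between P and its child B, and only the fixed version reports that.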

This addresses issue #740.