Open bredelings opened 5 years ago
I suspect that implementing a not-quite-as-fast version of the connectivity test would still be a speedup.
This is now being developed on the faster-solver
branch. The plan is to get a minimal working version, and speed up things that seems to be taking the most time. This should take less time than beginning by implementing the full HDT fully-dynamic graph connectivity stuff.
I've done initial coding to use HDT-0. It required
treap-solver
. It is no longer terribly slow, but is still slower than the original as of 11/4/2020.Other approaches to speed up the subtree solver involve trying to avoid redoing the BUILD problem completely for every new branch that we try and test. However, an initial approach using memoization on branch 'use-dict-in-solver' did not appear to produce a speedup either. So this is hard.
(2018) Fast Compatibility Testing for Rooted Phylogenetic Trees
This paper allows compatibility testing directly on a collection of trees without first decomposing them into splits. It should allow performing this test in O(M*log^2(M)) where M=vertices+edges for all the trees T[i] in a "profile". I can't remember how bad our current implementation is (quadratic? cubic?) but this should be much better, especially since it eliminates the bottleneck step, which is to perform tons of set intersections between large sets.
Implementation issues:
This algorithm doesn't deal with incertae sedis taxa. This only affects the taxonomy though.
The current version of the subproblem solver incrementally add splits to a set in
combine( )
. We could instead walk incrementally root-ward from a tree we are testing and remove splits if they are bad. If we think that most splits are good we could (for example) initially try to add a whole tree, and then try to add its subtrees incrementally if that fails, and perhaps avoid doing B tests for a tree with B branches.The algorithm relies on dynamic graph algorithms described in Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity The BOOST graph library only implements stuff under incremental edge insertion and does not allow edge deletion.