amnh / PCG

𝙋𝙝𝙮𝙡𝙤𝙜𝙚𝙣𝙚𝙩𝙞𝙘 𝘾𝙤𝙢𝙥𝙤𝙣𝙚𝙣𝙩 𝙂𝙧𝙖𝙥𝙝 ⸺ Haskell program and libraries for general phylogenetic graph search

Add Distance Wagner method #163

Open Boarders opened 4 years ago

Boarders commented 4 years ago

Ward mentioned that we should add a new build method that uses the distance Wagner method. It is described in detail on p. 165 of the Systematics textbook. Roughly, it works as follows. Given an already constructed tree:

               [...]            [...]      
                 │      e_ij      │ 
                 v_i  ────────── v_j
                 │                │
               [...]            [...] 

If we wish to add vertex v_k to this tree then we compute the distance:

d(v_k, e_ij) = 1/2 (d(v_k, v_i) + d(v_k, v_j) - d(v_i, v_j))
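As a minimal sketch of the formula above in Haskell (the function and argument names are my own for illustration, not anything in PCG):

```haskell
-- | Distance from a candidate vertex v_k to the edge e_ij.
-- Arguments are the three pairwise distances d(v_k, v_i), d(v_k, v_j)
-- and d(v_i, v_j); the names are hypothetical.
distToEdge :: Double -> Double -> Double -> Double
distToEdge dKI dKJ dIJ = (dKI + dKJ - dIJ) / 2
```

For example, with d(v_k, v_i) = 5, d(v_k, v_j) = 7 and d(v_i, v_j) = 6, `distToEdge 5 7 6` evaluates to 3.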

If v_i and v_j are already internal nodes in the tree, then these distances are themselves derived using only the distances between the leaves in the subtrees to which they are attached. One then adds a new internal vertex on the edge with the minimal distance and attaches v_k to it. One nice thing about this method is that we do not need to compute any medians or perform any traversals, as it is based entirely on the distance matrix between the leaves. This means we do not need to construct intermediate decorated trees and can instead use a smaller data structure for tracking the distance information.
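The selection and attachment step might be sketched like this. Everything here is hypothetical naming rather than PCG code, and the sketch assumes the caller supplies a distance function `d` that already knows how to answer queries involving internal vertices (how those are derived from leaf distances is left out):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Vertex = Int
type Edge = (Vertex, Vertex)

-- | d(v_k, e_ij) from the formula in this issue, given a distance function.
distToEdge :: (Vertex -> Vertex -> Double) -> Vertex -> Edge -> Double
distToEdge d k (i, j) = (d k i + d k j - d i j) / 2

-- | Pick the edge closest to v_k. Assumes the edge list is non-empty;
-- 'minimumBy' throws on an empty list.
closestEdge :: (Vertex -> Vertex -> Double) -> Vertex -> [Edge] -> Edge
closestEdge d k = minimumBy (comparing (distToEdge d k))

-- | Split the chosen edge with a fresh internal vertex m and hang k from m.
-- No medians or traversals are involved, only edge bookkeeping.
attach :: Vertex -> Vertex -> Edge -> [Edge] -> [Edge]
attach m k e@(i, j) edges = (i, m) : (m, j) : (m, k) : filter (/= e) edges
```

With a path 0 - 1 - 2 where d(0,1) = 2, d(1,2) = 2, d(0,2) = 4, and a new vertex 3 with d(3,0) = 5, d(3,1) = 3, d(3,2) = 1, the edge (0,1) scores 3 and the edge (1,2) scores 1, so `closestEdge` picks (1,2).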

Note: I think that if the metric is an ultrametric then this is exactly the distance, but otherwise it is an overestimate because of the triangle inequality. The textbook indicates that this method might cause problems for non-metric distances.
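A quick check of the exact case, under the assumption (mine, not the textbook's) that the distances are additive, i.e. they are path lengths on some tree, which includes ultrametrics. Suppose v_k hangs off a point m on the path from v_i to v_j by a branch of length x, and write a = d(v_i, m) and b = d(m, v_j):

```latex
\begin{align*}
d(v_k, v_i) &= x + a, \qquad d(v_k, v_j) = x + b, \qquad d(v_i, v_j) = a + b \\
\tfrac{1}{2}\bigl(d(v_k, v_i) + d(v_k, v_j) - d(v_i, v_j)\bigr)
  &= \tfrac{1}{2}\bigl((x + a) + (x + b) - (a + b)\bigr) = x
\end{align*}
```

So in the additive case the formula recovers the pendant length x exactly. Separately, for any metric the triangle inequality d(v_i, v_j) ≤ d(v_k, v_i) + d(v_k, v_j) is what keeps the value non-negative.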

Ward noted that having this kind of distance information might also be useful when performing refinements as this can provide candidate edges along which to perform SPR-type moves.

recursion-ninja commented 4 years ago

I think that we might be doing this already with the old graph representation during the Wagner build command to speed up the process of selecting the minimal edge on which to add the next leaf.

If so, we can look at the old code as a potential starting point.

wardwheeler commented 4 years ago

Parts, yes, but this wouldn’t have any preorder or postorder passes until the tree was complete. Same overall time complexity, but a much lower constant factor. The question would be how coarse a heuristic it might be.

Boarders commented 4 years ago

I think it currently does look like we at least decorate each intermediate tree in order to get the edgeSequence (this is from looking at iterativeBuild in PCG.Command.Build.Evaluate). Both methods still use a fixed node order to avoid working out the next best node to add. But maybe you had something different in mind here?

This method wouldn't build any intermediate decorated tree, only keep track of various edge distances which are computed in terms of the distances between leaf nodes.
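To make that concrete, here is a rough sketch of the kind of state such a build could carry between insertions. All of the type and field names are hypothetical and none of this is existing PCG code; it assumes vertices are labelled by `Int` and uses `Data.Map.Strict` from `containers`:

```haskell
import qualified Data.Map.Strict as Map

type Vertex = Int

-- | Symmetric pairwise distances, stored once per unordered pair.
newtype Distances = Distances (Map.Map (Vertex, Vertex) Double)

-- | Normalise a pair so that (i, j) and (j, i) share one key.
key :: Vertex -> Vertex -> (Vertex, Vertex)
key i j = (min i j, max i j)

-- | Build the table from a list of pairwise distances.
fromPairs :: [((Vertex, Vertex), Double)] -> Distances
fromPairs ps = Distances (Map.fromList [ (key i j, x) | ((i, j), x) <- ps ])

-- | Look up a distance; a vertex is at distance 0 from itself.
distance :: Distances -> Vertex -> Vertex -> Maybe Double
distance (Distances m) i j
  | i == j    = Just 0
  | otherwise = Map.lookup (key i j) m

-- | What is carried between insertions: edges and distances, no decorations.
data WagnerState = WagnerState
  { treeEdges    :: [(Vertex, Vertex)]
  , knownDists   :: Distances
  , nextInternal :: Vertex  -- next unused label for an internal vertex
  }
```

The point of the design is that nothing in `WagnerState` refers to character data or medians, so no preorder or postorder pass is needed until the topology is finished.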

wardwheeler commented 4 years ago
recursion-ninja commented 4 years ago

This is probably a good time to review our BUILD command and refine the types, options, and functionality associated with each.