Open Boarders opened 4 years ago
I think that we might be doing this already with the old graph representation during the Wagner build command to speed up the process of selecting the minimal edge on which to add the next leaf.
If so, we can look at the old code as a potential starting point.
Parts yes, but this wouldn’t have any pre or postorder passes until the tree was complete. Same overall time complexity—but much lower constant factor. Question would be how Coarse a heuristic it might be.
I think currently it does look like we do at least do a decoration of of each intermediate tree in order to get the edgeSequence (this is looking at iterativeBuild in PCG.Command.Build.Evaluate). Both methods still use a fixed node order to save figuring out the next best node to add. But maybe you had something different in mind here?
This method wouldn't build any intermediate decorated tree, only keep track of various edge distances which are computed in terms of the distances between leaf nodes.
This is probably a good time to review our BUILD
command and refine the types, options, and functionality associated with each.
Ward mentioned that we should add a new build method which uses the distance Wagner. This is described in detail on p. 165 of the Systematics text book. Roughly it works as follows: given an already constructed tree:
If we wish to add vertex
v_k
to this tree then we compute the distance:If
v_i
andv_j
are already internal nodes in the tree then these distances are computed in terms of computations involving only the distances between the leaves in the subtrees to which they are attached. One then adds a new internal vertex and attachesv_k
to the edge with the minimal distance. One nice thing about this method is that we do not need to compute any medians or perform any traversals as it is based entirely on the distance matrix between the leaves. This means we do not need to construct intermediate trees but instead can use a smaller data structure for tracking the distance information.Note: I think it is the case that if the metric is an ultrametric then this is exactly the distance but otherwise is an overestimate because of the triangle inequality. The textbook indicates that this method might cause problems for non-metric distances.
Ward noted that having this kind of distance information might also be useful when performing refinements as this can provide candidate edges along which to perform SPR-type moves.