Closed ArtPoon closed 7 years ago
Revisit this issue in light of recent work on Brad's project and stuff with Mariano
Forgot that I refactored some of Brad's code that addresses this issue. Cherries are rotated to maximize the kernel score. This is akin to ladderizing trees. Overall, the kernel score compares one set of tree shapes against another, because some characteristics, such as rotations around nodes, are ambiguous. Ladderizing the trees deterministically collapses these sets, but some ambiguity remains at cherries. By rotating cherries to maximize the kernel score between two trees, we resolve this ambiguity on a pair-by-pair basis. Is this still a valid kernel function?
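A minimal sketch of the pair-by-pair rotation, assuming each cherry is reduced to its two tip branch lengths and that the cherry term is a Gaussian function of the branch-length differences. The names `gaussian_term`, `cherry_score` and the `sigma` parameter are illustrative only and are not taken from the refactored code.

```python
import math


def gaussian_term(lengths1, lengths2, sigma=1.0):
    """Hypothetical cherry term: penalize squared branch-length differences."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(lengths1, lengths2))
    return math.exp(-sq_diff / sigma)


def cherry_score(cherry1, cherry2, sigma=1.0):
    """Score two cherries, keeping the better of the two rotations.

    Each cherry is a (left_length, right_length) tuple. Rotating a cherry
    swaps its two tips, so only two orientations need to be compared.
    """
    as_is = gaussian_term(cherry1, cherry2, sigma)
    rotated = gaussian_term(cherry1, cherry2[::-1], sigma)
    return max(as_is, rotated)
```

With this sketch, `cherry_score((0.1, 0.5), (0.5, 0.1))` gives the same result as comparing `(0.1, 0.5)` with itself, because the rotated orientation lines the branch lengths up exactly. Taking the maximum keeps the score symmetric in its two arguments, but symmetry alone does not establish that the result is a valid (positive semi-definite) kernel, so the question above stays open.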
Currently, when computing the tree kernel at node pair `n1` and `n2`, if the left and right children of both nodes are tips, then the subtrees rooted at `n1` and `n2` are a match. However, this situation may arise:

This should still be counted as a match. Ladderizing the tree does not rotate cherries with respect to tip labels.
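One reading of the case above is that two cherries carry the same pair of tip labels but in opposite left/right order. Under that assumption, an order-independent comparison could be sketched as follows. The `Node` class and both function names are hypothetical and do not come from the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; tips have no children."""
    label: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_tip(self):
        return self.left is None and self.right is None


def is_cherry(node):
    """True if both children exist and both are tips."""
    return (
        node.left is not None
        and node.right is not None
        and node.left.is_tip()
        and node.right.is_tip()
    )


def cherries_match(n1, n2):
    """Match two cherries regardless of the left/right order of their tips."""
    if not (is_cherry(n1) and is_cherry(n2)):
        return False
    pair1 = (n1.left.label, n1.right.label)
    pair2 = (n2.left.label, n2.right.label)
    # Accept either orientation of the second cherry.
    return pair1 == pair2 or pair1 == pair2[::-1]
```

For example, a cherry with tips `A` and `B` matches one with tips `B` and `A`, while a cherry with tips `A` and `C` does not match either.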