We do not want to assume that genera are monophyletic, so we can't infer those first and then pick exemplars for a backbone.
To reduce the problem space, we tried to select, for each genus, the two most distal taxa (by sequence distance), to increase the odds that they cross the root of the genus in the total data set, i.e. that they sit on opposite sides of it.
Crossing the root is important because then, when grafting, we only have to scale the root of the graft to the same depth as the node connecting the exemplars in the backbone.
We also want the exemplars to be the taxa with the best data coverage.
Therein lies the basic challenge. How to choose?
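
To make the tension concrete, here is a naive baseline sketch in Python. It is not the approach described in these notes, only an illustration of the two criteria pulling against each other. Everything in it is an assumption made for the example: the function names (`coverage`, `p_distance`, `pick_exemplars`), the use of uncorrected p-distance as the "sequence distance", the `min_coverage` cutoff, and the toy alignment.

```python
from itertools import combinations

# Characters treated as "no data" (assumption for this sketch).
MISSING = set("-?Nn")


def coverage(seq):
    """Fraction of alignment columns in which this taxon has data."""
    return sum(c not in MISSING for c in seq) / len(seq) if seq else 0.0


def p_distance(a, b):
    """Proportion of differing sites among columns where both taxa have data.

    Returns None when the two sequences share no columns with data.
    """
    shared = [(x, y) for x, y in zip(a.upper(), b.upper())
              if x not in MISSING and y not in MISSING]
    if not shared:
        return None
    return sum(x != y for x, y in shared) / len(shared)


def pick_exemplars(genus_seqs, min_coverage=0.5):
    """Pick the most distal pair among taxa that pass a coverage cutoff.

    Ties on distance are broken in favour of the pair with more data.
    Returns a tuple of two taxon names, or None if no pair qualifies.
    """
    eligible = {t: s for t, s in genus_seqs.items()
                if coverage(s) >= min_coverage}
    best = None
    for (t1, s1), (t2, s2) in combinations(sorted(eligible.items()), 2):
        d = p_distance(s1, s2)
        if d is None:
            continue
        key = (d, coverage(s1) + coverage(s2))
        if best is None or key > best[0]:
            best = (key, (t1, t2))
    return best[1] if best else None


# Toy aligned sequences for one genus (hypothetical data).
genus = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TTGTACGAAC",
    "D": "T?????????",  # almost no data
}

print(pick_exemplars(genus))                    # ('B', 'C')
print(pick_exemplars(genus, min_coverage=0.0))  # ('A', 'D')
```

Without the cutoff, the sparsely sampled taxon D looks maximally distant from A on the single site they share, so a poorly covered taxon wins on distance alone. With the cutoff it is excluded, but the cutoff itself is arbitrary, which is exactly the "how to choose?" problem.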
So the idea here was as follows: