ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Disable support for multi-way splits #1291

Closed: glennhickey closed this 7 months ago

glennhickey commented 7 months ago

You used to be able to replace something like ((a,b),(c,d)) with (a,b,c,d) in an input tree. This would use more memory, but it might actually help coverage, since it avoids two internal ancestors.
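To make the "two internal ancestors" point concrete, here is a minimal sketch that parses the two Newick trees and counts their internal (ancestral) nodes. The `Node` class and the `parse_newick` and `internal_nodes` helpers are hypothetical illustrations, not part of Cactus. The parser is deliberately minimal: it drops whitespace and branch lengths and does not handle quoted labels or comments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (no quoted labels or comments) into Nodes."""
    s = "".join(text.split()).rstrip(";")  # drop whitespace and the trailing ";"
    pos = 0

    def parse_subtree() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_subtree())
            while pos < len(s) and s[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_subtree())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # A leaf or internal label runs up to the next delimiter; drop ":length".
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        node.name = s[start:pos].split(":")[0]
        return node

    root = parse_subtree()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def internal_nodes(node: Node) -> int:
    """Count non-leaf nodes, i.e. the ancestors that would have to be reconstructed."""
    if not node.children:
        return 0
    return 1 + sum(internal_nodes(c) for c in node.children)


print(internal_nodes(parse_newick("((a,b),(c,d));")))  # 3: root plus two ancestors
print(internal_nodes(parse_newick("(a,b,c,d);")))      # 1: root only
```

Flattening the binary subtree into a four-way split leaves only the root, so the two intermediate ancestors disappear. The cost is that the single remaining node now has four children instead of two.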

But sometime over the past year, this seems to have changed. I just noticed that doing this caused a dramatic loss of coverage inside the subtree in question when aligning some apes.

So the best practice now seems to be to always binarize the input tree, even when the correct topology is not clear. This PR enforces that: if you pass in a tree with a node that has more than two children, it will fail with an error. This check can be disabled in the configuration XML. Hopefully the underlying problem will be fixed at some point.
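A rough sketch of the kind of check described above, plus one way to binarize a tree before passing it in, might look like the following. This is not Cactus's implementation. `check_binary`, `binarize`, and the other helpers are hypothetical, and the configuration option that disables the real check is not shown. `binarize` resolves each multi-way split by pairing children in the order they are listed. That is an arbitrary topology, so for real data you should pick the best-supported resolution instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simple Newick string (no quoted labels or comments; lengths dropped)."""
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_subtree() -> Node:
        nonlocal pos
        node = Node()
        if pos < len(s) and s[pos] == "(":
            pos += 1
            node.children.append(parse_subtree())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                node.children.append(parse_subtree())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        node.name = s[start:pos].split(":")[0]
        return node

    root = parse_subtree()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def to_newick(node: Node) -> str:
    """Serialize back to Newick (without the trailing ';')."""
    if not node.children:
        return node.name
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name


def check_binary(node: Node) -> None:
    """Raise ValueError if any node has more than two children."""
    if len(node.children) > 2:
        raise ValueError(
            f"node {to_newick(node)} has {len(node.children)} children; "
            "binarize the input tree first"
        )
    for child in node.children:
        check_binary(child)


def binarize(node: Node) -> Node:
    """Resolve multi-way splits by repeatedly pairing adjacent children (arbitrary)."""
    kids = [binarize(c) for c in node.children]
    while len(kids) > 2:
        kids = [
            Node(children=kids[i:i + 2]) if i + 1 < len(kids) else kids[i]
            for i in range(0, len(kids), 2)
        ]
    return Node(name=node.name, children=kids)


tree = parse_newick("(a,b,c,d);")
try:
    check_binary(tree)
except ValueError as err:
    print(err)  # reports the four-way split
fixed = binarize(tree)
check_binary(fixed)      # passes
print(to_newick(fixed))  # ((a,b),(c,d))
```

Pairing adjacent children keeps the resolved subtree balanced rather than building a long ladder. In this example it happens to recover ((a,b),(c,d)) only because of the order in which the leaves were listed.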