ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Enforce exactly 2 tips per component #1142

Closed by glennhickey 1 year ago

glennhickey commented 1 year ago

Stub clipping (which is on by default) makes sure there are at most 2 tips per component. But there's nothing stopping a component from having fewer than 2 tips.

A case recently came up in a fly graph (see https://github.com/vgteam/vg/issues/4060 and https://github.com/vgteam/vg/issues/4061) where a big tangle at the end of chr4 had an edge coming off the end of the reference path. The resulting one-tipped graph leads to a more complicated snarl decomposition (4 top-level chains), which isn't supported by haplotype sampling.

Anyway, this PR activates a new option in vg clip (-S) which makes sure that the first and last steps of reference paths are stubs. That should guarantee each graph component has exactly two ends, which should translate to a single top-level chain and a happy vg haplotypes.
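To make the "exactly two tips" condition concrete, here is a rough Python sketch that counts tips per connected component in a toy bidirected graph. It is only an illustration, not how vg or cactus does it: the function name `tips_per_component`, the `(node, side)` edge encoding, and the working definition of a tip (a node side with no edge attached) are all assumptions made for this example.

```python
from collections import defaultdict


def tips_per_component(nodes, edges):
    """Return the sorted list of tip counts, one entry per connected component.

    Toy model (an assumption, not vg's data structures):
      nodes: iterable of node ids
      edges: iterable of ((node, side), (node, side)), side is "L" or "R"
    A tip is a node side that no edge touches.
    """
    nodes = list(nodes)
    attached = set()                  # node sides with at least one edge
    parent = {n: n for n in nodes}    # union-find forest over node ids

    def find(n):
        # Walk up to the component root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, side_a), (b, side_b) in edges:
        attached.add((a, side_a))
        attached.add((b, side_b))
        parent[find(a)] = find(b)     # merge the two components

    counts = defaultdict(int)
    for n in nodes:
        root = find(n)
        counts[root] += 0             # make sure tip-free components are listed
        for side in ("L", "R"):
            if (n, side) not in attached:
                counts[root] += 1
    return sorted(counts.values())


# A linear reference path 1 -> 2 -> 3 has one tip at each end.
linear = [((1, "R"), (2, "L")), ((2, "R"), (3, "L"))]
print(tips_per_component([1, 2, 3], linear))    # [2]

# An extra edge coming off the end of the path removes one tip.
tangled = linear + [((3, "R"), (2, "L"))]
print(tips_per_component([1, 2, 3], tangled))   # [1]
```

In this toy model the second graph plays the role of the chr4 case: the edge leaving the right side of the last node means the end of the reference path is no longer a tip, so the component is left with a single tip.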