Stub clipping (which is on by default) makes sure there are at most 2 tips per component. But nothing stops a component from having fewer than 2 tips.
A case recently came up in a fly graph (see https://github.com/vgteam/vg/issues/4060 and https://github.com/vgteam/vg/issues/4061) where a big tangle at the end of chr4 had an edge coming off the end of the reference path. The resulting one-tipped graph leads to a more complicated snarl decomposition (4 top-level chains), which isn't supported by haplotype sampling.
Anyway, this PR activates a new option in `vg clip` (`-S`) that makes sure the first and last steps of reference paths are stubs. That should guarantee each graph component has exactly two ends, which should translate to a single top-level chain and a happy `vg haplotypes`.
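
To make the one-tipped case concrete, here is a minimal toy sketch in Python. It is not vg's implementation: the function name `tips_per_component`, the `(node, side)` edge encoding, and the three-node graph are all assumptions made for illustration. It counts node sides with no edges (tips) in each connected component, first for a plain chain and then for the same chain with an extra edge coming off the end of the last node.

```python
from collections import defaultdict


def tips_per_component(nodes, edges):
    """Count tips (node sides with no edges) in each connected component.

    nodes: list of node ids
    edges: list of ((node, side), (node, side)), with side "L" or "R"
    (hypothetical encoding, chosen only for this sketch)
    """
    attached = set()               # node sides that have at least one edge
    adjacency = defaultdict(set)   # node-level connectivity
    for (a, side_a), (b, side_b) in edges:
        attached.add((a, side_a))
        attached.add((b, side_b))
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    counts = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk to collect one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            n = stack.pop()
            component.append(n)
            for m in adjacency[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        # A tip is a side of a node in this component with no edge on it.
        tips = sum((n, s) not in attached for n in component for s in ("L", "R"))
        counts.append(tips)
    return counts


# Toy reference path 1 -> 2 -> 3: tips at the left of 1 and the right of 3.
chain = [((1, "R"), (2, "L")), ((2, "R"), (3, "L"))]
# Extra edge coming off the end of the path, back into the graph.
extra = ((3, "R"), (2, "R"))

two_tipped = tips_per_component([1, 2, 3], chain)
one_tipped = tips_per_component([1, 2, 3], chain + [extra])
```

In this toy graph, dropping the extra edge off the last node is what brings the component back to two tips, which is the property the PR is after for reference path ends.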