ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

[Experimental] Allow self-alignment in pangenome reference #1461

Closed glennhickey closed 2 months ago

glennhickey commented 3 months ago

minigraph-cactus inherits its large-scale topology from minigraph: the reference genome is kept acyclic, and extra copies of things are added as novel insertion sequences. This is great for keeping a simple coordinate system, especially with regard to projecting to the reference. But the duplicated copies can confuse mapping a bit.

This PR adds some prototype options for more aggressively collapsing similar sequence, adding cycles as necessary. They are applied to cactus-pangenome or cactus-graphmap:

If they are applied to cactus-graphmap, then the --collapseRef option must be given to cactus-graphmap-join.

Update: collapsing has been simplified to a single boolean --collapse flag. Reference / non-reference collapsing can still be experimented with, but only via the config XML.
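
For anyone scripting this, here is a minimal sketch of assembling a cactus-pangenome command line with the flag toggled. It assumes --collapse is passed directly to cactus-pangenome, which the update above does not state explicitly. The helper name, paths, and sample name are hypothetical placeholders, and the command is only printed, not run.

```python
import shlex


def build_pangenome_command(jobstore, seqfile, out_dir, out_name, reference,
                            collapse=False):
    """Return an argv list for cactus-pangenome (hypothetical helper).

    Assumption: --collapse is a plain boolean switch on cactus-pangenome.
    """
    cmd = [
        "cactus-pangenome", jobstore, seqfile,
        "--outDir", out_dir,
        "--outName", out_name,
        "--reference", reference,
    ]
    if collapse:
        # Experimental: allow self-alignment, adding cycles as necessary.
        cmd.append("--collapse")
    return cmd


# Placeholder inputs; substitute real paths and a real reference sample name.
cmd = build_pangenome_command("./js", "./seqfile.txt", "./out", "pg",
                              "ref_sample", collapse=True)
print(shlex.join(cmd))
```

Keeping the flag behind a keyword argument makes it easy to build the same graph with and without collapsing and compare mapping results.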