This tool is supposed to do a coarse-grained chaining on the minigraph mapping output, with respect to the reference contigs in the graph. But it was somehow written to assume there was only one reference contig in the input, even though it gets run at genome scale in cactus.
Anyway, this is a rewrite to make the code simpler and more general:
- Mappings are sorted by query position.
- Target sequences are mapped to reference intervals.
- Runs of query contigs are segmented into blocks whose target reference intervals are contiguous (modulo a given threshold).
- Disjoint blocks are greedily selected, fewest aligned bases first, and dropped.
- The process is repeated until no disjoint blocks remain.
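
For illustration only, here is a minimal Python sketch of the steps above. It is not the code in this PR. The `Mapping` record, the `segment`, `ref_gap` and `chain_filter` names, and the field layout are all hypothetical. It assumes each mapping has already been projected onto a reference interval, and it takes one possible reading of "disjoint": a query contig whose mappings split into more than one block has disjoint blocks, and the block with the fewest aligned bases is dropped first.

```python
from itertools import groupby
from typing import NamedTuple


class Mapping(NamedTuple):
    """Hypothetical record: one mapping already projected onto a reference interval."""
    query: str
    q_start: int
    q_end: int
    ref: str
    r_start: int
    r_end: int
    aligned: int  # number of aligned bases


def ref_gap(a, b):
    """Distance between two reference intervals (0 if they touch or overlap)."""
    return max(0, max(a.r_start, b.r_start) - min(a.r_end, b.r_end))


def segment(mappings, threshold):
    """Split query-sorted mappings into blocks of near-contiguous reference intervals."""
    blocks = []
    for m in mappings:
        prev = blocks[-1][-1] if blocks else None
        if prev is not None and prev.ref == m.ref and ref_gap(prev, m) <= threshold:
            blocks[-1].append(m)
        else:
            blocks.append([m])
    return blocks


def chain_filter(mappings, threshold=10_000_000):
    """Per query contig, drop the weakest block and re-segment until one block is left."""
    kept = []
    by_query = sorted(mappings, key=lambda m: (m.query, m.q_start))
    for _, group in groupby(by_query, key=lambda m: m.query):
        current = list(group)
        while True:
            blocks = segment(current, threshold)
            if len(blocks) <= 1:
                break
            # Assumption: "disjoint" means the query still splits into several blocks.
            weakest = min(range(len(blocks)),
                          key=lambda i: sum(m.aligned for m in blocks[i]))
            del blocks[weakest]
            current = [m for block in blocks for m in block]
        kept.extend(current)
    return kept
```

Re-segmenting after every drop matters in this sketch: removing a small off-target block can let its two neighbours merge into a single block on the next pass. Nothing here assumes a single reference contig, since blocks are only ever extended within the same `ref`.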
The big question is what threshold to use. 10 Mb seems to work reasonably well. In general, increasing it trades off recall for precision.