ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Revert back to mafRowOrderer from taffy sort #1338

Closed glennhickey closed 5 months ago

glennhickey commented 5 months ago

I introduced a really bad MAF bug in this commit: https://github.com/ComparativeGenomicsToolkit/cactus/commit/8157fe75e1206ec6d3a1a0801948873180663b03

While updating to the latest taffy, I swapped in taffy sort for mafRowOrderer. They're basically the same, but not exactly the same in every case. In some cases, taffy sort would replace the first row of a MAF block with a different row (presumably one with a lexicographically smaller contig name). This breaks any downstream analysis (bigmaf conversion, taffy index, etc.) that wants MAFs sorted by the reference genome, with the reference in the first row on the positive strand. Combined with --dupeMode single it's even worse, because the "true" reference line is lost completely.

This bug affects Cactus v2.7.2 and v2.8.0.
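
A quick way to check whether an existing MAF is affected is to look at the first `s` row of every block. The following is a minimal sketch and not part of this PR: `find_bad_blocks` and `EXAMPLE_MAF` are hypothetical names, and it assumes source names follow the `genome.contig` convention, with the genome name being everything before the first dot. Genome names that themselves contain dots would need different handling.

```python
def find_bad_blocks(maf_lines, ref_genome):
    """Return (block_index, src, strand) for every alignment block whose
    first 's' row is not ref_genome on the positive strand.

    Assumption: src looks like 'genome.contig', split on the first dot.
    """
    bad = []
    block_idx = -1
    seen_first_row = True  # ignore any 's' line that precedes the first 'a' line
    for line in maf_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            # An 'a' line opens a new alignment block.
            block_idx += 1
            seen_first_row = False
        elif fields[0] == "s" and not seen_first_row:
            # MAF 's' line: s src start size strand srcSize text
            seen_first_row = True
            src, strand = fields[1], fields[4]
            genome = src.split(".", 1)[0]
            if genome != ref_genome or strand != "+":
                bad.append((block_idx, src, strand))
    return bad


# Toy input: the second block has a non-reference row on top.
EXAMPLE_MAF = """##maf version=1
a
s hg38.chr1 100 4 + 1000 ACGT
s mm10.chr2 50 4 - 2000 ACGT

a
s mm10.chr2 60 4 + 2000 ACGT
s hg38.chr1 104 4 + 1000 ACGT
"""
```

With `hg38` as the reference, `find_bad_blocks(EXAMPLE_MAF.splitlines(), "hg38")` flags only the second block (index 1). For a real file, pass an open file handle instead of the split string. An empty result means every block leads with the reference on the positive strand.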

Anyway, this PR reverts that commit, going back to mafRowOrderer. It should be pretty easy to eventually add a constraint to taffy sort that provides the same functionality.

Resolves #1320