taffy norm -d greedily selects paralogous rows to remove in order to tackle block fragmentation. Selected rows are deleted and the links between them and their left and right neighbours are cut. But it looks like I left the left_gap_sequence field untouched on the right neighbour. This situation (a left_gap_sequence but no left link) apparently left the code that follows in a state where it would mis-assign the length field in the merged block.
This can happen any time taffy add-gap-bases and taffy norm -d are used in conjunction.
The patch here is just to remove the gap sequence along with the link to the previous row. Unfortunately, this bug calls into question the validity of MAFs previously created with cactus-hal2maf --filterGapCausingDupes.
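To make the idea of the patch concrete, here is a minimal sketch in Python rather than taffy's own code. The `Row` class, its field names other than `left_gap_sequence`, and the `unlink_row` helper are all hypothetical stand-ins chosen for illustration; the only point is that the gap sequence is cleared in the same step that cuts the link.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    """Hypothetical stand-in for an alignment row (not taffy's actual struct)."""
    name: str
    left: Optional["Row"] = None    # same sequence's row in the previous block
    right: Optional["Row"] = None   # same sequence's row in the next block
    left_gap_sequence: Optional[str] = None  # bases between `left` and this row


def unlink_row(row: Row) -> None:
    """Detach `row` from both neighbours before it is deleted."""
    if row.left is not None:
        row.left.right = None
    if row.right is not None:
        row.right.left = None
        # The gap sequence only makes sense relative to the left link,
        # so it is dropped together with that link.
        row.right.left_gap_sequence = None
    row.left = None
    row.right = None
```

With this shape, a right neighbour can never end up holding a gap sequence that refers to a row that no longer exists.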
Definitely would be good to have a taffy validate that can get run during cactus-hal2maf to prevent future cases like this.
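As a rough illustration of the kind of check such a validator might include, the sketch below flags rows that carry a gap sequence without a left link, which is the inconsistent state described above. This is not an existing taffy command or API; `RowView` and `find_orphan_gaps` are hypothetical names, and a real validator would presumably check much more than this one invariant.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RowView:
    """Hypothetical, flattened view of a row: just what this check needs."""
    name: str
    has_left_link: bool
    left_gap_sequence: Optional[str] = None


def find_orphan_gaps(rows: Iterable[RowView]) -> List[str]:
    """Return names of rows that have a gap sequence but no left link."""
    return [
        row.name
        for row in rows
        if row.left_gap_sequence is not None and not row.has_left_link
    ]
```

An empty result means the invariant holds for the rows inspected; any name returned points at a row in the state that triggered this bug.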
This is a nasty one that came up in https://github.com/ComparativeGenomicsToolkit/cactus/issues/1201