Taffy index queries were pretty broken, as @benedictpaten pointed out:
./bin/taffy view -i ./tests/evolverMammals.maf -c > ./tests/evolverMammals.taf.gz
./bin/taffy index -i ./tests/evolverMammals.taf.gz
./bin/taffy view -i ./tests/evolverMammals.taf.gz -r Anc0.Anc0refChr0:400-405
Assertion failed: (strlen(column) == column_length), function get_bases, file taf.c, line 125.
./bin/taffy view -i ./tests/evolverMammals.taf.gz -r Anc0.Anc0refChr0:410-413
Assertion failed: (*row != NULL), function parse_coordinates_and_establish_block, file taf.c, line 68.
Everything worked fine when indexing and querying the .maf, which is probably why it's taken this long to come up.
Anyway, the problem seems to be that the query function works by:
scan to first block
slice first block if necessary so that it starts at given range start
scan forward until end of range (and slice the last block)
The issue is that TAF parsing is dependent on the previous block. So if the first block is sliced before the next block is read, and that slice removes a row (because only gap characters were left after slicing), then the parser aborts when trying to apply the previous block's coordinates to the next block.
Anyway, the fix is pretty simple: only slice the first block after the next one has been parsed. I added these two problem regions to the tests, and added a test that does a bunch of TAF queries and makes sure the output is the same as for the MAF.
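For anyone who wants to see the ordering problem in isolation, here is a toy Python model of it. This is only a sketch, not taffy's actual C code: `Row`, `slice_block`, `parse_next`, `query_buggy` and `query_fixed` are hypothetical names, and the "encoding" (a next block given as bare base strings whose coordinates are derived from the previous block's rows) is a simplified assumption about how TAF depends on the previous block.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    start: int  # coordinate of the first base of this row in the block
    bases: str  # aligned bases, '-' marks a gap


def slice_block(rows, col_start, col_end=None):
    """Keep columns [col_start, col_end); drop rows left with only gaps."""
    out = []
    for r in rows:
        skipped = sum(1 for b in r.bases[:col_start] if b != "-")
        kept = r.bases[col_start:col_end]
        if kept.strip("-"):  # row still has at least one real base
            out.append(Row(r.name, r.start + skipped, kept))
    return out


def parse_next(prev_rows, encoded):
    """Decode a block whose coordinates are implied by the previous block."""
    if len(prev_rows) != len(encoded):
        raise ValueError("row missing from previous block")
    return [
        Row(p.name, p.start + sum(1 for b in p.bases if b != "-"), bases)
        for p, bases in zip(prev_rows, encoded)
    ]


def query_buggy(first, encoded_next, col_start):
    first = slice_block(first, col_start)  # slices too early
    return first, parse_next(first, encoded_next)


def query_fixed(first, encoded_next, col_start):
    nxt = parse_next(first, encoded_next)  # parse against the unsliced block
    return slice_block(first, col_start), nxt


# Made-up example: slicing at column 2 leaves the second row all gaps.
first = [Row("Anc0.chr", 400, "ACGT"), Row("mouse.chr", 10, "AC--")]
encoded_next = ["GG", "TT"]

try:
    query_buggy(first, encoded_next, 2)
except ValueError as err:
    print("buggy order fails:", err)

print(query_fixed(first, encoded_next, 2))
```

In this model the early slice drops the all-gap row, so the next block no longer has a matching previous row to take coordinates from. Parsing first and slicing afterwards avoids that, which is the same ordering change this PR makes.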