ComparativeGenomicsToolkit / taffy

A C/Python/CLI library for working with TAF (.taf, .taf.gz) and MAF (.maf) alignment files
MIT License

Fix bug where tai index query could return block with incorrect column_number #48

Closed by glennhickey 7 months ago

glennhickey commented 7 months ago

Range queries on the tai index require clipping alignment blocks. There was a bug where the block clipping adjusted the alignment's column_number field by the number of clipped bases, without accounting for gaps.
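To illustrate the difference between clipped bases and clipped columns, here is a minimal sketch in plain Python. It is not taffy's actual clipping code: the function name `clip_left`, the list-of-strings block representation, and the toy sequences are all hypothetical and exist only to show why subtracting the base count gives the wrong column count when the reference row contains gaps.

```python
def clip_left(rows, ref_index, bases_to_clip):
    """Drop leading columns until `bases_to_clip` non-gap bases of the
    reference row have been removed.

    Hypothetical helper for illustration only (not part of taffy).
    Returns (clipped_rows, columns_removed).
    """
    ref = rows[ref_index]
    removed_bases = 0
    col = 0
    while col < len(ref) and removed_bases < bases_to_clip:
        if ref[col] != '-':   # only non-gap characters count as bases
            removed_bases += 1
        col += 1              # every column counts, gap or not
    return [r[col:] for r in rows], col


# Toy block: the reference row (index 0) has a gap in its second column.
rows = ["A-CGT",
        "ATCGT"]
original_columns = len(rows[0])
bases_to_clip = 2

clipped, columns_removed = clip_left(rows, 0, bases_to_clip)

# Subtracting the number of clipped bases ignores the gap column...
buggy_column_number = original_columns - bases_to_clip
# ...whereas subtracting the number of removed columns matches the rows.
correct_column_number = original_columns - columns_removed

print(clipped, buggy_column_number, correct_column_number)
```

In this toy case the clipped rows are two columns wide, but the base-count adjustment reports three, so any loop over `range(column_number)` would run past the end of the block.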

This never came up before because the only tool using the index was taffy view, which ignores column_number completely and just prints the block sequences and coordinates, which are correct (and the tests all use taffy view too).

But it leads to a crash when iterating over columns with the Python API, e.g. (thanks Konstantinos):

taffy view -i taffy/tests/evolverMammals.maf | taffy norm -o evolverMammals_norm.taf
taffy index -i evolverMammals_norm.taf

Then running this Python script fails because block.column_number() returns the wrong value:

import pathlib
from taffy.lib import AlignmentReader
from taffy.lib import TafIndex

# File names assumed to match the output of the taffy norm / taffy index commands above
with AlignmentReader('evolverMammals_norm.taf', taf_index='evolverMammals_norm.taf.tai', sequence_name="Anc0.Anc0refChr0", start=502, length=83) as mp:
    for block in mp:
        for j in range(block.column_number()):
            print(block.get_column(j))