Take a look at mRNA "BDOR_000249-RB" ... its exons include a weird, out-of-order stretch that's ~7 kbp away from the end of the segment. Downstream, this gets us a SeqLocOrder error from tbl2asn.
Not sure how we managed to handle this before, but since we're now supporting reading a gff in random order, we need to do some kind of sorting before storing the indices for good.
The issue here is contradictory standards (or a lack of standards) for storing indices. GFF seems to follow a convention where negative strand exons look something like
200 250
120 180
50 90
--that is, the lower value always in column 4, but negative-strandedness represented by decreasing values from row to row. But this is not a requirement, and we've got exceptions in our inputs.
TBL looks like this (same indices):
250 200
180 120
90 50
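--that is, straight up reversed. So I guess we sort(sorted()) indices when we store them (sort each start/end pair, then sort the list of pairs), and reverse(reversed()) them when we write negative-strand features to tbl (flip each pair, then reverse the list).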
argh.
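Rough sketch of the sort-on-store / reverse-on-write idea, using the example indices above. The function names (`normalize_exons`, `tbl_order`) and the tuple representation are made up for illustration, not what's in the codebase, and this assumes exons arrive as (start, end) pairs in arbitrary order.

```python
def normalize_exons(exons):
    """Sort each (start, end) pair, then sort the pairs by start.

    Hypothetical helper: gives one canonical ascending order for storage,
    regardless of the row order or orientation in the input GFF.
    """
    return sorted(tuple(sorted(pair)) for pair in exons)


def tbl_order(exons, strand):
    """Return pairs in the order and orientation shown in the tbl example.

    Hypothetical helper: expects exons already normalized (ascending).
    For the negative strand, flip each pair and reverse the list.
    """
    if strand == "-":
        return [(end, start) for start, end in reversed(exons)]
    return list(exons)


# Exons read from a GFF in random order
stored = normalize_exons([(120, 180), (200, 250), (50, 90)])
minus = tbl_order(stored, "-")
plus = tbl_order(stored, "+")
```

Storing everything ascending means the GFF row order stops mattering, and strand only comes into play at write time.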