laserson closed this 6 years ago
Heh. You're too speedy @laserson... Reopening to comment. As I'm working through the CIGAR issues, I'm understanding why we went with `length` over `end` for positional information. I think `end` will still work fine, but we need to be explicit about whether `end` references the alignment or the input. Any indels will make these values differ.
For example, given an alignment like this:

```
ATGGCCC
ATG--CC
```
Query end in the alignment is 7, but query end in the input sequence is 5. My inclination is to go with end position in the input, but then we need to make encoding of indels in the CIGAR mandatory.
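To make the difference concrete, here is a minimal Python sketch that computes both values from a CIGAR string. The helper names are illustrative, not from any existing codebase. It assumes the alignment above is encoded as `3M2D2M`, counts M/I/D/=/X as alignment columns, and counts M/I/=/X as aligned query (input) bases, leaving soft clips out.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec op set).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Assumption: these ops consume query bases inside the aligned region (S excluded).
QUERY_OPS = set("MI=X")
# Assumption: these ops each occupy one column of the pairwise alignment.
ALIGNMENT_OPS = set("MID=X")


def parse_cigar(cigar):
    """Split a CIGAR string like '3M2D2M' into [(3, 'M'), (2, 'D'), (2, 'M')]."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def aligned_query_length(cigar):
    """Number of input (query) bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in QUERY_OPS)


def alignment_length(cigar):
    """Number of alignment columns, gaps included."""
    return sum(n for n, op in parse_cigar(cigar) if op in ALIGNMENT_OPS)


cigar = "3M2D2M"  # ATG--CC aligned against ATGGCCC
print(alignment_length(cigar))      # 7
print(aligned_query_length(cigar))  # 5
```

If the same alignment were encoded as `7M`, `aligned_query_length` would report 7, so the input-based end can only be recovered when the `D` operations are present.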
If we care, the SAM spec for the CIGAR string indicates one-indexed numbering.
@javh Yes, I agree, `I` and `D` should be mandatory in the CIGAR.
ah, but now I read that BAM uses zero-indexed numbering...
We should check what the minimal standards group did.
After our poll, we decided to go with Python-style zero-indexed half-open slice notation.
This is currently reflected in the docs, so I will close.
It appears we forgot to make explicit the numbering scheme for coords. IIRC, to minimize ambiguity of annotations, we decided to go with Python-style numbering (which is zero-indexed half-open intervals; or put another way, it's as if the indices are between the letters).
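A small Python sketch of what that convention means in practice. The helper `one_based_closed_to_half_open` is hypothetical; it assumes its input uses 1-based coordinates with an inclusive end, like SAM's 1-based positions.

```python
def one_based_closed_to_half_open(start1, end1):
    """Convert 1-based inclusive [start1, end1] to 0-based half-open [start0, end0)."""
    # The start shifts down by one; the inclusive end becomes the exclusive end unchanged.
    return start1 - 1, end1


seq = "ATGGCCC"

# Zero-based half-open [2, 5) selects indices 2, 3, 4.
start, end = 2, 5
print(seq[start:end])  # GGC
print(end - start)     # 3: length is simply end - start

# Adjacent intervals share a boundary without overlapping.
print(seq[0:3] + seq[3:7] == seq)  # True

# 1-based inclusive positions 3..5 map to the same slice.
print(one_based_closed_to_half_open(3, 5))  # (2, 5)
```

With this scheme the indices sit between the letters, so length is `end - start` with no off-by-one correction.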