airr-community / airr-formats

PLEASE SEE airr-standards FOR FURTHER DEVELOPMENT: https://github.com/airr-community/airr-standards
MIT License

Add explicit docs on numbering scheme #23

Closed laserson closed 6 years ago

laserson commented 7 years ago

It appears we forgot to make explicit the numbering scheme for coords. IIRC, to minimize ambiguity of annotations, we decided to go with Python-style numbering (which is zero-indexed half-open intervals; or put another way, it's as if the indices are between the letters).
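To make the convention concrete, here is a minimal sketch of zero-indexed half-open coordinates using an ordinary Python string. The sequence and the `start`/`end` values are made up for illustration and are not taken from the spec.

```python
# Illustrative sequence only; indices sit "between the letters":
#  A T G G C C C
# 0 1 2 3 4 5 6 7
seq = "ATGGCCC"

# Zero-indexed, half-open: start is inclusive, end is exclusive.
start, end = 3, 5
region = seq[start:end]   # the two letters between boundaries 3 and 5

# With half-open intervals the length is simply end - start.
length = end - start
```

One consequence of this choice is that adjacent regions share a boundary value: a region ending at 5 and the next one starting at 5 do not overlap.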

javh commented 7 years ago

Heh. You're too speedy @laserson... Reopening to comment. As I'm working through the CIGAR issues, I'm understanding why we went with length over end for positional information. I think end will still work fine, but I think we need to be explicit about whether end references the alignment or the input. Any indels will make these values differ.

For example, given an alignment like this:

ATGGCCC
ATG--CC

Query end in the alignment is 7, but query end in the input sequence is 5. My inclination is to go with end position in the input, but then we need to make encoding of indels in the CIGAR mandatory.
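A rough sketch of the difference, assuming the top row is the reference, the bottom row is the query, and `-` marks a gap. The helper names `query_ends` and `cigar_from_rows` are hypothetical and not part of any AIRR tooling; the operation letters follow the usual SAM-style `M`/`I`/`D` meaning.

```python
from itertools import groupby

def query_ends(aligned_query, gap="-"):
    """Return (end in alignment columns, end in the ungapped input)."""
    alignment_end = len(aligned_query)
    input_end = sum(1 for base in aligned_query if base != gap)
    return alignment_end, input_end

def cigar_from_rows(aligned_ref, aligned_query, gap="-"):
    """Build a CIGAR-like string from two gapped rows of equal length."""
    ops = []
    for ref_base, query_base in zip(aligned_ref, aligned_query):
        if query_base == gap:
            ops.append("D")   # present in reference, missing from query
        elif ref_base == gap:
            ops.append("I")   # present in query, missing from reference
        else:
            ops.append("M")   # aligned column (match or mismatch)
    # Collapse runs of the same operation into <count><op> pairs.
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))

alignment_end, input_end = query_ends("ATG--CC")
cigar = cigar_from_rows("ATGGCCC", "ATG--CC")
```

Here `query_ends` gives 7 and 5 for the example above, and the CIGAR comes out as `3M2D2M`. Without the `2D` there would be no way to recover the alignment end from the input end, which is why the indel operations would need to be mandatory.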

schristley commented 7 years ago

If we care, the SAM spec for the CIGAR string indicates one-indexed numbering.

schristley commented 7 years ago

@javh Yes, I agree, I and D should be mandatory in the CIGAR.

schristley commented 7 years ago

ah, but now I read that BAM uses zero-indexed numbering...
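For reference, converting between the two conventions is a one-line adjustment on the start coordinate. This is a small sketch with hypothetical function names, assuming one-indexed coordinates are closed intervals (both ends inclusive) and zero-indexed coordinates are half-open.

```python
def closed_to_half_open(start, end):
    """One-indexed closed [start, end] -> zero-indexed half-open [start, end)."""
    return start - 1, end

def half_open_to_closed(start, end):
    """Zero-indexed half-open [start, end) -> one-indexed closed [start, end]."""
    return start + 1, end

seq = "ATGGCCC"
# Positions 4 through 5 in one-indexed closed numbering...
zero_start, zero_end = closed_to_half_open(4, 5)
# ...select the same letters as this Python slice.
region = seq[zero_start:zero_end]
```

Only the start shifts; the end value is numerically the same in both schemes, which is an easy source of off-by-one errors when mixing them.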

javh commented 7 years ago

We should check what the minimal standards group did.

laserson commented 6 years ago

After our poll, we decided to go with Python-style zero-indexed half-open slice notation.

laserson commented 6 years ago

This is currently reflected in the docs, so I will close.