corburn closed this issue 6 years ago.
Hey Jay, I think you are raising something important here, and the exercise was created to address exactly the confusion you describe. You are right that the numbering is inconsistent, but that inconsistency is inherent to the topic. We humans count using a one-based system, so if I ask an ordinary person on the street what the sixth character in "AGTCTAG" is, they will answer "A". A computer would first need to know which numbering system we mean: with zero-based numbering, the character at index 6 is "G". The point of the exercise is to show that a zero-based system differs from the way we normally count. That being said, the description contains an error in the values it gives for locating a sequence cut, and that needs to be fixed.
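To make the "AGTCTAG" example concrete, here is a minimal Python sketch (Python strings are 0-based). The helper `char_at_position` is a hypothetical name used only for illustration; it translates a human, 1-based position into Python's 0-based index.

```python
seq = "AGTCTAG"

def char_at_position(seq: str, position: int) -> str:
    """Hypothetical helper: return the character at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise IndexError(f"position {position} is outside 1..{len(seq)}")
    # A 1-based position p corresponds to the 0-based index p - 1.
    return seq[position - 1]

print(seq[5])                    # A  (index 5 is the sixth character)
print(seq[6])                    # G  (index 6 is the seventh character)
print(char_at_position(seq, 6))  # A  (what a person means by "sixth")
```

The same number, 6, picks out different characters depending on which base is assumed, which is exactly the ambiguity the exercise is meant to expose.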
Would it make sense to use the term 'index' instead of 'position' in the pairwise alignment exercise when referring to location in a sequence?
Although this is not true of all programming languages, most languages designed after 1970 use 0-based numbering for arrays and refer to that number as an 'index'.
In the few bioinformatics formats I am familiar with, 'position' can refer to either 0-based or 1-based numbering, sometimes within the same specification (e.g. the samtools SAM/BAM specification).
Perhaps it would make sense if 'index' was reserved for 0-based numbering and 'position' was reserved for 1-based numbering?
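A minimal Python sketch of the proposed convention, assuming 'index' always means 0-based and 'position' always means 1-based. The names `Index`, `Position`, `position_to_index`, `index_to_position`, and `slice_positions` are hypothetical and chosen only to illustrate the idea. The SAM/BAM values reflect the SAMv1 specification, where the text field POS is 1-based and the binary BAM field pos is 0-based (POS − 1).

```python
from typing import NewType

# Hypothetical types for the proposed naming convention.
Index = NewType("Index", int)        # 0-based
Position = NewType("Position", int)  # 1-based

def position_to_index(pos: Position) -> Index:
    if pos < 1:
        raise ValueError("positions start at 1")
    return Index(pos - 1)

def index_to_position(idx: Index) -> Position:
    if idx < 0:
        raise ValueError("indices start at 0")
    return Position(idx + 1)

def slice_positions(seq: str, start: Position, end: Position) -> str:
    """Return the inclusive 1-based range start..end as a Python slice."""
    return seq[position_to_index(start):end]

# SAM stores POS as a 1-based position; BAM stores pos as POS - 1.
sam_pos = Position(100)
bam_pos = position_to_index(sam_pos)
print(bam_pos)                                             # 99
print(index_to_position(bam_pos))                          # 100
print(slice_positions("AGTCTAG", Position(2), Position(4)))  # GTC
```

`NewType` adds no runtime cost; it only lets a type checker flag places where an index is passed where a position is expected, which is one way the naming split could be enforced in code.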
IAB Pairwise Alignment Exercise
This is the pairwise alignment exercise I am referencing:
This is the closest thing to a definition of 'position' in IAB, although the word 'position' itself does not appear.
Examples of 'position' being defined inconsistently:
https://samtools.github.io/hts-specs/SAMv1.pdf
https://samtools.github.io/hts-specs/VCFv4.3.pdf