SynBioDex-archive / libSBOLcore

The core Java models of SBOL

SequenceAnnotation locations ambiguity #10

Closed mgaldzic closed 13 years ago

mgaldzic commented 13 years ago

d) On the subject of locations, you state that SequenceAnnotation counts from 1 (not 0). You don't state whether the feature is inclusive or exclusive of 'end'. You also don't state whether start <= end always holds, or whether this depends on the strand. The convention taken by the bio* toolkits, GFF, and many other bioinformatics libraries and formats is that if there is an explicit strand indicator, then start <= end, and the location is inclusive of both start and end.

The bio* projects have tended towards factoring DNA locations out into their own data models, so as to support a wide range of locators relevant to biology. GFF and the DAS XML format restrict themselves to start/end/strand for locations, so they require extra data to piece together e.g. multi-exon structures. The expressivity of locations available through the EMBL/GenBank feature table is much higher, allowing structures involving multiple parts of several source sequences to be assembled, each piece having an independent strand indicator.

This may be considered overkill for current state-of-the-art synbio applications, but may not be in the future. It boils down to whether you wish to be able to represent all of biology through this API or are primarily interested in painting coloured rectangles on DNA backbones.
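To make the convention described above concrete, here is a minimal Java sketch of a location that is 1-based, inclusive of both start and end, and carries an explicit strand so that start <= end can always be enforced. The names `Strand` and `StrandedRange` are hypothetical and chosen for illustration only; they are not part of libSBOLcore or the SBOL specification.

```java
/** Hypothetical strand indicator; not an existing libSBOLcore type. */
enum Strand { FORWARD, REVERSE }

/**
 * Hypothetical location: 1-based, inclusive of both start and end.
 * Orientation is carried by the strand, so start <= end always holds.
 */
final class StrandedRange {
    final int start;
    final int end;
    final Strand strand;

    StrandedRange(int start, int end, Strand strand) {
        if (start < 1) {
            throw new IllegalArgumentException("start must be >= 1 (coordinates count from 1)");
        }
        if (end < start) {
            throw new IllegalArgumentException("start must be <= end; use strand for orientation");
        }
        if (strand == null) {
            throw new IllegalArgumentException("strand is required");
        }
        this.start = start;
        this.end = end;
        this.strand = strand;
    }

    /** Number of bases covered; both endpoints count. */
    int length() {
        return end - start + 1;
    }

    /** True if the 1-based position falls inside the range, endpoints included. */
    boolean contains(int position) {
        return start <= position && position <= end;
    }

    /**
     * Forward-strand bases covered by this range. Java strings are 0-based
     * with an exclusive end index, hence the shift by one on start only.
     */
    String slice(String sequence) {
        return sequence.substring(start - 1, end);
    }
}
```

Under these assumptions, `new StrandedRange(3, 5, Strand.REVERSE)` covers three bases and rejects any attempt to encode the reverse strand by swapping start and end. A multi-exon structure of the kind mentioned above would need something more, for example an ordered list of such ranges, which is the extra expressivity the EMBL/GenBank feature table provides.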

TrevorFSmith commented 13 years ago

This seems like a spec question, not a code question.