dictyBase / Migration

Entrypoint for dictybase overhaul project

Specs for loading core genomes #5

Open cybersiddhu opened 10 years ago

cybersiddhu commented 10 years ago

Representing strain or subspecies

As suggested in the Chado documentation, append it to the species value. So for the canonical dicty entry, the species becomes Dictyostelium discoideum AX4.
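A minimal sketch of the naming convention, assuming an organism row with separate genus and species columns as in the Chado organism table. The `Organism` class and the `make_organism` helper are hypothetical illustrations, not part of Chado or any dictyBase loader. They only show that the strain ends up inside the species value, separated by a single space.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Organism:
    """Hypothetical stand-in for the Chado organism columns used here."""
    genus: str
    species: str  # strain/subspecies is appended here, per the convention


def make_organism(genus: str, species: str, strain: Optional[str] = None) -> Organism:
    """Build an organism record, appending the strain (if any) to the species value."""
    value = f"{species} {strain}" if strain else species
    return Organism(genus=genus, species=value)


ax4 = make_organism("Dictyostelium", "discoideum", "AX4")
print(f"{ax4.genus} {ax4.species}")  # Dictyostelium discoideum AX4
```

One consequence of this choice is that strains of the same species become distinct organism rows that differ only in the species string.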

Representing multiple assemblies of the same organism

As discussed on the Chado mailing list, there are a few options, each with its own ups and downs.

A versioning model will be applied to the majority of the sequence features. The idea is borrowed primarily from GenBank. Every feature will have a sequence id (internal, akin to a GI or PID in GenBank) and a stable identifier (the accession number in GenBank). The stable identifier always starts at version 1. Any change in a feature's sequence will create a new feature entry with a new sequence id, whereas the stable identifier remains intact and its version number is incremented (1 becomes 2). In other words, all features with identical stable identifiers differ in their version, and the one with the highest version is the canonical one. The history of each feature's sequence changes will also be preserved.
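The versioning rules above can be sketched as an in-memory store. This is a minimal illustration under stated assumptions, not the planned Chado schema: `FeatureStore`, `FeatureVersion`, and the accession `DDB_TEST1` are hypothetical names, internal sequence ids come from a simple counter, and the `ACCESSION.VERSION` string mirrors GenBank's accession.version style. Each sequence change appends a new entry with a fresh sequence id and the next version number; older entries are kept as history, and the canonical feature is the one with the highest version.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass(frozen=True)
class FeatureVersion:
    seq_id: int     # internal id, new for every sequence change (like a GenBank GI)
    accession: str  # stable identifier, never changes (like a GenBank accession)
    version: int    # starts at 1, incremented on each sequence change
    residues: str

    @property
    def versioned_accession(self) -> str:
        return f"{self.accession}.{self.version}"


class FeatureStore:
    """Hypothetical store illustrating the versioning model."""

    def __init__(self) -> None:
        self._seq_ids = count(1)  # source of internal sequence ids
        self._history: Dict[str, List[FeatureVersion]] = {}

    def add(self, accession: str, residues: str) -> FeatureVersion:
        if accession in self._history:
            raise ValueError(f"{accession} already exists; use update()")
        fv = FeatureVersion(next(self._seq_ids), accession, 1, residues)
        self._history[accession] = [fv]
        return fv

    def update(self, accession: str, residues: str) -> FeatureVersion:
        current = self.canonical(accession)
        if residues == current.residues:
            return current  # no sequence change, so no new version
        fv = FeatureVersion(next(self._seq_ids), accession, current.version + 1, residues)
        self._history[accession].append(fv)  # earlier versions stay as history
        return fv

    def canonical(self, accession: str) -> FeatureVersion:
        return max(self._history[accession], key=lambda f: f.version)

    def history(self, accession: str) -> List[FeatureVersion]:
        return list(self._history[accession])


store = FeatureStore()
v1 = store.add("DDB_TEST1", "ATGAAA")
v2 = store.update("DDB_TEST1", "ATGAAG")
print(store.canonical("DDB_TEST1").versioned_accession)  # DDB_TEST1.2
```

Keeping every version as its own immutable record, rather than overwriting in place, is what lets the history of sequence changes survive alongside the canonical entry.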

cybersiddhu commented 8 years ago

31