biocommons / biocommons.seqrepo

non-redundant, compressed, journalled, file-based storage for biological sequences
Apache License 2.0

Require Ensembl aliases to be versioned and drop unversioned from database #80

Closed by reece 4 years ago

reece commented 4 years ago

Unversioned Ensembl aliases are not unique unless they are qualified by a release-specific namespace. Maintaining a separate namespace for each Ensembl release is expensive: approximately 400k aliases per release. With 20 releases, that's a 20x expansion in the number of Ensembl aliases.

Versioned identifiers have been available since Ensembl release 83 (e83). It's time to drop support for unversioned aliases and, therefore, for release-versioned Ensembl namespaces. SeqRepo will now use the single Ensembl namespace (rather than a versioned Ensembl-## namespace).
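As a rough illustration of the rule described above, here is a minimal sketch of how an alias could be filtered and its namespace collapsed during loading. The function name `normalize_ensembl_alias` and both regular expressions are hypothetical and are not taken from the SeqRepo code base. The sketch also assumes that Ensembl stable IDs have the form `ENS`, one or more letters, 11 digits, and an optional `.version` suffix, and the identifiers in the usage note are illustrative only.

```python
import re

# Hypothetical patterns for illustration; not SeqRepo's actual implementation.
# Assumed ID shape: "ENS" + letters (species/feature code) + 11 digits + ".version"
_VERSIONED_ALIAS_RE = re.compile(r"^ENS[A-Z]+\d{11}\.\d+$")
# Release-specific namespaces such as "Ensembl-85"
_RELEASE_NAMESPACE_RE = re.compile(r"^Ensembl-\d+$")


def normalize_ensembl_alias(namespace, alias):
    """Return the (namespace, alias) pair to store, or None to drop the alias.

    Non-Ensembl namespaces pass through unchanged. Ensembl aliases are kept
    only when versioned, and are always stored under the plain "Ensembl"
    namespace.
    """
    is_ensembl = namespace == "Ensembl" or _RELEASE_NAMESPACE_RE.match(namespace)
    if not is_ensembl:
        return (namespace, alias)
    if not _VERSIONED_ALIAS_RE.match(alias):
        # Unversioned Ensembl alias: not unique across releases, so drop it.
        return None
    return ("Ensembl", alias)
```

With this sketch, `normalize_ensembl_alias("Ensembl-85", "ENST00000288602.10")` yields `("Ensembl", "ENST00000288602.10")`, while the unversioned `"ENST00000288602"` yields `None` and would be dropped.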