biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Add `preferred_prefix` field to support broader community #166

Closed cthoyt closed 3 years ago

cthoyt commented 3 years ago

The Bioregistry already supports the use of different "profiles" depending on whether you're in the ontology world (e.g., you want to use OBO PURLs), the systems biology modelling world (e.g., you want to use Identifiers.org IRIs), etc.

However, the OBO Foundry had the good idea to store both a canonical prefix and a stylized prefix (i.e., the preferredPrefix field). It would be nice to add this to the Bioregistry as well, so that prefixes can be written in a stylized way for certain downstream uses. This would especially help capture prefixes like ncbigene, which is often written as NCBIGene but does not itself appear in the OBO Foundry registry.

cmungall commented 3 years ago

this is great - broader than OBO though (NCBIGene isn't in OBO!)

NCBIGene is a bit of an outlier here since there is potential confusion about this prefix as a whole (e.g. NCBI just call it "gene")

Canonical examples: FlyBase, WormBase, ...

Is this any different from the preferred casing issue? We need a way to record that the preferred CURIE for SGD is SGD:Snnnn, not sgd:Snnnn. It seems trivial, but it is required if we want merges that need no string manipulation.
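To make the casing concern concrete, here is a minimal sketch of rewriting a CURIE to its preferred casing. The lookup table, the function name `standardize_curie`, and the assumption that normalization is plain lowercasing are all hypothetical and are not taken from the Bioregistry codebase. The identifier `S0001` is a made-up placeholder.

```python
# Hypothetical lookup table: normalized prefix -> preferred (stylized) prefix.
PREFERRED_PREFIXES = {
    "sgd": "SGD",
    "flybase": "FlyBase",
    "wormbase": "WormBase",
    "ncbigene": "NCBIGene",
}


def standardize_curie(curie: str) -> str:
    """Rewrite a CURIE so that its prefix uses the preferred casing."""
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    # Assumption: normalizing a prefix just means lowercasing it.
    # Unknown prefixes are passed through unchanged.
    preferred = PREFERRED_PREFIXES.get(prefix.lower(), prefix)
    return f"{preferred}:{identifier}"


print(standardize_curie("sgd:S0001"))
print(standardize_curie("SGD:S0001"))
```

If both sides of a merge pass their CURIEs through one function like this, the two spellings collapse to the same string before comparison.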

cthoyt commented 3 years ago

I think this is probably the same as the preferred casing issue. I would like the Bioregistry's normalized prefix to always match the result of normalizing the preferred prefix.
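A rough sketch of how that invariant could be checked, assuming the registry is a mapping from normalized prefix to a record that may carry a `preferred_prefix` key. The normalization rule here (lowercase, then drop spaces, dots, hyphens, and underscores) and both function names are assumptions for illustration, not the Bioregistry's actual implementation.

```python
def normalize_prefix(prefix: str) -> str:
    """Assumed normalization: lowercase, then drop spaces, dots, hyphens, underscores."""
    return "".join(c for c in prefix.lower() if c not in " .-_")


def find_invalid_preferred_prefixes(records: dict[str, dict]) -> list[str]:
    """Return the prefixes whose preferred_prefix does not normalize back to them."""
    invalid = []
    for prefix, record in records.items():
        preferred = record.get("preferred_prefix")
        # Records without a preferred prefix trivially satisfy the invariant.
        if preferred is not None and normalize_prefix(preferred) != prefix:
            invalid.append(prefix)
    return invalid


# Hypothetical registry excerpt; "FB" is deliberately wrong to show a violation.
example = {
    "ncbigene": {"preferred_prefix": "NCBIGene"},
    "sgd": {"preferred_prefix": "SGD"},
    "go": {},
    "flybase": {"preferred_prefix": "FB"},
}
print(find_invalid_preferred_prefixes(example))
```

A check like this could run as a unit test over the registry, so a stylized prefix that drifts from its normalized key is caught before release.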