biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Register EMAPS, a variant of EMAPA #747

Closed bgyori closed 1 year ago

bgyori commented 1 year ago

We already have an entry for EMAPA: https://bioregistry.io/registry/emapa. EMAPA IDs refer to developmental stage-independent mouse anatomical features, for instance, EMAPA:35178 which resolves to the MGI's Mouse Developmental Anatomy Browser. However, EMAPA has an extension called EMAPS which introduces terms for different developmental stages of this generic EMAPA entry. The IDs are constructed such that they are a concatenation of the EMAPA ID and a Theiler Stage (TS) number. For instance, EMAPA:35178 in TS 23 becomes EMAPS:3517823 which can be resolved at https://www.informatics.jax.org/vocab/gxd/anatomy/EMAPS:3517823.
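The ID construction described above can be sketched in Python. This is only an illustration, not how Bioregistry or PyOBO handle it: the function names are hypothetical, and it assumes the Theiler stage always occupies exactly the last two digits of the EMAPS local ID (zero-padded for stages below 10) and that stages run from 1 to 28.

```python
import re
from typing import Tuple

# Assumption: EMAPS local ID = EMAPA local ID + two-digit Theiler stage.
EMAPS_PATTERN = re.compile(r"^EMAPS:(?P<emapa>\d+)(?P<stage>\d{2})$")

# Resolver base URL taken from the example link in this issue.
MGI_ANATOMY_URL = "https://www.informatics.jax.org/vocab/gxd/anatomy/"


def split_emaps(curie: str) -> Tuple[str, int]:
    """Split an EMAPS CURIE into its EMAPA CURIE and Theiler stage number."""
    match = EMAPS_PATTERN.match(curie)
    if match is None:
        raise ValueError(f"not a recognizable EMAPS CURIE: {curie!r}")
    return f"EMAPA:{match.group('emapa')}", int(match.group("stage"))


def build_emaps(emapa_curie: str, stage: int) -> str:
    """Concatenate an EMAPA local ID and a Theiler stage into an EMAPS CURIE."""
    prefix, _, local_id = emapa_curie.partition(":")
    if prefix != "EMAPA" or not local_id.isdigit():
        raise ValueError(f"not a recognizable EMAPA CURIE: {emapa_curie!r}")
    if not 1 <= stage <= 28:  # assumed range of Theiler stages
        raise ValueError(f"Theiler stage out of range: {stage}")
    return f"EMAPS:{local_id}{stage:02d}"


def emaps_url(curie: str) -> str:
    """Build the MGI anatomy browser URL for an EMAPS CURIE."""
    split_emaps(curie)  # validate the shape before building a link
    return MGI_ANATOMY_URL + curie


print(split_emaps("EMAPS:3517823"))
print(build_emaps("EMAPA:35178", 23))
print(emaps_url("EMAPS:3517823"))
```

Because the stage is fused onto the end of the ID with no separator, the split only works if the stage width is fixed, which is why the two-digit assumption matters here.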

It's worth noting that EMAPS IDs are used in the wild, and my motivation for opening this issue is to be able to handle and resolve those IDs.

I started requesting a new prefix at first but then realized that this is a somewhat more nuanced situation that might call for a custom solution. Any thoughts, @cthoyt?

bgyori commented 1 year ago

On a related note, Theiler stages could possibly also be registered, though I'm a bit uncertain about what a resolver should look like. See the list of stages at: https://www.emouseatlas.org/emap/ema/theiler_stages/StageDefinition/stagedefinition.html

cthoyt commented 1 year ago

Though it's not ideal to create identifiers that themselves have meaning, I think this is a good motivation to make both EMAPS and Theiler Stage entries (note, TS can be resolved at https://www.emouseatlas.org/emap/ema/DAOAnatomyJSP/anatomy.html?stage=TS23). Do you know if there's an EMAPS artifact somewhere on the web to point to?

However, you're not going to like the following issue: using TS as a prefix will conflict with the banana registered to CALOHA. Yet another reason NOBODY should be using two-letter prefixes.

cthoyt commented 1 year ago

We might want to think about supporting custom prefix remappings on a resource-by-resource basis in PyOBO to help avoid issues like this in the future (though it hasn't happened much so far).

cthoyt commented 1 year ago

Also, it's not obvious whether a Theiler stage should canonically have the form TSXX, XX (zero padded), or X (I think zero padded is probably the best, since that's how it appears in the ontology and some of the links).
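As a rough sketch of what settling on the zero-padded form could look like, the following Python function maps the three spellings mentioned above (TSXX, XX, X) onto one canonical local identifier. The function name is hypothetical and this is not an existing Bioregistry or PyOBO API; it also assumes stage numbers have at most two digits.

```python
import re

# Accepts "TS23", "ts7", "07", "7"; the "TS" prefix is optional and case-insensitive.
_TS_PATTERN = re.compile(r"^(?:TS)?\s*(\d{1,2})$", re.IGNORECASE)


def normalize_theiler_stage(raw: str) -> str:
    """Return the zero-padded, two-digit form of a Theiler stage identifier."""
    match = _TS_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable Theiler stage: {raw!r}")
    return f"{int(match.group(1)):02d}"


for example in ["TS23", "TS7", "07", "7"]:
    print(example, "->", normalize_theiler_stage(example))
```

Normalizing at parse time would let a resolver use a single pattern while still accepting the variants that appear in different ontologies.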

cmungall commented 1 year ago

Note that the TS IDs in EMAPA are not prefixed IDs like the rest of the EMAPA IDs. Let's coordinate before introducing too many different ways of referencing mouse stages. We also have https://obofoundry.org/ontology/mmusdv

bgyori commented 1 year ago

I agree, though in this case my goal is just to be able to interpret/resolve EMAPS IDs and Theiler stages that appear in various ontologies "in the wild". I wasn't trying to make a statement about the best representation!

cthoyt commented 1 year ago

This will be fixed by adding EMAPS to Bioregistry in #749, then improving the PyOBO parser in https://github.com/pyobo/pyobo/issues/145