biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Metabolic Atlas Reaction #495

Status: Closed (mihai-sysbio closed this issue 1 year ago)

mihai-sysbio commented 1 year ago

Prefix

metatlas.reaction

Name

Metabolic Atlas

Homepage

https://metabolicatlas.org/explore/Human-GEM/gem-browser/reaction/MAR11851

Description

Metabolic Atlas is a resource for exploring metabolism, starting with a set of community-curated genome-scale metabolic models of human and model organisms, enriched with pathway maps and other tools for easy browsing and analysis.

Example Local Unique Identifier

MAR11851

Regular Expression Pattern for Local Unique Identifier

^MAR\d{5}$

URI Format String

https://metabolicatlas.org/identifier/MetabolicAtlas/$1

Wikidata Property

No response

Contributor Name

Mihail Anton

Contributor GitHub

@mihai-sysbio

Contributor ORCiD

0000-0002-7753-9042

Contact Name

Mihail Anton

Contact ORCiD

0000-0002-7753-9042

Additional Comments

No response
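
To make the pattern and URI format string in this request concrete, here is a minimal Python sketch that validates a local unique identifier and expands it into a URL. The function name `expand` is made up for illustration, and treating `$1` as a plain placeholder to substitute is an assumption about how the format string is meant to be used.

```python
import re

# Values copied from the request above.
PATTERN = re.compile(r"^MAR\d{5}$")
URI_FORMAT = "https://metabolicatlas.org/identifier/MetabolicAtlas/$1"


def expand(luid: str) -> str:
    """Return the URL for a reaction identifier, or raise if it is malformed."""
    if not PATTERN.match(luid):
        raise ValueError(f"{luid!r} does not match {PATTERN.pattern}")
    # Assumption: "$1" marks the position where the identifier is substituted.
    return URI_FORMAT.replace("$1", luid)


print(expand("MAR11851"))
```

A metabolite identifier such as `MAM01234c` is rejected by this pattern, which is why a second prefix comes up later in the thread.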

cthoyt commented 1 year ago

@mihai-sysbio this is great! Do you have a Twitter account that goes with this database? Also are there any other related identifiers in this database for other entity types?

mihai-sysbio commented 1 year ago

Actually, there's one more identifier I wanted to add:

Prefix: metatlas.metabolite
Name: Metabolic Atlas Metabolite
Example Local Unique Identifier: MAM01234c
Regular Expression Pattern: ^MAM\d{5}\w$
Example URL: https://metabolicatlas.org/explore/Human-GEM/gem-browser/metabolite/MAM01234c

There's no Twitter account set up as of yet.

cthoyt commented 1 year ago

Lovely, I’ll take care of adding the second one to the PR

mihai-sysbio commented 1 year ago

> Lovely, I’ll take care of adding the second one to the PR

Thanks a lot @cthoyt

cthoyt commented 1 year ago

@mihai-sysbio I noticed that the metabolite vocabulary has a base identifier plus a one letter code for the cellular component in which it lives. Is there a higher-level version of this identifier that doesn't correspond to any component?
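
The structure described here (a base identifier followed by a one-letter compartment code) can be sketched as follows. The capture groups are added to the submitted pattern for illustration, `split_compartment` is a hypothetical helper, and `MAM01234m` and `MAM05678c` are made-up identifiers. Nothing in this thread confirms that Metabolic Atlas resolves the base form on its own.

```python
import re
from collections import defaultdict

# Submitted metabolite pattern, with capture groups added for illustration.
METABOLITE = re.compile(r"^(MAM\d{5})(\w)$")


def split_compartment(luid: str) -> tuple[str, str]:
    """Split a metabolite identifier into (base identifier, compartment code)."""
    match = METABOLITE.match(luid)
    if match is None:
        raise ValueError(f"{luid!r} is not a metabolite identifier")
    return match.group(1), match.group(2)


# Hypothetical identifiers: the same base metabolite in two compartments.
compartments_by_base = defaultdict(list)
for luid in ["MAM01234c", "MAM01234m", "MAM05678c"]:
    base, compartment = split_compartment(luid)
    compartments_by_base[base].append(compartment)

print(dict(compartments_by_base))
```

Grouping by the base identifier shows how several compartment-specific identifiers can collapse onto a single compartment-independent entity.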

mihai-sysbio commented 1 year ago

Very good question @cthoyt. There is in fact such a version, but there are no plans to expose it to users in the near future. When it is exposed, could the profile be updated to make that last letter code optional?

mihai-sysbio commented 1 year ago

In the meantime, Metabolic Atlas has also obtained a profile on Identifiers.org. There, the namespace is joined between metabolites and reactions. While there are some advantages to that approach, I'm wondering how one should go about deciding which setup is preferred: joined namespaces or separate ones (metatlas.reaction and metatlas.metabolite).

cthoyt commented 1 year ago

> Very good question @cthoyt. There is in fact such a version, but there are no plans to expose it to users in the near future. When it is exposed, could the profile be updated to make that last letter code optional?

Yes, we can create an additional prefix for this if/when it's exposed. I would highly suggest prioritizing this to enable mapping data from Metabolic Atlas to any other standard source (i.e., it's problematic to have many-to-one mappings)

> In the meantime, Metabolic Atlas has also obtained a profile on Identifiers.org. There, the namespace is joined between metabolites and reactions. While there are some advantages to that approach, I'm wondering how one should go about deciding which setup is preferred: joined namespaces or separate ones (metatlas.reaction and metatlas.metabolite).

This is problematic for a few reasons. Most importantly, it's better to keep different semantic spaces separate. How do we know these are different semantic spaces? They have different identifier patterns and represent different entity types. If we mix them together, it's much harder to de novo annotate data with the type (e.g., you have to do string processing to try and figure out what things are).

I'm pretty sure I checked that https://metabolicatlas.org/identifier/MetabolicAtlas/MAM01234c didn't work properly yesterday. Has that been updated?

On Identifiers.org, this is additionally problematic since you can't put more than one example local unique identifier, so it's not shown what a metabolite looks like. Further, the regular expression isn't quite correct since it allows you to construct invalid identifiers like metatlas:MAR01234c. A final problem with Identifiers.org is that they often don't respond to requests to update records (nor do they engage in discussions to avoid potential problems and make the best new records).
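
A small sketch of the regular expression concern. The `LOOSE` pattern below is an assumed example of what a joined pattern might look like, not a copy of the actual Identifiers.org record. `UNION` simply combines the two per-type patterns given earlier in this thread.

```python
import re

# Assumption: a loosely joined pattern of this shape (not the real record).
LOOSE = re.compile(r"^MA[MR]\d{5}\w?$")

# Union of the reaction and metabolite patterns given in this thread.
UNION = re.compile(r"^(MAR\d{5}|MAM\d{5}\w)$")

for luid in ["MAR11851", "MAM01234c", "MAR01234c"]:
    loose_ok = LOOSE.match(luid) is not None
    union_ok = UNION.match(luid) is not None
    print(f"{luid}: loose={loose_ok} union={union_ok}")
```

Under these assumptions, both patterns accept the two valid examples, but only the loose one accepts `MAR01234c`, a reaction identifier carrying a compartment code.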

mihai-sysbio commented 1 year ago

> Yes, we can create an additional prefix for this if/when it's exposed. I would highly suggest prioritizing this to enable mapping data from Metabolic Atlas to any other standard source (i.e., it's problematic to have many-to-one mappings)

👍🏻

> it's better to keep different semantic spaces separate

For us at Metabolic Atlas (or other resources) it might well be. What users are faced with, on the other hand, is having to know what type an identifier is in order to query it. An experienced user might figure it out, but for a new user, having to know the semantic space a priori doesn't seem like it would simplify anything.

> They have different identifier patterns and represent different entity types.

That's right. It would have been worse to have different entity types but the same identifier pattern.

> If we mix them together, it's much harder to de novo annotate data with the type (e.g., you have to do string processing to try and figure out what things are).

This is a use-case I don't know much about. Would it be too much to ask for a more complete description?

> I'm pretty sure I checked that https://metabolicatlas.org/identifier/MetabolicAtlas/MAM01234c didn't work properly yesterday. Has that been updated?

The URL worked for me then and it still does now. Our production instance hasn't been touched in a while, so I would be surprised if it suddenly stopped working. I guess it's working now?

> On Identifiers.org, this is additionally problematic since you can't put more than one example local unique identifier, so it's not shown what a metabolite looks like. Further, the regular expression isn't quite correct since it allows you to construct invalid identifiers like metatlas:MAR01234c.

I was fully aware of these accurately described trade-offs, and still chose to prioritise what I perceive as increased comfort for the user to only need registry/metatlas:id rather than registry/metatlas.entityType:id. Moreover, all identifiers are meant to be unique, so even after separating the namespace we wouldn't be issuing metatlas.metabolite:1 and metatlas.reaction:1. However, you're hinting above at a use-case where the absence of an entityType is making things worse, so I would really like to know more about that.

> A final problem with Identifiers.org is that they often don't respond to requests to update records (nor do they engage in discussions to avoid potential problems and make the best new records).

What can I say, we will choose to promote the registry that aligns best with our users' needs 😊

cthoyt commented 1 year ago

Thanks for being patient on the reply. Here's what we can do (similar to kegg): we can mint a top-level metatlas prefix and then two subspaces that help differentiate them. The Bioregistry will maintain a "part of" relationship between these prefixes so users know that metatlas.metabolite and metatlas.reaction are unioned together to create the metatlas namespace. Further, we can update the regex to be a proper union regex and not have the tradeoff I mentioned.

I totally understand why you would want to have extra typing information in the identifiers (e.g., the MAR or MAM prefix-in-LUID) if you're expecting users to put them into an omni-search box. Since they're already unique, it might make sense that they can just be combined into a single semantic space, but this has drawbacks. Consider a data integration scenario where you have a list of prefix/identifier pairs (e.g., entities/nodes in a knowledge graph). It's nice when the prefix tells you the type information, since applying custom logic to the strings of the local unique identifiers (e.g., checking if one starts with MAM or MAR) doesn't generalize across resources. In the end, you have to reconstruct the same logic you wish the registry had done for you.
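
A rough sketch of that data integration scenario, comparing a generic lookup keyed on the prefix with resource-specific string processing. The type labels, the `TYPE_BY_PREFIX` table, and both function names are hypothetical and are not part of the Bioregistry.

```python
# Hypothetical table: entity type keyed by prefix.
TYPE_BY_PREFIX = {
    "metatlas.reaction": "reaction",
    "metatlas.metabolite": "metabolite",
}


def type_from_prefix(prefix: str) -> str:
    """Generic: one table lookup, no knowledge of identifier internals."""
    return TYPE_BY_PREFIX[prefix]


def type_from_luid(luid: str) -> str:
    """Resource-specific: only works if you know the MAR/MAM convention."""
    if luid.startswith("MAR"):
        return "reaction"
    if luid.startswith("MAM"):
        return "metabolite"
    raise ValueError(f"cannot infer entity type from {luid!r}")


# Nodes in a knowledge graph, as (prefix, local unique identifier) pairs.
typed_nodes = [
    ("metatlas.reaction", "MAR11851"),
    ("metatlas.metabolite", "MAM01234c"),
]
for prefix, luid in typed_nodes:
    print(prefix, luid, type_from_prefix(prefix))

# With a single joined prefix, the type must be recovered from the string.
for luid in ["MAR11851", "MAM01234c"]:
    print("metatlas", luid, type_from_luid(luid))
```

The second function would need a counterpart for every resource that encodes type information in its identifiers, which is the generalization problem described above.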

cthoyt commented 1 year ago

You can see the new prefixes live at https://bioregistry.io/metatlas!

mihai-sysbio commented 1 year ago

> we can mint a top-level metatlas prefix and then two subspaces that help differentiate them
>
> we can update the regex to be a proper union regex and not have the tradeoff I mentioned

Brilliant approach 💡 thank you very much for applying this suggestion and deploying it 🚀