@mihai-sysbio this is great! Do you have a Twitter account that goes with this database? Also are there any other related identifiers in this database for other entity types?
Actually, there's one more identifier I wanted to add:
metatlas.metabolite
Metabolic Atlas Metabolite
MAM01234c
^MAM\d{5}\w$
https://metabolicatlas.org/explore/Human-GEM/gem-browser/metabolite/MAM01234c
There's no Twitter account set up as of yet.
Lovely, I’ll take care of adding the second one to the PR
Thanks a lot @cthoyt
@mihai-sysbio I noticed that the metabolite vocabulary has a base identifier plus a one-letter code for the cellular component in which it lives. Is there a higher-level version of this identifier that doesn't correspond to any component?
Very good question @cthoyt. There is in fact such a version, but there are currently no plans to expose it to users in the near future. Once it is exposed, could the profile be updated to make that last letter code optional?
In the meantime, Metabolic Atlas has also obtained a profile on Identifiers.org. There, the namespace is joined between metabolites and reactions. While there are some advantages in that approach, I'm wondering how one should go about deciding which setup is preferred: joined namespaces or separate ones (metatlas.reaction and metatlas.metabolite).
> Very good question @cthoyt. There is in fact such a version, but there are currently no plans to expose it to users in the near future. Once it is exposed, could the profile be updated to make that last letter code optional?
Yes, we can create an additional prefix for this if/when it's exposed. I would highly suggest prioritizing this to enable mapping data from Metabolic Atlas to any other standard source (i.e., it's problematic to have many-to-one mappings)
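To make the many-to-one concern concrete, here is a minimal Python sketch. It assumes the base identifier is simply the local unique identifier without its trailing one-letter compartment code, as described above. The helper names and every example identifier other than MAM01234c are hypothetical.

```python
import re
from collections import defaultdict

# Metabolite pattern from this thread, with the compartment letter made optional.
# Assumption: the base identifier is the LUID minus its final one-letter code.
METABOLITE = re.compile(r"^(MAM\d{5})(\w)?$")


def split_compartment(luid):
    """Return (base identifier, compartment letter or None)."""
    match = METABOLITE.match(luid)
    if match is None:
        raise ValueError(f"not a Metabolic Atlas metabolite identifier: {luid}")
    return match.group(1), match.group(2)


def group_by_base(luids):
    """Group compartment-specific identifiers under their shared base."""
    groups = defaultdict(list)
    for luid in luids:
        base, _compartment = split_compartment(luid)
        groups[base].append(luid)
    return dict(groups)
```

Without a compartment-free identifier, each group here would have to be mapped to the same external record, which is the many-to-one situation mentioned above.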
> In the meantime, Metabolic Atlas has also obtained a profile on Identifiers.org. There, the namespace is joined between metabolites and reactions. While there are some advantages in that approach, I'm wondering how one should go about deciding which setup is preferred: joined namespaces or separate ones (metatlas.reaction and metatlas.metabolite).
This is problematic for a few reasons. Most importantly, it's better to keep different semantic spaces separate. How do we know these are different semantic spaces? They have different identifier patterns and represent different entity types. If we mix them together, it's much harder to de novo annotate data with the type (e.g., you have to do string processing to try and figure out what things are).
I'm pretty sure I checked that https://metabolicatlas.org/identifier/MetabolicAtlas/MAM01234c didn't work properly yesterday, has that been updated?
On Identifiers.org, this is additionally problematic since you can't put more than one example local unique identifier, so it's not shown what a metabolite looks like. Further, the regular expression isn't quite correct since it allows you to construct invalid identifiers like metatlas:MAR01234c. The last problem with Identifiers.org is that they often do not respond to requests to update records (nor do they engage in discussions to avoid potential problems and make the best new records).
> Yes, we can create an additional prefix for this if/when it's exposed. I would highly suggest prioritizing this to enable mapping data from Metabolic Atlas to any other standard source (i.e., it's problematic to have many-to-one mappings)
👍🏻
> it's better to keep different semantic spaces separate
For us at Metabolic Atlas (or another resource) it might well be. What users are faced with, on the other hand, is having to know what type an identifier is in order to query it. An experienced user might figure it out, but for a new user, having to know the semantic space a priori doesn't seem like it would simplify anything.
> They have different identifier patterns and represent different entity types.
That's right. It would have been worse to have different entity types sharing the same identifier pattern.
> If we mix them together, it's much harder to de novo annotate data with the type (e.g., you have to do string processing to try and figure out what things are).
This is a use-case I don't know much about. Would it be too much to ask for a more complete description?
> I'm pretty sure I checked that https://metabolicatlas.org/identifier/MetabolicAtlas/MAM01234c didn't work properly yesterday, has that been updated?
The URL worked for me then and it still does now. Our production instance hasn't been touched in a while, so I would be surprised if it suddenly stopped working at random. I guess it's working now?
> On Identifiers.org, this is additionally problematic since you can't put more than one example local unique identifier, so it's not shown what a metabolite looks like. Further, the regular expression isn't quite correct since it allows you to construct invalid identifiers like metatlas:MAR01234c.
I was fully aware of these accurately described trade-offs, and still chose to prioritise what I perceive as increased comfort for the user to only need registry/metatlas:id rather than registry/metatlas.entityType:id. Moreover, all identifiers are meant to be unique, so even after separating the namespace we wouldn't be issuing metatlas.metabolite:1 and metatlas.reaction:1. However, you're hinting above at a use-case where the absence of an entityType is making things worse, so I would really like to know more about that.
> The last problem with Identifiers.org is that they often do not respond to requests to update records (nor do they engage in discussions to avoid potential problems and make the best new records).
What can I say, we will choose to promote the registry that aligns best with our users' needs 😊
Thanks for being patient on the reply. Here's what we can do (similar to kegg): we can mint a top-level metatlas prefix and then two subspaces that help differentiate them. The Bioregistry will maintain a "part of" relationship between these prefixes so users know that metatlas.metabolite and metatlas.reaction are unioned together to create the metatlas namespace. Further, we can update the regex to be a proper union regex and not have the tradeoff I mentioned.
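As a rough illustration of what a union regex could look like, the sketch below combines the two patterns given in this thread. The combined expression and the function name are assumptions for illustration; the pattern actually deployed in the Bioregistry may differ.

```python
import re

# The two per-type patterns from this thread, without their anchors.
METABOLITE = r"MAM\d{5}\w"
REACTION = r"MAR\d{5}"

# Assumed union for the parent metatlas prefix; the deployed regex may differ.
UNION = re.compile(rf"^(?:{METABOLITE}|{REACTION})$")


def is_valid_metatlas(luid):
    """Check a local unique identifier against the union pattern."""
    return UNION.fullmatch(luid) is not None
```

Because each alternative keeps its own suffix rule, a reaction identifier with a compartment letter such as MAR01234c is rejected, while MAM01234c and MAR11851 are accepted.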
I totally understand why you would want to have extra typing information in the identifiers (e.g., the MAR or MAM prefix-in-LUID) if you're expecting users to put them into an omni-search box. Since they're already unique, it might make sense to combine them into a single semantic space, but this has drawbacks. Consider a data integration scenario in which you have a list of prefix/identifier pairs (e.g., entities/nodes in a knowledge graph). It's nice when the prefix tells you the type information, because custom logic for processing local unique identifiers (e.g., checking whether one starts with MAM or MAR) doesn't generalize across resources. In the end, you have to reconstruct the same logic you wish the registry had done for you.
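The data integration scenario can be sketched as follows. The prefix-to-type table and the function name are hypothetical and only meant to contrast the two cases: with typed prefixes the type is a dictionary lookup, whereas with the joined prefix it has to be recovered by resource-specific string processing.

```python
# Hypothetical prefix-to-type table, as a registry or knowledge graph might hold.
PREFIX_TYPES = {
    "metatlas.metabolite": "metabolite",
    "metatlas.reaction": "reaction",
}


def entity_type(prefix, luid):
    """Infer the entity type of a (prefix, local unique identifier) pair."""
    # Typed prefixes: the prefix alone answers the question.
    if prefix in PREFIX_TYPES:
        return PREFIX_TYPES[prefix]
    # Joined prefix: custom, resource-specific logic on the identifier string.
    if prefix == "metatlas":
        if luid.startswith("MAM"):
            return "metabolite"
        if luid.startswith("MAR"):
            return "reaction"
    return None


nodes = [
    ("metatlas.metabolite", "MAM01234c"),
    ("metatlas.reaction", "MAR11851"),
    ("metatlas", "MAR11851"),
]
types = [entity_type(prefix, luid) for prefix, luid in nodes]
```

The second branch is the part that every consumer would otherwise have to write again for each resource that mixes entity types in one namespace.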
You can see the new prefixes live at https://bioregistry.io/metatlas!
> we can mint a top-level metatlas prefix and then two subspaces that help differentiate them

> we can update the regex to be a proper union regex and not have the tradeoff I mentioned
Brilliant approach 💡 thank you very much for applying this suggestion and deploying it 🚀
Prefix
metatlas.reaction
Name
Metabolic Atlas
Homepage
https://metabolicatlas.org/explore/Human-GEM/gem-browser/reaction/MAR11851
Description
Metabolic Atlas is a resource for exploring metabolism, starting with a set of community-curated genome-scale metabolic models of human and model organisms, enriched with pathway maps and other tools for easy browsing and analysis.
Example Local Unique Identifier
MAR11851
Regular Expression Pattern for Local Unique Identifier
^MAR\d{5}$
URI Format String
https://metabolicatlas.org/identifier/MetabolicAtlas/$1
Wikidata Property
No response
Contributor Name
Mihail Anton
Contributor GitHub
@mihai-sysbio
Contributor ORCiD
0000-0002-7753-9042
Contact Name
Mihail Anton
Contact ORCiD
0000-0002-7753-9042
Additional Comments
No response
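As a rough illustration of how the pattern and URI format string in this request fit together, the Python sketch below validates a local unique identifier and substitutes it for the $1 placeholder. The function name is hypothetical, and plain string replacement is an assumption about how the placeholder is expanded.

```python
import re

# Values copied from the request above.
PATTERN = re.compile(r"^MAR\d{5}$")
URI_FORMAT = "https://metabolicatlas.org/identifier/MetabolicAtlas/$1"


def resolve(luid):
    """Validate a reaction identifier and expand the URI format string."""
    if PATTERN.fullmatch(luid) is None:
        raise ValueError(f"{luid!r} does not match {PATTERN.pattern}")
    # Assumption: $1 is filled in by plain string substitution.
    return URI_FORMAT.replace("$1", luid)


url = resolve("MAR11851")
```

Validating before expansion means a metabolite identifier such as MAM01234c is rejected under this prefix instead of producing a URL.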