Closed sierra-moxon closed 2 years ago
@sierra-moxon can you provide an example URL that shows some information about this gene?
Not yet. Alliance reuses identifiers from other resources to link in their pages, e.g. https://www.alliancegenome.org/gene/HGNC:8616
But, they are in the process of minting new identifiers, and I want to make sure to reserve the identifier prefix so that its available when they need it.
@sierra-moxon AGR is already taken by https://bioregistry.io/registry/agricola. Please choose another prefix.
d'oh! @cthoyt :)
I see that in identifiers.org, agricola has a prefix of 'agricola'. Is it possible to change the AGR agricola prefix in Bioregistry to "agricola"?
(trying to get all the options here before taking it back to Alliance of Genome Resources to pick a new prefix).
@cthoyt when you say "taken" do you mean that is their primary prefix? Or an alias?
How did these aliases get into Bioregistry? Via identifiers.org? Did agricola explicitly request this?
I will use this issue to open a discussion with the agricola folks to see if they would be willing to relinquish this alternate prefix.
however more broadly this is something bioregistry needs to think about - if I am registering a new prefix do I just get to claim as many alternate prefixes as I want? and what if a prefix has been in use by a different community, what is the SOP for resolving this?
aside: it looks like the agricola IDs don't even resolve?
I think the IDs should resolve to URLs like this https://agricola.nal.usda.gov/vwebv/holdingsInfo?bibId=1065631
But I don't see anything on the agricola site that indicates they refer to themselves as AGR!
Also nothing on the googles:
https://www.google.com/search?q=site%3Ausda.gov+agr+agricola
however more broadly this is something bioregistry needs to think about - if I am registering a new prefix do I just get to claim as many alternate prefixes as I want? and what if a prefix has been in use by a different community, what is the SOP for resolving this?
No, nobody gets to claim synonyms when they register prefixes (that would be total nonsense). One of the original goals of the Bioregistry was to provide a comprehensive index of all of the prefixes used throughout OBO Foundry ontologies and other resources consumed by PyOBO. This means I personally curated hundreds of synonyms and lexical variants of prefixes for different resources as I found them used in various resources and mapped them back to an internal standard (in addition to mapping to external registries (MIRIAM, Prefix Commons, etc.) which also had lots of variation).
I didn't keep a full manifest of which resource uses which synonyms, but one of them uses AGR as a synonym for agricola, and that's why it's curated as a synonym.
As it stands, the Bioregistry has zero conflicts between prefixes and synonyms. There is a technical CI test in place to ensure this so it doesn't happen by accident. This is the first request that would create one.
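As a rough illustration of what such a conflict check might look like, here is a minimal sketch. It is not the Bioregistry's actual CI test; the `find_conflicts` helper, the dictionary layout, and the sample records are all assumptions made for this example. The only detail taken from the thread is that AGR is curated as a synonym of agricola.

```python
from collections import defaultdict


def find_conflicts(registry):
    """Map each lowercased prefix or synonym to the records claiming it,
    keeping only names claimed by more than one record."""
    claims = defaultdict(set)
    for prefix, record in registry.items():
        claims[prefix.lower()].add(prefix)
        for synonym in record.get("synonyms", []):
            claims[synonym.lower()].add(prefix)
    return {name: sorted(owners) for name, owners in claims.items() if len(owners) > 1}


# Hypothetical registry snapshot: AGR is a synonym of agricola.
registry = {
    "agricola": {"synonyms": ["AGR"]},
    "hgnc": {"synonyms": []},
}
assert find_conflicts(registry) == {}

# Registering "agr" as a new prefix would collide with the existing synonym.
registry["agr"] = {"synonyms": []}
print(find_conflicts(registry))
```

Comparing lowercased names means the check is case-insensitive, so `AGR` and `agr` count as the same claim.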
Since Alliance of Genome Resources does not already have their own prefix claimed, it wouldn't be fair for them to be able to say that other people's uses of AGR are invalid (through the scope of the bioregistry), especially so because this is a request to "park" a prefix that does not provide a working endpoint for resolving them.
Here are two options going forwards:
Note - you don't need to email the agricola people, they did not "claim" this prefix as a synonym. Unlike OBO Foundry, the Bioregistry operates without the consent of the resources themselves (though advice is welcome) and is trying to be a practical and useful description of the reality of prefixes and identifiers.
@balhoff did a nice SPARQL query on ubergraph (doesn't have all obo ontologies, but many), and found that CHEBI has a lot of AGR prefixed links.
Stacia and Edith from SGD also noticed that EuropePMC also uses AGR = agricola https://europepmc.org/Help
@sierra-moxon thanks for looking back into that. I've had a really hard time petitioning ChEBI for changes, and it seems even less likely to get EuropePMC to make changes. What are your thoughts? Would you consider a different prefix for Alliance? How about alliance.gene?
NCBI are already using "AllianceGenome:" as a prefix for us (although informally, given that it has not been registered). See for example https://www.ncbi.nlm.nih.gov/gene/176291 ("See related" in Summary box). This is quite long/bulky, but perhaps this doesn't matter. Would "AllianceGenome" be an acceptable alternative to "AGR"?
@khowe this is a bit problematic since Bioregistry requires (for lots of good reasons) only lowercase prefixes, so it would read as alliancegenome. This can be fixed with a dot delimiter to alliance.genome. Additionally, this namespace is about genes and not genomes, so it's misleading.
@cthoyt - it's an interesting question about genes vs. genomes. Alliance will have all sorts of pages (allele, genotype, gene, variant, etc.).
But resources can manage this redirection internally without the prefix changing for each new "type" of identifier.
@cthoyt the example in the original ticket gives a gene, but I think we are intending for the prefix to be used for many (if not all) entities resolvable by the Alliance of Genome Resources portal. As @sierra-moxon says, we have lots (and will have lots) of different entity types.
And to be clear for Alliance - do you have a process in place in Bioregistry for supporting non-lowercase prefixes (NCBIGene vs. ncbigene) as aliases?
@sierra-moxon yes, there's a preferred_prefix field for adding casing, but this is purely cosmetic information. It's a little late in the day for me to write a rant about why casing is bad, so I will save it for later.
If you want a prefix that can resolve lots of entity types then I'd suggest just alliance. However, I don't suggest doing this since it makes it very hard to reuse content annotated with this kind of identifier. Even worse, I already see that you have entity types inside the identifiers, which is really really problematic as well and shouldn't be there.
I'm not sure I know all the reasons why entity types shouldn't be encoded in identifiers, but I do have experience trying to handle an object that was typed in its identifier as a Gene and then had to become a Pseudogene typed identifier and it was painful.
Further reading in https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2001414 for how to mint good identifiers.
I take the point that "alliance.genome" might be misconstrued to the "genome" sub-domain of all "alliance" identifiers. "alliancegenome" would be a bit better in that respect.
But as to your other point @cthoyt, Bioregistry already has many entries representing resources that have a single prefix for many entity types, e.g. the ZFIN entry (https://bioregistry.io/registry/zfin), which states in the description that it applies to many entity types.
If your point is rather that "alliance" is too general (since there are lots of "alliances"), then a possible alternative (that doesn't mention genome) is something like "all.gen.res" (and variants)?
From discussions at Alliance this morning, they say AOGR works for them as well (Alliance OF Genome Resources).
Bioregistry inherits a huge amount of baggage from past decisions that we don't have any control over, both on the provider side (e.g., ZFIN) and on the registry building side (e.g., Identifiers.org), so I would be careful to justify doing things one way or another just because someone else did before.
Not sure if anyone has ever done a double dot in a prefix but I'm going to executive veto ever doing something like that... way too complicated. I like aogr
ok, I'll update this issue to request AOGR.
@sierra-moxon it appears that the generation of AOGR identifiers is still in flux, so I think it would be a good time to mention that you should also remove the redundant prefix in the local unique identifier. Further, it's currently the case that the requested example identifier and the regular expression don't match.
From talking with the Alliance, they came to this identifier paradigm as a consensus between 6 large id-minting organizations and want to stick with it at the moment. (I fixed the regex, I believe).
@sierra-moxon can you please provide other examples of other identifiers that do not have gene inside them? I also don't think that using a \w in the pattern ^AOGR\w+$ will do justice to potential users - this should be more specific. Same thing about being more specific about the length of this zero-padded number.
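To make the concern concrete, the sketch below compares the permissive pattern ^AOGR\w+$ with the stricter pattern that appears in the final request form, ^[1-9][0-9]{14}$. The candidate strings other than the example identifier 100000000000001 are made up for illustration.

```python
import re

# Permissive pattern under discussion: any run of word characters after "AOGR".
LOOSE = re.compile(r"^AOGR\w+$")
# Pattern from the final request form: a non-zero digit followed by 14 digits.
STRICT = re.compile(r"^[1-9][0-9]{14}$")

# Hypothetical candidates, except the example identifier from the form.
candidates = [
    "AOGR_anything_goes",  # accepted by LOOSE, although it is not a real identifier
    "100000000000001",     # example local unique identifier from the form
    "000000000000001",     # leading zero
    "1000000000001",       # only 13 digits
]

for candidate in candidates:
    loose_ok = bool(LOOSE.fullmatch(candidate))
    strict_ok = bool(STRICT.fullmatch(candidate))
    print(f"{candidate}: loose={loose_ok} strict={strict_ok}")
```

A pattern that fixes both the alphabet and the length lets downstream users reject malformed identifiers before attempting to resolve them.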
@cthoyt I was asked by our PI group if A, AL, or ALL would be available as a prefix. I do not see them in the registry, but want to make sure these are not synonyms as AGR was.
@cthoyt Apologies, they asked about AR as well.
@jdepons is this related to the AOGR prefix request or just a general inquiry? The Bioregistry won't accept requests for 1- or 2-letter prefixes and I'd probably suggest not using ALL since it's a word.
@cthoyt Yes, this is in regard to AOGR. Members of our PI group do not like that prefix so we are discussing other options.
Hi Charlie - after PI discussion, I've updated this request accordingly. :)
AGRKB is the new requested prefix, with a base URL of www.alliancegenome.org/accession/
Thanks for making these updates, I think this prefix is fine. I assume the KB means knowledge base, so we can update the title accordingly. Similarly, the description field should not describe the organization, but the semantic space. What kind of things are in it? Who should use it? Etc. However, if those questions are prominently answered then there’s no issue with also including information about agr itself too.
Looks like alliancegenome.org/accession/100000000000001 gets a 404, too. Can you double check this page is working and also update the uri format string to use the appropriate subdomains and either https or http please? Thanks!
I updated accordingly - note this is still a prefix parker - Alliance does not currently support AGRKB for publicly available pages, but has unified on the prefix and CURIE expansion listed in this ticket.
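For readers unfamiliar with CURIE expansion, the following is a minimal sketch of how an agrkb CURIE could be turned into a URL. The `expand` function is hypothetical, and the `$1` placeholder in the format string is an assumption about how the template is written; the base URL, example identifier, and pattern are the ones given in this ticket. As noted above, the resulting URLs are not yet expected to resolve.

```python
import re

PREFIX = "agrkb"
# Assumption: "$1" marks where the local unique identifier is substituted.
URI_FORMAT = "https://www.alliancegenome.org/accession/$1"
# Pattern for the local unique identifier, as given in the request form.
LOCAL_ID = re.compile(r"^[1-9][0-9]{14}$")


def expand(curie):
    """Expand a CURIE such as 'agrkb:100000000000001' into a URL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix.lower() != PREFIX:
        raise ValueError(f"not an {PREFIX} CURIE: {curie!r}")
    if not LOCAL_ID.fullmatch(local_id):
        raise ValueError(f"invalid local identifier: {local_id!r}")
    return URI_FORMAT.replace("$1", local_id)


print(expand("agrkb:100000000000001"))
# The prefix is matched case-insensitively, so the cosmetic casing AGRKB also works.
print(expand("AGRKB:100000000000001"))
```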
@sierra-moxon those improvements look great! last question before we finish this is who is the primary contact person? I need their ORCID/email/github handle
One last update above in the about section for your review. Would it make sense to use the "helpdesk" email address for this contact person? (that way, as people migrate, we are not left with a stale contact). The helpdesk is not ever going to go away. help@alliancegenome.org
Most definitely not. This needs to be exactly one main responsible person. Ideally this would be an email address that won't go stale even if they're not responsible anymore, so in case we need to get in touch they can mediate updating the metadata.
@cmungall volunteered his email address cjmungall@lbl.gov (ORCID 0000-0002-6601-2165) for this.
@sierra-moxon thanks for bearing with me through all of this discussion, I'm quite happy with the result and your prefix is now merged in. It'll appear on the website with the nightly build at the end of the day
Prefix
agrkb
Name
Alliance of Genome Resources Knowledge Base
Homepage
https://www.alliancegenome.org
Description
The Alliance of Genome Resources creates identifiers for several biological entity types including genes, other sequence features, constructs, morpholinos, TALENs, CRISPRs, variants, alleles, genotypes, strains, environments and experiments, phenotype annotations, expression annotations, disease annotations, interactions, and variant annotations.
The Alliance of Genome Resources was founded by the following Model Organism databases and the Gene Ontology Consortium and distributes high-quality, curated knowledge about several model organisms in a single, unified location to support the model organism research communities and for the benefit of human health and medicine:
Contributing Knowledgebases:
Alliance-supported species: Saccharomyces cerevisiae (budding yeast), Caenorhabditis elegans (nematode), Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), Xenopus laevis (African clawed frog), Xenopus tropicalis (Western clawed frog), Mus musculus (mouse), Rattus norvegicus (rat)
Example Local Unique Identifier
100000000000001
Regular Expression Pattern for Local Unique Identifier
^[1-9][0-9]{14}$
Redundant Prefix in Regular Expression Pattern
URI Format String
https://www.alliancegenome.org/accession/100000000000001
Contributor Name
Sierra Moxon
Contributor ORCiD
0000-0002-8719-7760
Additional Comments