OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Include assessment of conflicts with Identifiers.org/Bioregistry for new submissions #1519

Open cthoyt opened 3 years ago

cthoyt commented 3 years ago

I've had a nice discussion with @hoganwr here about how the Geographical Entity Ontology has been registered in OBO Foundry (and subsequently included in the Ontology Lookup Service) under the prefix geo.

Unfortunately, this clashes with a well-known bioinformatics resource called the Gene Expression Omnibus, which is represented in Identifiers.org, Prefix Commons, and ultimately the Bioregistry as geo.

As Bill aptly pointed out in the thread, it's not necessarily the prerogative of the OBO Foundry to maintain a globally unique namespace; there aren't any rules to say that geo couldn't be used for the Geographical Entity Ontology, and there probably shouldn't be. That being said, it wouldn't be difficult for OBO Foundry reviewers to check during the review process that there aren't any conflicts (at least via the Bioregistry, since it covers a wide variety of registries) and, if there are, to suggest a change.

@matentzn perhaps adding a field to https://obofoundry.github.io/obo-nor.github.io/dashboard/index.html for "no prefix conflict" would be a practical way to address this?

See also https://github.com/OBOFoundry/OBOFoundry.github.io/issues/1704
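A minimal sketch of such a review-time check, assuming each registry's prefixes have already been downloaded into a Python dict keyed by registry name. The registry contents are a tiny hypothetical snapshot built only from the examples in this thread, and `find_conflicts` is an illustrative name, not an existing OBO Foundry or Bioregistry function. Prefixes are compared case-insensitively, since `GEO` and `geo` would still collide in practice.

```python
from typing import Dict, List, Set

# Hypothetical, tiny snapshot of prefixes per registry, built only from the
# examples in this thread; a real check would load each registry's full list.
REGISTRIES: Dict[str, Set[str]] = {
    "identifiers.org": {"geo", "geogeo"},
    "n2t": {"geogeo"},
    "prefixcommons": {"geo"},
    "bioregistry": {"geo"},
}


def find_conflicts(prefix: str, registries: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted names of registries that already use `prefix`.

    The comparison is case-insensitive, so "GEO" and "geo" count as a clash.
    """
    wanted = prefix.casefold()
    return sorted(
        name
        for name, prefixes in registries.items()
        if wanted in {p.casefold() for p in prefixes}
    )


if __name__ == "__main__":
    # "newont" is a made-up prefix standing in for a fresh submission.
    for candidate in ("geo", "geogeo", "newont"):
        clashes = find_conflicts(candidate, REGISTRIES)
        if clashes:
            print(f"{candidate}: already used in {', '.join(clashes)}; suggest a change")
        else:
            print(f"{candidate}: no conflict found")
```

Returning the list of clashing registries, rather than a bare yes/no, gives the reviewer something concrete to cite when suggesting a different prefix.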

nlharris commented 3 years ago

@matentzn should we add a "dashboard" issue label?

matentzn commented 3 years ago

I think this belongs here for now, and it needs to be discussed in the next call. I am all for this; if the operations committee is happy to introduce this check, I will make a separate ticket in the dashboard repo. @cthoyt, as far as you know, are there any conflicts right now?

hoganwr commented 3 years ago

I also wonder whether the Foundry can/should/would provide support to OBO namespace owners to register their OBO ontologies with identifiers.org? Or whether there is even consensus that registering OBO namespaces with identifiers.org is laudable, desirable, etc.?

(Email reply to Nico Matentzoglu's comment of Wed, May 26, 2021, 7:49 AM.)

matentzn commented 3 years ago

Whoever takes on the heavy lifting of coordinating prefixes (be it BioContext, the Bioregistry, Identifiers.org, etc.) will have to come up with a convincing governance proposal (what happens when two groups want the same prefix), a long-term deployment strategy (a convincing financial plan for deployment and bug fixes), and some kind of minimal guarantee of responsiveness. I would throw my vote behind any proposal that comes up with all three. Then we would provide advice on how to deal with registration, or even auto-register the prefix through the API.

mellybelly commented 3 years ago

We did spend quite a bit of time working on governance/process coordination between prefixcommons, identifiers.org and N2T back in the day. I can't say it was 100% successful but it was a start.

I feel strongly that a) there should be more than one resolver, b) each resolver should agree on what content will be resolved and how, and c) the prefixes should be managed independently of the resolver. OBO should declare/coordinate the prefix registry for things that need to have their prefixes managed (though it should not be limited to OBO ontologies; Prefix Commons, for example, was always intended to be broader).

@jmcmurry @jkunze (not sure who is in charge of identifiers.org prefixes these days; please tag them here)

jmcmurry commented 3 years ago

It had been Sarala but neither she nor Nick are at EBI anymore. Perhaps @simonjupp would know.

jkunze commented 3 years ago

Henning (I don't find him on github but will let him know) and I recently talked about reviving prefix policy discussions between identifiers.org and N2T. He's the main contact for this work at EBI right now.

cthoyt commented 3 years ago

CC @hHermjakob

hHermjakob commented 3 years ago

@jkunze @cthoyt Thanks for pointing me to this thread. As hinted at by @jkunze, we agreed on common prefix curation between identifiers.org and CDL a few years back, a collaboration that hasn't been highly active, but works. One of the rules is to avoid duplicate namespaces, for obvious reasons. The "Geographical Entity Ontology" has now been registered in N2T/identifiers.org as "geogeo", at the request of the resource.

mellybelly commented 3 years ago

Next steps? Should we host a call for all interested parties? Start some further documentation in the meantime? Perhaps we can nominate one person from each organization to get together, make a tentative plan/document, and then hold a call?

cmungall commented 3 years ago

I think we need to split part of this discussion into another ticket.

Let's keep this ticket based on the original proposal to assess conflicts as part of the registration process.

I am fully in favor of this. I hope we can agree unanimously that we should at least recommend checking other registries and suggesting changes to avoid conflicts.

I would like to go further and actively block OBO from minting prefixes that are already in use by biomedical or bioscience-relevant databases. I regard this as plain common sense, but I expect some pushback here, and we would need to specify which registries we would check.

This blends into the other many-pronged discussion about fixing the multitude of problems with ID registries. This is not strictly OBO's problem, but we can use our experience to help. Let's start a new ticket for this, beginning with requirements from an OBO perspective. I will also start some other, smaller tickets about how we can improve existing processes for synchronization between OBO and, e.g., identifiers.org.
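The difference between recommending a change and blocking a prefix can be expressed as a severity that depends on which registry the clash comes from. A minimal sketch under stated assumptions: which registries count as "blocking" is still undecided in this thread, so the inputs below are hypothetical, and `assess_prefix` and `Verdict` are illustrative names only.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class Verdict(Enum):
    OK = "ok"
    WARN = "warn"    # clash found: recommend choosing another prefix
    BLOCK = "block"  # clash in a registry treated as authoritative: reject


def assess_prefix(
    prefix: str,
    registries: Dict[str, Set[str]],
    blocking: Iterable[str],
) -> Verdict:
    """Classify a proposed prefix given per-registry prefix sets.

    A clash in any registry named in `blocking` rejects the prefix;
    a clash only in other registries produces a warning.
    """
    wanted = prefix.casefold()
    hits = {
        name
        for name, prefixes in registries.items()
        if wanted in {p.casefold() for p in prefixes}
    }
    if hits & set(blocking):
        return Verdict.BLOCK
    if hits:
        return Verdict.WARN
    return Verdict.OK


# Hypothetical inputs, for illustration only ("xyz" and "abc" are made up).
registries = {
    "identifiers.org": {"geo"},
    "prefixcommons": {"geo", "xyz"},
}
print(assess_prefix("geo", registries, blocking={"identifiers.org"}))  # Verdict.BLOCK
print(assess_prefix("xyz", registries, blocking={"identifiers.org"}))  # Verdict.WARN
print(assess_prefix("abc", registries, blocking={"identifiers.org"}))  # Verdict.OK
```

Keeping the list of blocking registries as a parameter means the policy decision (which registries to enforce) can change without touching the check itself.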

nataled commented 3 years ago

This is from our current ID policy page (http://www.obofoundry.org/id-policy) under "Guidelines for selecting an IDSPACE":

"Check identifiers.org Central Registry to make sure the IDSPACE doesn’t conflict with an existing namespace outside of the OBO Library."

cthoyt commented 3 years ago

> This is from our current ID policy page (http://www.obofoundry.org/id-policy) under "Guidelines for selecting an IDSPACE":
>
> "Check identifiers.org Central Registry to make sure the IDSPACE doesn’t conflict with an existing namespace outside of the OBO Library."

Since Identifiers.org is relatively incomplete (see the comparison at https://bioregistry.io/summary#overlap-between-external-registries), it might make sense to broaden this policy to additional registries that meet some minimum (TBD) requirements for relevance and longevity.

nataled commented 3 years ago

@cthoyt cool resource! What should we be looking at to see the incompleteness of identifiers.org?

mellybelly commented 3 years ago

Yes, please also check N2T and Prefix Commons, since the three originally agreed to coordinate prefixes (and were intended to be inclusive of all of OBO). I don't know why identifiers.org is the only one mentioned in the OBO guidelines; there are multiple resolvers and, as mentioned, each one will be incomplete.

cthoyt commented 3 years ago

> @cthoyt cool resource! What should we be looking at to see the incompleteness of identifiers.org?

Sorry about the confusion. It appears the anchors generated by the table of contents aren't "real". Check the section labeled "Overlap between External Registries" and there's a panel that unfolds with all of the pairwise overlap comparisons between the registries covered by the Bioregistry's metaregistry.
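A pairwise comparison like the one in that panel boils down to intersecting the prefix sets of every pair of registries. The sketch below is only an illustration of that idea with made-up registry contents; it is not the code the Bioregistry uses to generate its chart, and `pairwise_overlap` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlap(registries: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count shared (case-folded) prefixes for every unordered pair of registries."""
    folded = {name: {p.casefold() for p in prefixes} for name, prefixes in registries.items()}
    return {
        (a, b): len(folded[a] & folded[b])
        for a, b in combinations(sorted(folded), 2)
    }


# Hypothetical registry contents, for illustration only.
registries = {
    "miriam": {"geo", "go", "chebi"},
    "n2t": {"geo", "go"},
    "prefixcommons": {"geo", "chebi", "uberon"},
}
for (a, b), shared in pairwise_overlap(registries).items():
    print(f"{a} / {b}: {shared} shared prefixes")
```

Prefixes that appear in one registry but in none of the others are what make any single registry "incomplete" relative to the rest.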

nataled commented 3 years ago

@cthoyt yes that's where I was looking. I don't see it mentioned there.

cthoyt commented 3 years ago

No worries, it's an SVG image served from GitHub, so here it is (it's updated automatically every night; you can see the timestamp on it):

![comparisons](https://raw.githubusercontent.com/bioregistry/bioregistry/main/docs/img/external_overlap.svg)

nataled commented 3 years ago

@cthoyt right, that's the image I saw. Identifiers.org doesn't appear to be included.

cthoyt commented 3 years ago

Again, sorry about the confusion. Since the registry that powers Identifiers.org is called the MIRIAM Registry, it appears under that name in the chart (though that name may since have been retired; I'm not sure, so I stuck with what their latest publication called it).

alanruttenberg commented 3 years ago

I'll just point out that the namespaces do not conflict, because the domains aren't the same. If we're worried about confusion over CURIEs, prefixes should be specified, since a CURIE is a shorthand way of writing a URL.

Alan

(Email reply to Charles Tapley Hoyt's comment of Tue, Jun 1, 2021, 12:26 PM.)
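Alan's point can be illustrated by expanding the same CURIE against two different prefix maps: `geo:` only becomes a concrete URI once a prefix map is chosen. The two URI prefixes below follow the OBO PURL pattern (`http://purl.obolibrary.org/obo/{IDSPACE}_`) and the Identifiers.org pattern (`https://identifiers.org/{prefix}:`); the local identifier is made up, and `expand` is a hypothetical helper, not a function from any particular CURIE library.

```python
from typing import Dict


def expand(curie: str, prefix_map: Dict[str, str]) -> str:
    """Expand a CURIE of the form prefix:local_id into a URI using `prefix_map`."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"no URI prefix declared for {prefix!r}")
    return prefix_map[prefix] + local_id


# Two contexts that both declare "geo", bound to different URI prefixes.
obo_context = {"geo": "http://purl.obolibrary.org/obo/GEO_"}
identifiers_context = {"geo": "https://identifiers.org/geo:"}

curie = "geo:000000001"  # hypothetical local identifier
print(expand(curie, obo_context))          # http://purl.obolibrary.org/obo/GEO_000000001
print(expand(curie, identifiers_context))  # https://identifiers.org/geo:000000001
```

The expanded URIs never collide, which is the sense in which the namespaces do not conflict; the practical risk is a CURIE that travels without its prefix declarations, where a reader cannot tell which expansion was intended.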

cmungall commented 2 years ago

So this issue has many interesting discussions but no action.

I think we can distill this ticket into a simple proposal:

(In fact, this is slightly redundant, since anything in OBO makes its way into the Bioregistry, but it's good to emphasize uniqueness in OBO.)

Can we move forward with this?

nataled commented 2 years ago

Added to the EWG agenda.

cthoyt commented 2 years ago

@cmungall we can do one better: we can add a technical solution that checks it. However, since I got pretty negative feedback in #1661 on the idea of adding any new code that runs in CI, I'm not sure that's an avenue forward without some soul searching.

matentzn commented 2 years ago

I think adding a test for this is a great idea! And to be clear, the issue you reference is not in any way a push against adding new code. It was against adding parallel code, i.e., instead of extending https://github.com/OBOFoundry/OBOFoundry.github.io/blob/master/util/validate-metadata.py, you suggested adding a different test suite altogether. Also, the PR you are talking about introduces tox, which should be done in an independent PR, right?

The tests you added in that PR are totally right and should be in the test suite; I don't think anyone disputes that. The same goes for this Bioregistry lookup.
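A sketch of how such a lookup might look as an extra check in a metadata validator. It does not reproduce the actual structure of `util/validate-metadata.py`; the record format (a list of dicts with an `id` key), the `check_prefix_conflicts` name, and the shape of the external data are all assumptions. Because every OBO ontology also ends up in the Bioregistry, the sketch assumes each external entry records which OBO id it maps back to, so an ontology is not reported as conflicting with itself.

```python
from typing import Dict, List, Optional


def check_prefix_conflicts(
    ontologies: List[dict],
    external: Dict[str, Optional[str]],
) -> List[str]:
    """Return one error message per ontology whose id is taken externally.

    `external` maps a lower-case prefix to the OBO id that the external entry
    corresponds to, or to None when it describes some other resource. An
    entry that maps back to the same OBO id is the ontology itself, so it is
    not reported as a conflict.
    """
    errors = []
    for record in ontologies:
        oid = record["id"].casefold()
        if oid in external and external[oid] != oid:
            errors.append(f"{record['id']}: prefix already used by another resource")
    return errors


# Hypothetical inputs: "geo" is used externally by a non-OBO resource (the
# Gene Expression Omnibus), while "go" is simply the OBO ontology itself.
ontologies = [{"id": "geo"}, {"id": "go"}]
external = {"geo": None, "go": "go"}

for message in check_prefix_conflicts(ontologies, external):
    print(message)  # geo: prefix already used by another resource
```

In CI, the validator would then exit with a non-zero status whenever the returned list is non-empty, so the build fails on a new conflict.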

nataled commented 2 years ago

In addition to changes made to the ID policy document (https://obofoundry.org/id-policy), the EWG has added the following sentence to the principle page itself:

The ontology namespace MUST be unique; that is, it MUST NOT be in current use or have been used in the past.
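Because the rule also covers prefixes that were used in the past, a check has to consult retired prefixes as well as active ones. A minimal sketch, assuming a registry can supply both sets; the `active` and `retired` inputs, their contents, and the `is_available` name are all hypothetical.

```python
from typing import Set


def is_available(prefix: str, active: Set[str], retired: Set[str]) -> bool:
    """True only if `prefix` is neither in current use nor used in the past.

    Both sets are assumed to hold lower-case prefixes; the lookup is case-insensitive.
    """
    p = prefix.casefold()
    return p not in active and p not in retired


# Hypothetical data, for illustration only.
active = {"geo", "go"}
retired = {"oldprefix"}
print(is_available("geo", active, retired))        # False: in current use
print(is_available("oldprefix", active, retired))  # False: used in the past
print(is_available("newont", active, retired))     # True
```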

nlharris commented 2 years ago

So what are the remaining action items here?