Release COB with COB ids + mappings, rather than with the mapped IDs

OBOFoundry / COB

An experimental ontology containing key terms from Open Biological and Biomedical Ontologies (OBO)

https://obofoundry.github.io/COB

Creative Commons Zero v1.0 Universal

Release COB with COB ids + mappings, rather than with the mapped IDs #244

Open matentzn opened 10 months ago

matentzn commented 10 months ago

Currently, we curate COB using COB ids, and swap out COB ids with exact matches (e.g. COB:cell nucleus with GO:cell nucleus).

This is cumbersome for a variety of reasons:

It is unclear who owns the term. If we change the labels, add synonyms, etc., where do we do it? If we import the term into our ontology, do we import it from COB or from GO (in which case anyone using COB will also have to import all terms from the mother ontologies of the COB roots they happen to be using)? This has already proven quite impractical, as it results in a lot of boilerplate code in import pipelines like ODK (import this term from COB, but not that one; rename this term, but not that one).
COB is suddenly dependent on everything else, whereas it should have no dependencies and everything else should depend on it. Think of a source ontology changing axioms! This will affect the whole OBO-verse.

I would like to put forward a motion to publish COB with mappings instead of rewiring the ids.
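To make the two release styles concrete, here is a minimal Python sketch. Everything in it is hypothetical: the CURIEs are made-up placeholders, the `rewire` and `release_with_mappings` helpers are invented for illustration, and `skos:exactMatch` is just one possible choice of mapping predicate. It is not the actual COB build.

```python
# Toy "ontology": (subject, predicate, object) triples curated with COB ids.
# All CURIEs below are made-up placeholders, not real COB or GO identifiers.
CURATED = [
    ("COB:EX_nucleus", "rdfs:label", "cell nucleus"),
    ("COB:EX_nucleus", "rdfs:subClassOf", "COB:EX_material_entity"),
]

# Exact matches from COB ids to external ids (hypothetical).
EXACT_MATCHES = {"COB:EX_nucleus": "GO:EX_nucleus"}


def rewire(triples, matches):
    """Current style: replace every mapped COB id with its external id."""
    def swap(term):
        return matches.get(term, term)

    return [(swap(s), p, swap(o)) for s, p, o in triples]


def release_with_mappings(triples, matches, predicate="skos:exactMatch"):
    """Proposed style: keep COB ids and ship the mappings alongside."""
    mapping_rows = [(cob, predicate, ext) for cob, ext in sorted(matches.items())]
    return list(triples), mapping_rows


rewired = rewire(CURATED, EXACT_MATCHES)
native, mappings = release_with_mappings(CURATED, EXACT_MATCHES)

print(rewired[0][0])  # the subject is now the external id
print(native[0][0])   # the subject is still the COB id
print(mappings)
```

In the first style the released file no longer mentions the COB id at all; in the second, consumers who want the external id have to apply the mapping rows themselves.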

See some relevant fun side issue here: https://github.com/OBOFoundry/COB/issues/243

wdduncan commented 10 months ago

> I would like to put forward a motion to publish COB with mappings instead of rewiring the ids.

I would support this. However, don't the SSSOM mappings use owl:equivalentClass? So, the COB term imports all the semantics of the other term. Right?

alanruttenberg commented 10 months ago

I am opposed. The plan all along has been not to change IDs.

matentzn commented 10 months ago

@alanruttenberg

Can you offer your position on who should "own" the IDs? Like, should GO cede their "Molecular Function" term to COB for management and be disallowed from editing it in their own space? Or do you think it is OK that COB, as an upper ontology, should depend on everything else? So if a PATO developer wakes up one morning and decides that PATO:quality should be a material entity, that would permeate through COB and through all of OBO? (The example is stupid, but you get the point.)

alanruttenberg commented 10 months ago

I am not thinking here in terms of ownership, but of correctness and minimal disruption. It's long been a principle, despite occasional pushback, that we don't mint more than one ID for one thing and we don't change IDs unless terms don't make sense or are accidentally duplicated or whatever, and we don't change the meaning of terms. Mappings are outside the semantic web stack we use and I think it's fair to say that many users will not be aware of them, nor know how to apply them in their own settings. Nor should they be obligated to.

As far as ownership goes, I think of our ontology developers as stewards, not owners. If we follow the logic of wanting to "own" all our terms then the logical extension is that all ontologies should have their own IDs and just map pervasively. I can say from experience that this is a losing strategy, and an original impetus for developing the foundry was to avoid that.

If the terms would be better stewarded in their original ontologies, then have an automated build that pulls them into COB. If they would be better managed as a set in COB (my gut), then the ontologies that originally minted them should import COB. I believe that's the goal anyway.

The "stupid" example is indeed stupid. We have a principle that the extension of a class doesn't change. It's bad behavior to do this. If an ontology does that then it's misbehaving and all the mappings in the world won't help with the breakage that ensues.

bpeters42 commented 10 months ago

I wanted to give my $.02 quickly: While I understand and favor Alan's preference to not introduce new identifiers in an ideal world, the reality of OBO development is that we do not have an ideal world. For good and bad reasons, ontology developers do not prioritize updating their ontologies to serve the greater integration goal that COB has. Speaking as a driving member of OBI, which has really tried to be at the forefront of this, we still haven't completed our COB integration (not for lack of wanting, but for lack of developer time to get it done and test out all the consequences, which are not trivial).

Nico's suggestion is a 'realism based' approach to ontology development and integration; in this case, realism about what can be expected from ontology developers. By making COB the 'owner' of its terms, and allowing it to showcase what integration can achieve (and what problems it discovers) without waiting for individual ontology developers to act, I think we will be much faster in showing success and making people want to adopt it.

The fact that all of this is essentially volunteer/side project work needs to be triple stressed (both on the COB side and for ontology projects).

If we were a centrally run company, there would be no question that Alan's approach is right. But we are not.

Bjoern


alanruttenberg commented 10 months ago

And yet given that people don't update their ontologies regularly, many of them will already be integrated with COB by way of having already imported or MIREOTed the existing terms, and will become less integrated should a new set of ids be put out there.

alanruttenberg commented 10 months ago

Correspondingly, ontologies that are interested in COB will lose absolutely nothing by importing COB if it is using the already-existing terms from older ontologies.

alanruttenberg commented 10 months ago

Curious, I went to look at COB in Ontobee. I hadn't realized that that version does use pre-existing OBO ids. Looking at assay, which is still OBI_0000070, and eyeballing the list of uses, there are something like 40 ontologies that use that ID. Changing it to some new COB id will require all 40 to update. How is that desirable?

bpeters42 commented 10 months ago

Our approach has been to take an ontology, rip out the terms mapped to COB, and replace them with the COB term, using the SSSOM mappings. As you point out, those are currently pre-existing ids. But in some cases, different terms get merged into one (e.g. 'cell' from GO and CL). And that is where the problem that Nico described comes in, when we want to, e.g., change a label or add an axiom.


alanruttenberg commented 10 months ago

If you want to change an axiom, change an axiom. We change axioms when we need to, not to change the meaning of a term, but to clarify it or make it more effective for reasoning. Ditto label. How exactly does having a duplicate ID for the same (or almost-the-same-except-for-inconsequential-change) term make a difference?

Changing the ID does nothing but bring in extra work forever down the line in order to make sure that old IDs are aligned to new IDs. And what if you add an axiom to COB but not to the original, what happens to the SSSOM mappings? Remove the mapping? Require everyone to be aware of these changes?

In addition, SSSOM does nothing for non-ontology mentions such as those in papers, in spreadsheets, in queries, in datasets, or in other existing workflows. For a select few who want to use SSSOM, fine, but don't impose its mandatory use on absolutely everybody, in every possible context where an ID might have been mentioned.

Not a careful search, but worth browsing:

https://www.google.com/search?q=obi%3A0000070
https://duckduckgo.com/?q=obI_0000070+sparql&ia=web
https://github.com/search?q=+obi_0000070+sparql&type=code
https://www.google.com/search?q=obi_0000070

alanruttenberg commented 10 months ago

In the case where there are duplicate IDs (CL,GO:cell), pick one and use it throughout. If half use one ID and half use the other, then 50% have to change, vs 100% if a completely new ID is minted.
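The arithmetic behind this can be spelled out in a few lines of Python. The usage counts and CURIEs are invented for illustration (a 20/20 split between two duplicate IDs), and `ontologies_to_update` is a hypothetical helper.

```python
def ontologies_to_update(users_by_id, chosen_id):
    """Count ontologies that must change when `chosen_id` becomes the only id."""
    return sum(n for term_id, n in users_by_id.items() if term_id != chosen_id)


# Hypothetical usage counts: 20 ontologies use each of two duplicate ids.
users = {"CL:EX_cell": 20, "GO:EX_cell": 20}
total = sum(users.values())

keep_existing = ontologies_to_update(users, "CL:EX_cell")  # reuse one of the two
mint_new = ontologies_to_update(users, "COB:EX_cell")      # brand-new id

print(keep_existing / total)  # 0.5
print(mint_new / total)       # 1.0
```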

bpeters42 commented 10 months ago

As I said Alan, I agree with you in principle, and all your arguments illustrate well why it would be great if we could just keep one ID per concept. But you are not getting to the problem that we are trying to solve. I think this part of your response gets closest to it:

> If you want to change an axiom, change an axiom. We change axioms when we need to, not to change the meaning of a term, but to clarify it or make it more effective for reasoning. Ditto label. How exactly does having an duplicate ID for a same or almost-the-same-except-for-inconsequential-change make a difference?

The duplicate terms address the problem of WHO makes a change and WHERE the change is made.

You seem to be suggesting one of two things: 1) the 'home' ontologies of terms like 'cell' will readily and quickly implement any change we ask them to do in their own ontology for the greater good of COB and OBO integration or 2) the 'home' ontologies of terms like 'cell' will give up control of these terms completely and trust that they will be managed in COB.

Based on experience so far, neither of these will happen. So by coining a new ID for COB terms that are typically based on one (or more) OBO terms, but defined and labeled with maximum interoperability in mind, we are able to work faster towards interoperability and figure out what causes problems while not messing with the source ontologies, until they are ready to do so.

I want to stress again that I see and agree with the downsides you are pointing out. But I beg you to focus a bit more on the practical implementation - even if it is ugly and messy. Nico and team are dealing with the realities on a large scale, my work has similar issues on a smaller scale.

Maybe it would be good if you joined one of the operation calls where we are discussing these things?

Bjoern


alanruttenberg commented 10 months ago

My views arise out of practical considerations and actual experience having to deal with this sort of thing, both in SPARQL and using other technologies. The worst part about it is that changes such as using different IDs tend to be invisible in already existing projects. One thing that ends up happening is that existing queries return some but not all information. It may be that there's a short-term gain in going the direction you are going, but the long-term costs are high. Perhaps they don't seem as present because we're looking at this from the perspective of the developer. I think the consumer has to come first.

I think it is reasonable to have the original authors maintain the terms, either in their own ontology or as part of COB, and our principles around responsiveness ought to govern their work on this. Realistically, how many changes are we expecting there to be for the already defined terms? I haven't reviewed them all, but aren't they pretty mature?

I'm afraid if we can't even get the source ontologies to contribute/maintain these terms then it's a demonstration of a serious failure of our enterprise. It's frequently the case that it doesn't work to try to fix a social problem with a technical fix. This is one of those cases, IMO. Here I would at least try to get the relevant parties together and see if we can't come to some reasonable agreement on how to maintain the terms.

Perhaps we can talk offline...

alanruttenberg commented 10 months ago

I can join an operations call if you tell me which to join.

matentzn commented 10 months ago

TBH, I find all the arguments here to some degree convincing.

However, let us be clear about what this discussion here means for our progress: If #226 is any indication, the only thing that will happen is that we all lose valuable time in our lives arguing our points, and in the end, the issue will freeze, not be implemented and COB will just go stale.

In my opinion, we need to find a better way to have this discussion, one that drives COB forward. All impactful projects we have in the OBO Ops Technical Working Group (ROBOT, ODK, OBO Dashboard, RO, PURL system) have one thing in common: they are driven by people who actually do the groundwork, so we have "authority by involvement". Hundreds of decisions are made in these projects every year that 60% of the community would not agree with (we are now implementing a technique for using DL queries WITHIN SPARQL in ROBOT; even Chris Mungall cried when he heard this). But overall, the tools are used and make a difference! If we had these kinds of discussions every time we wanted to add a feature to ROBOT, there would not be a ROBOT.

@alanruttenberg are you willing to argue your point AND go fix all the problems your solution entails for us (the people driving to roll out COB)?

Will you go and figure out whether CL or GO should own "cell"? (open the issue, sit them around a table, facilitate a decision)
Explain to "COB class stewards" that they cannot change the extension of a class again because "it is in our principles"? Implement QC to prevent them from "adding an innocent existential restriction" that they think "just clarifies the semantics" but that will break everyone's ontology builds?
Negotiate label changes if they are necessary (like in the "nucleus" case, where we need a label to be "cell nucleus" because the context the class appears in now, COB, is more general)?
Help build the pipelines that partially import axioms from source ontologies and COB, especially if they use SLME modules?

I am not saying you are wrong with much of what you say, but you are not in the trenches. But being "right" always comes at a cost!

addiehl commented 10 months ago

It was resolved a few years ago that CL owns 'cell' (apart from the CARO class), and GO has accepted that. Let's not relitigate that issue.

matentzn commented 10 months ago

Ok, @addiehl you are right, but the point still stands. Mondo:disease, OGMS:disease, DOID:disease; anatomical entity in 40 different ontologies; various instances of "role" classes. Everything in NCIT, FMA; you can tell dozens of stories why each individual one was a mistake. Many such battles have not been fought yet. The point now is: we need a practical path forward. Using COB ids solves these issues. The "right" way needs a lot more time and resources, which we don't, and won't, have.

alanruttenberg commented 10 months ago

I am willing to help. I need to first review COB and understand where it stands. It might help to have a call with you and Bjoern so you can debrief me on history so I'm aware of all relevant issues and so I don't go into conversations blind. On the specifics you suggest:

I'm willing to facilitate. First step is doing an inventory to see what needs to be resolved. If we can talk first I'd prefer that but if we can't arrange that I can start to review.
I think the first discussion is whether the ontology that authored the term is willing to cede it to COB or wants to manage it itself. If they are willing to cede it to COB management then we're good. If they want to manage the term then have the discussion. I would do that on a term by term basis. It seems fine to have COB built as a combination of native terms and MIREOTed terms.
In the case you mention, you say nucleus -> cell nucleus because it is more general. But if it is truly more general than the existing terms then it has a different extension and is a new term. I have no objection to adding COB terms that don't already exist. If it is strictly a label change with no impact on the extension then it doesn't seem as important to insist on the label change, though a discussion can be had. Worst case add an alternate label, or negotiate about the foundry unique label rather than rdfs:label.
I don't know what SLME is. I am willing to help script ROBOT to do the MIREOTing.

I would like COB to succeed and I think, perhaps naively, that others do too and will do their best to help it be a success.

matentzn commented 10 months ago

I have once again spent too many hours writing and rewriting a response. I added my original response below, but the fact of the matter is that we first need to agree on the following:

What is COB?

These are the assumptions I work under, not the assumptions of any other COB developer!

Biology and Biological Ontology is messy and ambiguous
We need a simple biological upper layer with simple classes like gene, disease, anatomical entity and assay to align our ontologies against (we won't agree if a disease is a specifically dependent continuant, a disposition or a material entity, or a process, or at least not right now)
Many of these classes are not universally agreed on as some BFO upper-level category (some will, of course), and are in that sense "ambiguous" or "uncommitted". COB must be resilient to ambiguity, imprecise definitions etc.
We need to roll out COB yesterday, not tomorrow.

If we can't even agree with these 4 premises, we should not try to solve the following.

OLD, now deprecated message I wrote

I am a bit doubtful but I applaud your enthusiasm to help out. Even if you were to do all the work required here, I am still doubtful this is the right path. It will take so long to sort everything out, that COB is dead before it started (remember #226? Now multiply by 100).

As I think your path will take ages to sort out, I would suggest we do it like this:

First use COB ids and
In parallel, you can initiate a process of replacing COB ids with equivalents in cases where an ontology decides to cede their classes to COB (that way I can start working with COB right now and don't have to wait)
Drop all cases where there is no hope of ceding, e.g. GO, from the mappings.

I will give you two problems to chew on for starters.

Extracting modules.
1. The way we extract modules (I am surprised you of all people suggest MIREOT over SLME!) in more than 100 ontologies now is by:
  1. collecting a seed for the whole ontology (the terms to import)
  2. defining a set of ontologies to import from
  3. extracting the terms in the seed from all imports selected (i.e. not a "seed by ontology"-type scenario; if we have OBI:assay in the seed, it will be extracted from all ontologies, which is usually no problem because they are base files).
2. What would be the best way to deal with the situation that you have an OBI and a COB import?
  1. Add an extra preprocessing step that removes OBI:assay from OBI prior to extraction?
  2. This whole thing messes also with our definition of "base" file which is the culmination of a 3-year discussion with some colleagues (not the community, granted) to define the part of the ontology that is truly "internal" (something you would, if I know you well enough, completely disagree with). Will these COB terms be part of the base or not?
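The extraction steps and the preprocessing question above can be sketched as follows. This is a deliberately simplified stand-in for the seed-from-all-imports step, not SLME itself and not the ODK or ROBOT implementation; the `extract` helper, the `exclude` parameter, and the toy axioms are all assumptions made for illustration.

```python
def extract(seed, imports, exclude=None):
    """Collect axioms about seed terms from every selected import.

    `imports` maps an ontology name to {term: [axiom strings]}.
    `exclude` maps an ontology name to terms stripped before extraction.
    """
    exclude = exclude or {}
    module = {}
    for name, axioms_by_term in imports.items():
        skipped = exclude.get(name, set())
        for term in seed:
            if term in skipped:
                continue
            for axiom in axioms_by_term.get(term, []):
                module.setdefault(term, []).append((name, axiom))
    return module


# Hypothetical base files: both OBI and COB say something about OBI:assay.
IMPORTS = {
    "obi": {"OBI:assay": ["label 'assay'", "SubClassOf OBI:EX_planned_process"]},
    "cob": {"OBI:assay": ["label 'assay'", "SubClassOf COB:EX_process"]},
}
SEED = ["OBI:assay"]

# Plain extraction: the term's axioms arrive from both imports.
merged = extract(SEED, IMPORTS)

# With the extra preprocessing step: OBI's copy is removed before extraction.
cob_only = extract(SEED, IMPORTS, exclude={"obi": {"OBI:assay"}})

print(len(merged["OBI:assay"]), len(cob_only["OBI:assay"]))
```

Without the exclusion, the term carries axioms from two sources that may drift apart; with it, someone has to maintain the per-ontology exclusion list, which is the boilerplate complained about earlier in the thread.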
Here are the current COB to external mappings: https://github.com/OBOFoundry/COB/blob/master/src/ontology/components/cob-to-external.tsv There are quite a few with more than one equivalent class (e.g. "Protein" in ChEBI and PRO). Maybe it would be good to (1) review the equivalences, and (2) start determining which one should take precedence in the case of conflicts. The owners of the remaining equivalents need to be convinced to cede control to COB.
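As a starting point for that review, a short script can list the COB terms that have more than one equivalent. The sketch below assumes the TSV uses SSSOM-style column names (`subject_id`, `predicate_id`, `object_id`) and runs on an inline sample with placeholder CURIEs rather than the real file; a real SSSOM file may also carry `#`-prefixed metadata lines that would need to be skipped first.

```python
import csv
import io
from collections import defaultdict

# Inline sample standing in for the mappings file; all ids are placeholders.
SAMPLE_TSV = (
    "subject_id\tpredicate_id\tobject_id\n"
    "COB:EX_protein\towl:equivalentClass\tCHEBI:EX_protein\n"
    "COB:EX_protein\towl:equivalentClass\tPR:EX_protein\n"
    "COB:EX_assay\towl:equivalentClass\tOBI:EX_assay\n"
)


def multi_mapped(tsv_text, predicate="owl:equivalentClass"):
    """Return {COB id: [external ids]} for terms with more than one match."""
    targets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["predicate_id"] == predicate:
            targets[row["subject_id"]].append(row["object_id"])
    return {subj: objs for subj, objs in targets.items() if len(objs) > 1}


conflicts = multi_mapped(SAMPLE_TSV)
print(conflicts)
```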

Some disconnected thoughts:

Personally, I think we should at least consider minting COB ids in cases where the ontology owners do not want to cede their terms to COB. But even in this case, I am not convinced by all this. You are too optimistic IMO.
GO has been trying to rename "molecular_function" to "molecular function" for 5 years. Getting them to give us authority over that class is impossible.
If an upper ontology depends on other ontologies it is too exposed to be successful. (No way BFO would have been successful if you had let me manage the BFO:specifically dependent continuant class :-) ). You are also exaggerating the query aspect. The same queries will still work exactly as before - they will just return more results if you change the query terms to COB terms after they are aligned with the ontology ones (if we use rdfs:subClassOf for the alignment, SPARQL results won't even change).

alanruttenberg commented 10 months ago

I can sympathize with the effort it takes to craft these responses. They take me a while to compose too.

I can comment on some of it now, but not all. The original post is very useful and actionable. The cob-to-external.tsv list looks like a good place to start, as is the version of COB that is in Ontobee where obviously some choices were made.

MIREOT: I use that term in a generic sense. There are a number of ways to implement it. I know what SLME is - I hadn't known the acronym, or that it was being used in production. Back when I was actively doing OBO technical work and implementing MIREOT for the first time, I had hoped we would get there. I'm glad to hear it is being used. See e.g. my suggestion on how to leverage it in Protege. You should understand that for my day-to-day work I generally use my own bespoke tools, which can do most if not all of what ROBOT can. I use SLME regularly, not via robot but via my LSW2 tools many of which pre-exist ROBOT.

Queries: If you have OBI:assay in a query and someone now adds COB:assay as a different term, with some data sets using the COB term, then the old queries will return fewer than the correct number of results when run on the combined data sets. I presume you aren't suggesting that COB:assay would be a subclass of OBI:assay, in which case the queries would indeed continue to work. In work I did pre-ontology, it wasn't uncommon for even single databases to change over the years in a less-than-controlled way, with the result that to get all the answers to some query you had to reverse engineer all the possible ways of representing what you were looking for and write long disjunctive (SQL UNION) queries. That is not a happy state of affairs, and the experience is why I push so hard to minimize what I believe to be unnecessary changes.
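A toy model of the query problem, in plain Python rather than SPARQL. The instance names, the typing data, and both helpers are invented for illustration; the last case assumes that whatever runs the query expands subclasses (via a reasoner or a property path), which a plain pattern match would not do.

```python
# Instance data as (instance, class) typing statements; all names are illustrative.
OLD_DATA = [("a1", "OBI:assay"), ("a2", "OBI:assay")]
NEW_DATA = [("a3", "COB:assay")]
COMBINED = OLD_DATA + NEW_DATA


def instances_of(data, classes):
    """Return instances typed with any class in `classes` (a disjunctive query)."""
    return sorted(inst for inst, cls in data if cls in classes)


def with_subclasses(cls, subclass_of):
    """Expand a class to itself plus all of its transitive subclasses."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for child, sup in subclass_of:
            if sup == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found


# The pre-existing query silently misses the instance typed with the new id.
old_query = instances_of(COMBINED, {"OBI:assay"})

# Getting everything back requires a hand-written disjunction ...
union_query = instances_of(COMBINED, {"OBI:assay", "COB:assay"})

# ... or a subclass axiom plus subclass expansion at query time.
aligned = instances_of(
    COMBINED, with_subclasses("OBI:assay", [("COB:assay", "OBI:assay")])
)

print(old_query, union_query, aligned)
```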

On the "what is COB?" notes, given that you acknowledge these are not necessarily the views of all COB developers, I don't think it's necessary for us to completely agree in order to get work done. I'll comment not for the purpose of arguing but to help you have a model of where I'm coming from.

Yes, it is messy. But that doesn't mean all of it is messy or that we should accept any and all messiness. Where it's essentially messy we have to do something messy. Where it's not we shouldn't make things worse.

2,3 I'm not sure there's anything "simple". Given that it's a messy field, I suspect "simple" is probably an illusion. But yes, I see that there may be cases where we can't agree on upper level term and we'll have to do something to make progress. Whether doing so yields resilience we will see.

To better understand whether it would, we would have to extrapolate to the kinds of changes imagined and to the types of things people try to do with it, and then assess the extent to which those changes do or don't affect those things. Perhaps some of that has been done, but I haven't been party to such discussions. You and I have had an exchange at some point about being "uncommitted" and we don't completely see eye-to-eye on that. I would aim to be as committed as is practical, which I think is more than "uncommitted". In the case of disease, it's clear there isn't consensus. OTOH there are people who do use a more precise definition, and whatever we do shouldn't make their efforts to do so moot. I'll have a better idea of the extent of the problems once I've done some work.

Whether we need COB yesterday or not, yesterday has passed. I'll try to make progress at a reasonable pace given the constraints on my time. I hear that you don't want this to drag on for a very long time.

matentzn commented 10 months ago

So how do you suggest that we proceed? And on what timeline? I was trying to plan the next step in the evolution of OBO which we call GUOBO (Grand Unified OBO Ontology), which is basically an attempt to create a single coherent, all-encompassing ontology where all the individual OBO ontologies that pass a certain quality threshold are "components". One of the key principles of GUOBO is that all participating ontologies are fully aligned with COB (more strictly, there exists no class in the namespace of the ontology that does not fall under one of the classes in COB). This means we are starting to roll out COB in a few ontologies now to get this process started.
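The alignment criterion in parentheses can be phrased as a simple check over the asserted subclass hierarchy. This is only a sketch under stated assumptions: `uncovered_classes` is a hypothetical helper, the `EX:` namespace and all class names are made up, and a real check would run over the reasoned hierarchy rather than a list of edges.

```python
def uncovered_classes(namespace, subclass_of, cob_roots):
    """List classes in `namespace` that have no (transitive) COB ancestor."""
    parents = {}
    for child, parent in subclass_of:
        parents.setdefault(child, set()).add(parent)

    def reaches_cob(cls, seen=()):
        if cls in cob_roots:
            return True
        # `seen` holds the current path, so cycles cannot recurse forever.
        return any(
            reaches_cob(p, seen + (cls,))
            for p in parents.get(cls, ())
            if p not in seen
        )

    classes = {c for edge in subclass_of for c in edge if c.startswith(namespace)}
    return sorted(c for c in classes if not reaches_cob(c))


# Hypothetical ontology "EX" with one branch under COB and one orphan branch.
EDGES = [
    ("EX:widget_assay", "EX:lab_procedure"),
    ("EX:lab_procedure", "COB:EX_process"),
    ("EX:orphan", "EX:misc"),
]
ROOTS = {"COB:EX_process"}

missing = uncovered_classes("EX:", EDGES, ROOTS)
print(missing)  # ['EX:misc', 'EX:orphan']
```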

We can't start this process unless we have some clarity on how the axiomatisation and annotations of the COB classes are extracted from the set of ontology dependencies.

wdduncan commented 10 months ago

FWIW, I don't think there will ever be a "Grand Unified OBO Ontology" (GUOBO) using the mapped (i.e., non-COB) IRIs. There is just too much disagreement (as already discussed). COB needs to be its own thing, with its own semantics.

There already exists cob-native.owl, which uses COB IRIs. I personally would prefer that this file (i.e., cob-native) be the primary ontology (i.e., cob.owl), and that a separate ontology consisting of the mapped IRIs (e.g., cob-mappings.owl) be created instead. Or, perhaps, the creation of a grand import ontology should be a separate project all on its own.

Mappings between the COB IRIs and external IRIs already exist. However, the cob-to-external.owl file only confuses things because the owl:equivalentClass mappings entail that COB classes have the same semantics as the external classes, thus fueling the ongoing debates. If a mapping resource is needed, SKOS mappings would be better.
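The difference between the two kinds of mapping can be illustrated with a small Python model. It is a rough one-hop approximation of what a reasoner does, not a reasoner; the `inherited_superclasses` helper and all class names are invented for the example.

```python
def inherited_superclasses(term, superclasses, mappings):
    """Superclasses attached to `term` once equivalence mappings are applied.

    Only owl:equivalentClass carries class-level entailments here; any other
    predicate (e.g. skos:exactMatch) is treated as a plain cross-reference.
    Equivalences are followed one hop only, to keep the sketch short.
    """
    result = set(superclasses.get(term, ()))
    for subj, pred, obj in mappings:
        if pred != "owl:equivalentClass":
            continue
        if subj == term:
            result |= set(superclasses.get(obj, ()))
        elif obj == term:
            result |= set(superclasses.get(subj, ()))
    return result


# Hypothetical classes: COB stays neutral, the external ontology commits to BFO.
SUPERS = {
    "COB:EX_disease": {"COB:EX_entity"},
    "EXT:EX_disease": {"BFO:EX_disposition"},
}

as_equivalence = inherited_superclasses(
    "COB:EX_disease", SUPERS,
    [("COB:EX_disease", "owl:equivalentClass", "EXT:EX_disease")],
)
as_skos = inherited_superclasses(
    "COB:EX_disease", SUPERS,
    [("COB:EX_disease", "skos:exactMatch", "EXT:EX_disease")],
)

print(sorted(as_equivalence))  # COB class picks up the external commitment
print(sorted(as_skos))         # COB class keeps only its own axioms
```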

alanruttenberg commented 10 months ago

@matentzn I will try to review the mappings TSV this week and do some triage. As for the larger questions of strategy, it's a bit much for me to think through given my day job and lack of background on the COB trajectory in general. I think it would be helpful to talk if you have time, so we can go back and forth and I can get a better picture of the bigger process.

wdduncan commented 8 months ago

I think I've changed my original position on this ... shocking eh?

After starting a few new ontologies using the COB ids, I found it aggravating to have to "rewire" the subclasses to point to the COB parents. It is easier (at least for me) to use the IDs from the original ontologies.

That being said, we need to decide on what COB is meant to do within the OBOF. My original hope was that COB would have a separate semantics (hence, requiring new IDs) that would harmonize/resolve some of the never-ending debates. A kind of upper-level bridge between the strict and non-strict BFO communities. I have pretty much given up hope of this ever happening (sorry).

My thoughts at the moment are that COB should serve as a kind of junction box that provides a centralized resource to facilitate OBOF imports. Doing this, however, would require compromise. The formal semantics imposed by the source ontologies would need to be weakened. The BFO folks (I suppose that I partially belong in that crowd) will have to relax what I call "BFO purity tests", and live with high-level classes like "disease or disorder". Otherwise, I think we will just continue debating issues ad nauseam.

sebastianduesing commented 7 months ago

There are a lot of good points on all sides here. (For those here who don’t yet know me, I’m coming to COB from the OBI dev team, and I intend to help out with COB development tasks as I’ve done with OBI.) At risk of stating the obvious, part of the conundrum is that each of these courses of action addresses some of COB's basic desiderata about this issue, including:

Minimizing total work. Everyone involved has limited time to work on this. Therefore, the solution that minimizes the total amount of work that we need to do is optimal.
Minimizing up-front work. We want to roll out COB ASAP. Therefore, the solution that minimizes the amount of work that we need to do up front is optimal.
Minimizing barriers to adoption. We cannot depend on other ontologies to be responsive to the structural consequences of decisions we make. Therefore, the solution that minimizes the amount of work other ontologies will need to do to adopt COB is optimal.
Maximizing control. For each term in COB, the less control we have over its development, the less we can be certain that it will continue to fulfill our needs for it. Therefore, the solution that maximizes the amount of control we have over the totality of COB terms is optimal.
Minimizing vulnerability. Terms imported from other ontologies are vulnerabilities for COB because they subject COB to the risk that decisions made by other ontologies will break things for COB. Therefore, the solution that minimizes COB’s number of outside-ontology dependencies is optimal.

Yet solutions that are optimal by some of these principles are inherently suboptimal by others: an ontology over which we have full control is an ontology for which we are doing all the work that has to be done. Furthermore, there are certain unideal facts that further constrain the solutions available to us:

COB is late to the upper-level ontology party. Most of the terms that are within COB’s scope already exist and are spread across a large number of ontologies.
We cannot rely upon other ontologies to develop in ways that are constructive for COB and that align with our visions for COB terms. Whether due to unresponsiveness, susceptibility to error, or misaligned development goals, some amount of changes made to terms in COB’s scope by other ontologies will be both harmful for COB and essentially permanent.

My goal in summarizing these threads of the conversation is to make it easier to analyze which possible solutions both align with COB’s desiderata and can coexist with the unideal facts. Notably, the two options discussed here, minting COB IRIs for core COB terms or building COB primarily out of terms imported from other ontologies, are not mutually exclusive. Here is one proposition for how to move forward on this issue that, I believe, would balance our desiderata and our constraints: We keep non-COB terms in COB as a blanket policy, but we grant COB authority to mint IRIs for any terms for which either (1) no term with an equivalent extension currently exists (an obvious and uncontroversial right), (2) several terms with conflicting definitions and extensions are already in widespread use, necessitating COB as a consensus-driver (e.g., disease/disorder would likely be a good candidate), or (3) a previously-acceptable term imported in COB becomes logically or semantically unacceptable, and efforts to fix it in cooperation with its host ontology are either impossible or unsuccessful.
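To make the three conditions above concrete, here is a minimal sketch of the minting rule as a decision function. The `TermStatus` fields and the `reason_to_mint` name are hypothetical, introduced only for illustration; nothing like this exists in COB's tooling, and the flag values would have to be judged by people, not computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermStatus:
    """Hypothetical summary of what is known about a candidate COB term."""
    label: str
    equivalent_term_exists: bool      # an existing term already has the same extension
    conflicting_terms_in_use: bool    # several widely used terms disagree on definition/extension
    imported_term_unacceptable: bool  # the imported term became logically/semantically unacceptable
    fix_with_host_failed: bool        # fixing it with the host ontology was impossible or unsuccessful


def reason_to_mint(term: TermStatus) -> Optional[str]:
    """Return the condition that would justify a COB IRI, or None to keep the external IRI."""
    if not term.equivalent_term_exists:
        return "(1) no equivalent term exists"
    if term.conflicting_terms_in_use:
        return "(2) conflicting terms in widespread use"
    if term.imported_term_unacceptable and term.fix_with_host_failed:
        return "(3) imported term unacceptable and not fixable with host"
    # Default policy: keep the non-COB term.
    return None


# Illustrative flag values only; they are assumptions, not an assessment of the real terms.
print(reason_to_mint(TermStatus("molecular function", True, False, False, False)))
print(reason_to_mint(TermStatus("disease", True, True, False, False)))
```

Note that condition (3) needs both flags: an unacceptable term whose host ontology is still willing and able to fix it stays where it is.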

This approach would have several notable benefits. As @matentzn said, it’s unlikely that many ontologies (e.g. GO) would be interested in ceding major terms to COB. By using their terms as much as possible, COB fosters goodwill, likely increasing the odds that other ontologies will want to cooperate/align with COB. Preserving existing IRIs also requires significantly less work up front and overall than replacing them, so we can roll out COB much faster. I think @wdduncan’s experience here is informative about the value of keeping existing IRIs where possible. We also avoid wasted time and effort involved in replacing stable and workable terms that have non-COB-IRIs (a lot of OBI terms, for instance, but also certain terms from other ontologies), and it requires less work from ontologies that use current non-COB terms to align with COB. But by granting COB the right to liberally replace non-COB-IRI-terms with COB-IRI-terms whenever we judge it necessary, we shield ourselves from much of the vulnerability that comes from a policy of blanket acceptance of external ownership of the terms in COB’s scope. Ultimately: the only work required of us is the strictly necessary work; we lean into interoperability and ease of use by defaulting to using what is already there; and we treat inclusion of external ontology terms in COB as a privilege that we can revoke when it no longer serves us.

Some notable downsides of this approach are that it requires us to be vigilant about changes made by other ontologies to terms in COB, and it requires us to be responsive when undesirable changes occur. Both of these things are eminently possible—arguably all ontologies that import any external terms face these issues—but I would argue that solutions that place the onus on us to be responsive are preferable to those that require other ontologies to be responsive, as the former is within our control.
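As a rough illustration of what that vigilance could look like in practice, the sketch below compares the definitions of watched terms between two snapshots of a source ontology and lists the ones that changed. It assumes the definitions have already been extracted into plain dictionaries keyed by term ID; the function name, the `EX:` IDs, and the placeholder definition strings are all hypothetical.

```python
from typing import Dict, List, Tuple


def changed_definitions(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (term_id, old_definition, new_definition) for every watched term that changed."""
    changes = []
    for term_id in sorted(previous):
        old = previous[term_id]
        # A watched term that disappeared is reported as a change too.
        new = current.get(term_id, "<term missing>")
        # Compare with whitespace normalized so pure reformatting is not flagged.
        if " ".join(old.split()) != " ".join(new.split()):
            changes.append((term_id, old, new))
    return changes


# Placeholder snapshots; the IDs and texts are made up for the example.
last_release = {"EX:0000001": "old definition text", "EX:0000002": "unchanged text"}
this_release = {"EX:0000001": "new definition text", "EX:0000002": "unchanged  text"}

for term_id, old, new in changed_definitions(last_release, this_release):
    print(f"{term_id}: definition changed")
```

A check like this only covers text definitions; changes to logical axioms would need a separate comparison.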

The path I outlined is just one of many viable options. I'm proposing it now mostly to suggest an evaluative framework for choosing COB's path forward and then to ask: which solution(s) is/are best at satisfying COB’s desiderata and coexisting with the unideal facts?

matentzn commented 7 months ago

@sebastianduesing First of all, your comment is a rare example of clarity with next to no "opinion baggage" and very actionable analysis of the current situation. It can't have been easy to write this, so I thank you for this!

TBH, I am very happy to proceed on a solution that follows your way of thinking. Of course, we need some simple "conditions" for making a "minting" decision - disease may be a bit less controversial (basically there is no other way unless we want major groups to boycott COB, but still there will be resistance), but I am sure we can develop conditions that allow this even for more controversial cases.

However, your argument does not address key technical problems. Which ontology will we import "GO:molecular function" from? GO or COB? Who has the last word in case of a clash (this is not the sociotechnical question to which you describe a good path forward, just the technical)? If we import from both, we greatly increase the "attack surface" for conflicts (imagine a change in an upper level alignment). Given what you say, we basically need a guarantee that certain classes "are not edited without announcement on the COB tracker". I think this is doable - so we can go to GO and say: can you agree that changes to MF, CC, and BP must be announced on the COB tracker prior to implementation?

We basically assume that these classes, once created, don't change. I am fine with that. We could try and move this issue forward based on this assumption.

sebastianduesing commented 7 months ago

@matentzn Thank you for your response! I’m glad to hear you would be happy to proceed on a solution like that; resolving this issue would be a big step forward for COB, so I’m happy to support progress of any sort here.

I completely agree that there are some important technical questions that need to be answered before a strategy like the one I proposed could be implemented. You raised a really good question about which ontologies should be the source of imported terms. The answer to that may be at least partially dependent on what we see as COB’s role with respect to the terms in its domain. If we want COB to be a one-stop shop for mid-high-level biomedical terms, importing from COB is probably the better choice. If we want COB to be something like a source of truth (by which I mean that COB serves to organize/interconnect reliable and trustworthy mid-high-level terms within its domain, pointing users to useful terms and showing them how to use them in relation to other terms to which they may not be connected in their native ontology), it would probably be preferable to import from the original source ontology. I personally lean towards the second option, importing from the original source, but I think there are good cases to make for either strategy, and I agree that importing from both is basically untenable. I recognize there are significant technical dimensions to this problem beyond the ideological question I just mentioned. I’m not confident that I know exactly how the technical consequences of each path would shake out, though I’ll look into it and try to get a better handle on that.

I like your proposal that, as a condition for inclusion in COB, we require GO/similar owners of high-level COB terms to announce changes to those terms on the COB tracker prior to implementation. Firstly, I think that’s a reasonable compromise to make with those other ontologies—we use their terms, essentially routing traffic to their ontologies, rather than replacing them with our own terms, and in return they notify us of relevant changes to those terms—but secondly I also suspect that those critical terms will not be changed very often because they’re also critical to the ontologies that host them. I searched through GO’s GitHub and found no substantive changes to “GO:molecular function” over their ~10 year GitHub history (at least as far as I could find, searching both the label and the IRI/PURL in open/closed issues/PRs). And as you said earlier in this thread, they haven’t renamed “molecular_function” to “molecular function” in five years—I suspect they’re not eager to touch its axioms either. I would guess that this applies to most high-level COB terms. That being said, if we want a bit of a confidence check to make sure that other important COB terms have a similarly stagnant edit history in their source ontologies, I can come back with that info.

For these high-level terms, keeping them the way they are so nothing breaks is a high-stakes issue for us, but it’s also a high-stakes issue for their source ontology. That’s not to say groups never make changes that break their own ontologies, but I think we can reasonably anticipate that any changes to these high-level terms will be rare and conservative, and as such, I suspect it wouldn’t be a significant burden on the source ontologies to post on COB’s issue tracker whenever changes get made. We might suggest adding an editor’s note to those terms as a reminder that changes should be reported to COB. I’ll also volunteer to be the one to reach out to other ontologies about this, if we commit to this path—assuming no one else wants that job.

All that to say, I think the assumption that these classes generally won’t change is a reasonable one.

Returning to your comment about conditions for making the decision to mint a new IRI, I definitely think we can find a way to allow for even the more controversial cases. I think it may be helpful to pull together a short list of cases that are controversial/complicated for which we could ideally justify minting a new IRI, so we could try to construct a set of rules based on those test cases. If you have any cases that come to mind (or if anyone else looking at this issue has any), that would be a great start. I’ll also see if I can find any that stand out as interesting test cases.

Thanks again for the time you’ve spent on this discussion—I’m hoping this will result in some significant steps forward for COB. I’m happy to help out in any way I can along the way!

sebastianduesing commented 6 months ago

EDIT: My method of searching for edit history did not catch everything I thought it was catching. This information is incorrect.

ORIGINAL: I went through the GitHub histories of several other relatively important/high-level imported COB terms, and I'm feeling more confident in my suspicion that those high-level terms don't actually get edited by their host ontologies much (if at all). Here's what I've found so far:

GO:'biological process' has no significant history of edits I could find in GO's ~10 year GitHub history.
CL:'cell' has had an annotation property removed and mappings updated but no logic/definition changes that I could find in CL's ~8 year GH history.
IAO:'information' has no significant history of edits I could find in IAO's ~8 year GH history.

I'm happy to check on any other terms of interest, but based on this, I think it's reasonable to assume that high-level terms like these are likely to remain stable in their host ontologies.

addiehl commented 6 months ago

The definition of GO biological process was in fact changed earlier this year, as a result of a discussion here, as implemented by this PR on February 27, 2023.

addiehl commented 6 months ago

Similarly, GO molecular function and GO cellular component also had their definitions adjusted in the past year.

addiehl commented 6 months ago

If you look at QuickGO for BP, MF, and CC, you can see in the term history that the definitions of all these terms have changed twice in the last 8 years.

sebastianduesing commented 6 months ago

Thanks for pointing that out. I'd been searching for the IRIs for those terms, and that PR doesn't come up in that search. Sorry, I jumped the gun on that one—please disregard my comment from earlier today. I'll do a more thorough search for edit history for those terms and come back with that info.

sebastianduesing commented 6 months ago

The information on QuickGO is helpful. Thank you, @addiehl, I was very wrong about this—I thought I was catching more in my GitHub issue/PR searches than I actually was. Lesson learned.

So we can't assume that these terms are stable in their source ontologies. Thinking again about the proposal to require the source ontologies to notify COB of impending changes, ~2 definition changes in 8 years is at least not frequent enough to make that very much of a burden on the source ontologies. If this is the approximate rate of change for terms like these, I think that that's still a reasonable option.
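For a back-of-the-envelope sense of that burden, the sketch below turns the observed rate (about 2 definition changes in 8 years) into an expected number of notifications per year. The count of 20 watched terms is a made-up figure for illustration, and the calculation assumes every term changes at the same rate and is announced separately, which likely overstates the load when related terms are revised together.

```python
changes_observed = 2   # definition changes per term, from the QuickGO term history mentioned above
years_observed = 8

# Average rate of definition changes per term per year.
rate_per_term = changes_observed / years_observed


def expected_notifications_per_year(n_terms: int, rate: float = rate_per_term) -> float:
    """Expected announcements per year if each watched term changes at the same average rate."""
    return n_terms * rate


# 20 watched terms is a hypothetical number, not a count of actual COB imports.
print(rate_per_term)                        # 0.25
print(expected_notifications_per_year(20))  # 5.0
```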