OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Define when it is OK to subclass terms in another ontology #1991

Open cmungall opened 2 years ago

cmungall commented 2 years ago

This is a companion issue to:

But that issue focuses on injection, which I define as adding axioms about terms in another ontology (this is clearly defined in that issue; please don't bring that discussion back here).

This issue is about when it is OK to make axioms that are not about terms in another ontology but that reference them in subClassOf axioms, in particular subClassOf axioms between named classes.

On the surface this should be OK: I am not altering the target ontology's axioms in any way. Indeed, some ontologies, such as COB, BFO, and CARO, are designed expressly with the intention that they be subclassed. To a certain extent Uberon is too, although only for species-specific subclasses.

However, subclassing other ontologies' terms is rampant in OBO, and this is actually harmful. It reflects poor modularity and leads to confusion about scope: users are not clear which ontology to go to in order to get a term or to request one.

It is also terrible for maintainability. Suppose I maintain an ontology O1 containing class C1, and another ontology O2 starts making subclasses C1a, C1b, and so on. If I later need to introduce subclasses in O1, I first need to scan all of OBO to see who has made subclasses and coordinate with those ontologies. This is a large impediment to maintainability.

Here is an example of what I call a heavily chequered inter-ontology subclass pattern, where it is unclear (to an external user) what belongs in STATO, OBI, or IAO:

| subject | predicate | object | subject_label | predicate_label | object_label |
|---|---|---|---|---|---|
| STATO:0000002 | rdfs:subClassOf | IAO:0000030 | digital file | subClassOf | information content entity |
| STATO:0000003 | rdfs:subClassOf | OBI:0500000 | balanced design | subClassOf | study design |
| STATO:0000005 | rdfs:subClassOf | OBI:0500000 | single factor design | subClassOf | study design |
| STATO:0000007 | rdfs:subClassOf | IAO:0000573 | axis | subClassOf | line graph |
| STATO:0000010 | rdfs:subClassOf | IAO:0000030 | coordinate system | subClassOf | information content entity |
| STATO:0000026 | rdfs:subClassOf | IAO:0000400 | cartesian spatial coordinate origin | subClassOf | cartesian spatial coordinate datum |
| STATO:0000027 | rdfs:subClassOf | OBI:0000673 | test of association between categorical variables | subClassOf | statistical hypothesis test |
| STATO:0000028 | rdfs:subClassOf | IAO:0000109 | measure of variation | subClassOf | measurement datum |
| STATO:0000029 | rdfs:subClassOf | IAO:0000109 | measure of central tendency | subClassOf | measurement datum |
| STATO:0000031 | rdfs:subClassOf | OBI:0200000 | binary classification | subClassOf | data transformation |
| STATO:0000034 | rdfs:subClassOf | IAO:0000027 | model parameter | subClassOf | data item |
| STATO:0000036 | rdfs:subClassOf | IAO:0000027 | outlier | subClassOf | data item |
| STATO:0000038 | rdfs:subClassOf | OBI:0000181 | matched pair of subjects | subClassOf | population |
| STATO:0000039 | rdfs:subClassOf | IAO:0000109 | statistic | subClassOf | measurement datum |
| STATO:0000040 | rdfs:subClassOf | IAO:0000184 | MA plot | subClassOf | scatter plot |
| STATO:0000044 | rdfs:subClassOf | OBI:0200201 | one-way ANOVA | subClassOf | ANOVA |
| STATO:0000045 | rdfs:subClassOf | OBI:0200201 | two-way ANOVA | subClassOf | ANOVA |
| STATO:0000046 | rdfs:subClassOf | OBI:0500000 | block design | subClassOf | study design |
| STATO:0000047 | rdfs:subClassOf | IAO:0000109 | count | subClassOf | measurement datum |
| STATO:0000048 | rdfs:subClassOf | OBI:0200201 | multiway ANOVA | subClassOf | ANOVA |
| STATO:0000063 | rdfs:subClassOf | IAO:0000027 | genomic coordinate datum | subClassOf | data item |
| STATO:0000065 | rdfs:subClassOf | IAO:0000030 | hypothesis | subClassOf | information content entity |
| STATO:0000066 | rdfs:subClassOf | IAO:0000037 | Cleveland dot plot | subClassOf | dot plot |
| STATO:0000068 | rdfs:subClassOf | IAO:0000027 | skewness | subClassOf | data item |

(truncated)

To replicate with OAK:

stato roots -p i --id-prefix STATO | stato relationships - -p i
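As a rough illustration of how the chequered pattern in the table can be quantified, here is a minimal Python sketch. It assumes the subject/object CURIE pairs have already been exported (for example, from the OAK command above) into a plain list; `AXIOMS`, `id_space`, and `cross_ontology_targets` are hypothetical names, not part of OAK. It counts how many subClassOf axioms point into each foreign ID space.

```python
from collections import Counter

# A few (subject, object) rdfs:subClassOf pairs copied from the table above.
AXIOMS = [
    ("STATO:0000002", "IAO:0000030"),  # digital file -> information content entity
    ("STATO:0000003", "OBI:0500000"),  # balanced design -> study design
    ("STATO:0000028", "IAO:0000109"),  # measure of variation -> measurement datum
    ("STATO:0000044", "OBI:0200201"),  # one-way ANOVA -> ANOVA
    ("STATO:0000040", "IAO:0000184"),  # MA plot -> scatter plot
]


def id_space(curie: str) -> str:
    """Return the ID space of a CURIE, e.g. 'IAO' for 'IAO:0000030'."""
    return curie.split(":", 1)[0]


def cross_ontology_targets(axioms):
    """Count subClassOf axioms whose parent lives in a different ID space than the child."""
    return Counter(
        id_space(parent)
        for child, parent in axioms
        if id_space(child) != id_space(parent)
    )


print(cross_ontology_targets(AXIOMS))  # → Counter({'IAO': 3, 'OBI': 2})
```

Run over the full export instead of this excerpt, the same count gives a quick per-ontology picture of where a given ontology's classes are attached.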

Proposal:

Ontologies MUST NOT create is-a children of classes in other ontologies in their own ontology, unless permission explicitly granted, on a per-term, per-branch, or per-ontology basis. This would be recorded in OBO metadata, e.g. for COB, BFO, CARO. OBI could choose to grant permission in this way, preferably with a link to some kind of documentation that states the relative scope of the two ontologies.
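One way to picture the proposed rule is as a lookup against recorded grants. The sketch below is hypothetical throughout: OBO metadata has no `SubclassGrants` structure today, the IAO branch grant is invented purely for illustration, and the is-a hierarchy is reduced to a single edge. It shows how per-ontology, per-term, and per-branch permissions could each be checked for one subClassOf axiom.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SubclassGrants:
    """Hypothetical record of the subclassing permissions one ontology grants to others."""
    whole_ontology: bool = False                             # per-ontology: any class may be subclassed
    branches: Set[str] = field(default_factory=set)          # per-branch: a class and all its descendants
    terms: Set[str] = field(default_factory=set)             # per-term: only these exact classes


def id_space(curie: str) -> str:
    """Return the ID space of a CURIE, e.g. 'IAO' for 'IAO:0000109'."""
    return curie.split(":", 1)[0]


def is_in_branch(term: str, branch_root: str, parents: Dict[str, List[str]]) -> bool:
    """Walk is-a parents upward from `term` and report whether `branch_root` is reached."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == branch_root:
            return True
        if current in seen:
            continue  # guard against cycles in the parent map
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def subclass_allowed(child: str, parent: str,
                     grants: Dict[str, SubclassGrants],
                     parents: Dict[str, List[str]]) -> bool:
    """Apply the proposed rule: no foreign is-a parent unless a recorded grant covers it."""
    if id_space(child) == id_space(parent):
        return True  # subclassing within your own ontology is unaffected
    grant = grants.get(id_space(parent))
    if grant is None:
        return False  # no permission recorded: MUST NOT
    if grant.whole_ontology or parent in grant.terms:
        return True
    return any(is_in_branch(parent, root, parents) for root in grant.branches)


# Example data; the IAO grant is purely hypothetical.
GRANTS = {
    "BFO": SubclassGrants(whole_ontology=True),
    "COB": SubclassGrants(whole_ontology=True),
    "IAO": SubclassGrants(branches={"IAO:0000027"}),  # hypothetical: only 'data item' and below
}
# Tiny fragment of IAO's is-a hierarchy: measurement datum -> data item.
IAO_PARENTS = {"IAO:0000109": ["IAO:0000027"]}

print(subclass_allowed("STATO:0000039", "IAO:0000109", GRANTS, IAO_PARENTS))  # True
print(subclass_allowed("STATO:0000002", "IAO:0000030", GRANTS, IAO_PARENTS))  # False
print(subclass_allowed("STATO:0000003", "OBI:0500000", GRANTS, IAO_PARENTS))  # False
```

A per-branch grant needs the host ontology's hierarchy, which is why `subclass_allowed` takes a `parents` map; per-ontology and per-term grants can be decided from the CURIEs alone.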

cmungall commented 2 years ago

Here is a visual illustration of the problem:

[Image: stato-obi-iao]

I'm not sure how IAO, STATO, and OBI coordinate which term goes where, but this is very confusing for a user who needs to select terms, and even more so for one who needs to figure out which issue tracker to go to in order to request new terms.

matentzn commented 2 years ago

I not only like this; I think it is very necessary, and it is already reflected in the "Scope" principle (which is not very well fleshed out right now; see https://obofoundry.org/principles/fp-005-delineated-content.html). This is how I would like to attack it:

  1. All major branches are reflected in COB (data transformation, study design, measurement datum, disease, anatomical entity, etc.). COB metadata points (maps) to all branches in active OBO ontologies, which establishes which ontologies have theoretical permission to host terms. For example, the DO, NCIT, and Mondo disease branches point to COB:disease, and all of them can serve as hosts for new terms for now. (We probably have to document all current violations as exceptions for the time being and work through them one by one; think OMIT/BTO classes and application ontologies.)
  2. We implement the rule you suggest (MUST NOT subclass) and add it to the OBO Dashboard.
  3. From that point on, subclassing a term from a different namespace (other than COB, RO, BFO) can only happen with a specific annotation property (like exclusion reason, but "subclass permission") which points to a resolvable issue tracker item that explains the exception.
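A dashboard check along the lines of points 2 and 3 might look roughly like the following Python sketch. The `subclass_permission` key stands in for the proposed (not yet existing) annotation property, the example URL and the `MYONT` class are placeholders, and "resolvable" is only approximated by checking for an absolute http(s) URL rather than actually fetching it.

```python
from typing import Optional
from urllib.parse import urlparse

# Namespaces the comment above lists as freely subclassable.
EXEMPT_NAMESPACES = {"COB", "RO", "BFO"}


def id_space(curie: str) -> str:
    """Return the ID space of a CURIE, e.g. 'STATO' for 'STATO:0000039'."""
    return curie.split(":", 1)[0]


def looks_like_issue_link(value: Optional[str]) -> bool:
    """Crude stand-in for 'resolvable': an absolute http(s) URL. Nothing is fetched."""
    parsed = urlparse(value or "")
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def dashboard_violations(axioms):
    """Return (child, parent) pairs that subclass a foreign term without a documented exception.

    Each axiom is (child, parent, annotations); `annotations` may carry a hypothetical
    'subclass_permission' key holding the URL of the issue that justifies the exception.
    """
    violations = []
    for child, parent, annotations in axioms:
        if id_space(child) == id_space(parent) or id_space(parent) in EXEMPT_NAMESPACES:
            continue  # same ontology, or an ontology designed to be subclassed
        if not looks_like_issue_link(annotations.get("subclass_permission")):
            violations.append((child, parent))
    return violations


AXIOMS = [
    ("STATO:0000039", "IAO:0000109", {}),  # foreign parent, no exception recorded
    ("STATO:0000044", "OBI:0200201",
     {"subclass_permission": "https://example.org/tracker/issues/1"}),  # hypothetical exception
    ("MYONT:0000001", "BFO:0000040", {}),  # hypothetical class under an exempt namespace
]

print(dashboard_violations(AXIOMS))  # [('STATO:0000039', 'IAO:0000109')]
```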

It is important to implement this rule independent of all existing violations. We have to improve this moving forward and not forever point to existing violations as reasons for not moving on.

dosumis commented 2 years ago

Unless permission explicitly granted, on a per-term, per-branch, or per-ontology basis.

This is critical. PCL defines subclasses of CL terms. Single-species AOs subclass Uberon and CL...

It's a big ask to require this for every subClassOf axiom that breaks the rule:

From that point on, subclassing a term from a different namespace (other than COB, RO, BFO) can only happen with a specific annotation property (like exclusion reason, but "subclass permission") which points to a resolvable issue tracker item that explains the exception.

dosumis commented 2 years ago

And why are we folding COB into this issue? Isn't point 1 above more aspiration than reality for many ontology branches? (e.g. see issues around anatomical entities)

hoganwr commented 2 years ago

IAO would either have to be very permissive and/or grow to include domain-specific ICEs across numerous domains.

If the policy had been in place prior to STATO, how would things be better?

(In reply, by email, to David Osumi-Sutherland's comment above of Mon, Jul 18, 2022 at 3:01 PM.)

cmungall commented 2 years ago

Unless permission explicitly granted, on a per-term, per-branch, or per-ontology basis. This is critical. PCL defines subclasses of CL terms. Single species AOs subclass Uberon and CL...

Yes, I mentioned the Uberon case in the original comment. There would be an agreement that species-specific subclasses are OK by a blanket rule, but if you want to make a species-neutral subclass, this should be agreed first. PCL and CL are a good example: there is obviously close coordination and there are clear scoping rules between these two ontologies, so there would be a pairwise agreement. But I don't think CL wants extra-ontology subclasses that are neither data-driven classifications nor species-specific, until new situations arise.

@hoganwr:

IAO would either have to be very permissive

There is nothing inherently wrong with this, provided there is a simple process for adding new terms (for example, template-based with clear design patterns) and many people able to merge PRs. But see below for alternatives.

If the policy had been in place prior to STATO, how would things be better?

There would be clear delineation between the two ontologies. There are lots of ways to do this:

But simply having all of the information in IAO, coupled with a simple process for adding new terms, would be better than the current situation, with its striping between ontologies.

I am aware of some of the reasons why the current situation arose, and I am not criticizing past decisions, but we need to move beyond them and implement clear modularity and scoping.

alanruttenberg commented 1 year ago

Just became aware of this issue. I'll register a strong objection. Let's quit proposing rules that limit what developers can put in their ontology.

addiehl commented 1 year ago

Have to agree with Alan, as most of my group's ontologies build off other ontologies. In some cases we have requested new classes from the appropriate ontologies, but in other cases our classes are probably too specific for inclusion in a higher-level domain ontology.

cmungall commented 1 year ago

@addiehl can you describe some of the processes you have put in place to avoid the issues highlighted here? It would be great to have documentation and SOPs on this, which would be very much in the spirit of my original request!

addiehl commented 1 year ago

I have a number of examples to describe, but don't have time until next week to write this up.