OBOFoundry / OBOFoundry.github.io

Metadata and website for the Open Bio Ontologies Foundry Ontology Registry
http://obofoundry.org

Handling CHEBI 'definitions' #1802

Open nataled opened 2 years ago

nataled commented 2 years ago

This ticket is a follow-up to #1010 (which was closed because the automated check is already in place and functioning properly for nearly all ontologies).

Many years ago CHEBI was granted an exception to the rule specified by principle 6 that all terms must have a text definition. The exception was granted on the basis that in CHEBI it is the structure of the entity that serves as its definition in many cases. Unfortunately, this causes CHEBI to fail the principle 6 test in a somewhat grand fashion, with ~60% of classes lacking a text definition.

I believe it is possible to account for the exception by looking for specific tags within the OWL file. The following tags likely indicate a structure:

chebi:formula chebi:inchi chebi:inchikey chebi:smiles

(note: need to check relevancy for each). What I don't know is whether every term with, say, a SMILES string will always have an INCHI, etc. If one co-occurs with all the others, it would be possible to simplify the test to look for just one of the tags.
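A minimal Python sketch of the proposed check and of the co-occurrence question, for illustration only. It assumes terms have already been parsed into a plain dictionary of `{term_id: {property: [values]}}`; the `EX:` identifiers and the sample annotations are made up, and the property names simply reuse the tags listed above rather than the IRIs actually found in the CHEBI OWL file.

```python
# Hypothetical in-memory model: {term_id: {property: [values]}}.
# Property names mirror the tags listed above; the data is invented.
DEFINITION = "IAO:0000115"
STRUCTURE_TAGS = ("chebi:formula", "chebi:inchi", "chebi:inchikey", "chebi:smiles")


def lacks_any_definition(annotations):
    """True if a term has neither a text definition nor any structure tag."""
    has_text = bool(annotations.get(DEFINITION))
    has_structure = any(annotations.get(tag) for tag in STRUCTURE_TAGS)
    return not (has_text or has_structure)


def tags_always_cooccur(terms, tag, other):
    """True if every term carrying `tag` also carries `other`."""
    return all(ann.get(other) for ann in terms.values() if ann.get(tag))


terms = {
    "EX:1": {DEFINITION: ["A made-up text definition."]},
    "EX:2": {"chebi:smiles": ["O"], "chebi:inchi": ["InChI=1S/H2O/h1H2"]},
    "EX:3": {"chebi:smiles": ["C"]},  # structure present, but no InChI
    "EX:4": {},                       # neither text nor structure
}

# Terms that would still fail once structure tags count as definitions.
failing = sorted(t for t, ann in terms.items() if lacks_any_definition(ann))
```

Running `tags_always_cooccur` for each ordered pair of tags over the real data would show whether the test can be reduced to a single tag.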

matentzn commented 2 years ago

Are you proposing to hardcode a few exceptions to the "no definition rule" in ROBOT report? I am not entirely opposed to that, but I would like to understand a bit better what exactly justifies this exception. This is not only about "is there some information a human can use to understand what this entity is"; it is also supposed to direct tools reliably: "if you need a human-understandable textual definition that tells you what this term means, look here". The latter requirement would not be served by this exception. However, I think it is a good idea to describe exceptions to the no-definition rule.

nataled commented 2 years ago

I am proposing to hardcode one exception to the rule for the long-known case of CHEBI. I'm not sure what your statement about tools means. CHEBI was one of the first ontologies granted full-fledged 'Foundry' status, back before we had an OBO Operations Committee, or the formal review process we currently have. At the time, it was noted that CHEBI lacked many text definitions. In rebuttal (so to speak), CHEBI developers pointed out that, for chemicals, the true 'definition' would be a structure, as text definitions would be, in many cases, not very meaningful to humans anyway. The exception was considered and granted, and likely all terms without text definitions were assumed to have at least a structure 'definition'. Here, I propose this exception so that the assumption can be put to the test, and allow finding of cases that lack both structural and textual definition. Right now, technically, CHEBI can simply ignore the definitions part of the dashboard in the same way that new ontologies can ignore the usages part of the dashboard; in neither case does the evaluation apply. With the proposed exception (or, more precisely, an extension) in place, the 'definitions' evaluation for CHEBI would again be meaningful.

bpeters42 commented 2 years ago

Can we make the annotation property 'has-molecular-structure-defined-by-inchie-string' a sub-property of 'definition'? Then technically Chebi terms do have definition properties.

nlharris commented 2 years ago

can this be moved to the dashboard tracker?

matentzn commented 2 years ago

If everyone agrees that has-molecular-structure-defined-by-inchie-string is a valid definition property, we can extend ROBOT report to recognise this.

nataled commented 2 years ago

The mention of 'dashboard' was for context. Edited the title of the ticket to prevent confusion.

cmungall commented 2 years ago

I’d like to revisit this. There shouldn’t be an awkward exception for chebi. And will chebi even use this AP?

bpeters42 commented 2 years ago

The property Chebi is using is "http://purl.obolibrary.org/obo/chebi/inchi". I labeled it "has-molecular-structure-defined-by-inchie-string" as an explanation of how that can be considered a definition. It is not good that what they use is not even an RO or IAO property. But assuming we could get Chebi to change to what we want, what would you propose? Do you agree that an INCHI string is essentially a definition? Should we therefore not make it a sub-property of 'RO:definition' (or some abstraction thereof)? It seems that there could be multiple such cases, including an amino acid sequence via single letter code, a SMILES string and the like, and it seems that we would want to have different properties for such different encodings.

cmungall commented 2 years ago

If it were to inherit from http://purl.obolibrary.org/obo/IAO_0000115, it would have to inherit its properties. Do we want that? Would we not want IAO_0000115 to have properties such as having a range of a piece of narrative text? (See https://github.com/information-artifact-ontology/ontology-metadata/issues/109)

I think the way to capture the intent of what you want might be:

(this is hard in OWL as it mixes AP and logical axioms)

Either way this seems like a lot of mechanism when the bigger issue here is that rules for ontologies don't make sense when applied to databases

bpeters42 commented 2 years ago

I don't see how this is directly tied to ontologies vs. databases. If something like "water molecule" is not a class, then I don't know what is. And I could see plenty of other 'properties that have definitional role' that could be used in ontologies, e.g. RGB ranges to define colors.

I like: property-that-has-definitional-role [abstract, do not instantiate]

I don't see why we would bring in 'logical definition' here if this is supposed to point to the OWL axioms. And I don't understand why this is a lot of mechanism? Maybe I am not understanding something. As an aside: I very much agree with your other issues you raised on fixing the IAO definition of definition etc. They are pretty embarrassing.
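One possible reading of the 'property-that-has-definitional-role' idea, sketched in Python under stated assumptions. The sub-property names (`text-definition`, `inchi`, `smiles`) and the child-to-parent table are hypothetical, not an existing OMO, IAO, or RO hierarchy; only the abstract parent name comes from this thread. A term passes if it uses at least one concrete sub-property, and using the abstract parent directly is flagged.

```python
# Hypothetical sub-property table (child -> parent); not a real vocabulary.
ROOT = "property-that-has-definitional-role"
PARENT = {
    "text-definition": ROOT,
    "inchi": ROOT,
    "smiles": ROOT,
}
ABSTRACT = {ROOT}  # "abstract, do not instantiate"


def is_subproperty_of(prop, ancestor):
    """Walk up the hierarchy; a property counts as its own sub-property."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = PARENT.get(prop)
    return False


def check_term(annotations):
    """Return a list of problems for one term's {property: [values]} dict."""
    problems = []
    used = [p for p, values in annotations.items() if values]
    for p in used:
        if p in ABSTRACT:
            problems.append(f"{p} is abstract and must not be asserted directly")
    if not any(is_subproperty_of(p, ROOT) and p not in ABSTRACT for p in used):
        problems.append("no definitional property")
    return problems
```

For example, `check_term({"inchi": ["InChI=1S/H2O/h1H2"]})` returns an empty list, while a term annotated only with a label is reported as having no definitional property.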

cmungall commented 2 years ago

I don't see how this is directly tied to ontologies vs. databases.

See slide 43 of databases-as-ontologies

Any time someone decides to treat database entities as ontology classes we will have the same issue. Definitions in the OBO sense don't make sense for: genes, the majority of chemical structures in structure space, taxa, domains, alleles, exons, etc. So we will have people either inserting non-useful autogenerated text to gameify dashboard checks and increase file sizes, or appeals to use property P or Q as an alternative (fasta sequence, phylocode, voucher specimens, GFF interval, HGVS nomenclature, HLA nomenclature, MOD-specific genotype nomenclatures, HMMs, etc).

I realize I am projecting a bit into the future, seemingly hypothetically, but I do think it is worth getting this right upfront; I am sure this pattern will be repeated.

The reason I said this is a lot of mechanism is that OWL is not good for metadata and we end up rolling our own semantics for annotation properties. This includes things like cardinality checks propagating down the annotation property hierarchy. But I think you may have convinced me it's better just to accept this and try and do it right.

So how about this:

I like this and it's not so hard to implement - I tried this in my local copy of my OMO linkml rendering, and it tells me that ~1k chebi terms are in violation of definitional principles as EITHER text definitions OR inchis are shared (this is a known issue already - see https://github.com/ebi-chebi/ChEBI/issues/456#issuecomment-827694071 - but good to formalize according to OBO metadata), and that even when inchis are taken into account there are still a few thousand missing any kind of definitional property.
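The two kinds of violation described here (a definitional value shared by several terms, and terms with no definitional property at all) can be illustrated with a short Python sketch. This is not the LinkML rendering mentioned above; it assumes a plain `{term_id: {property: [values]}}` dictionary, treats only the text definition and the InChI as definitional, and uses invented `EX:` terms.

```python
from collections import defaultdict

# Assumed for illustration: both properties are treated as definitional.
DEFINITIONAL = ("IAO:0000115", "chebi:inchi")


def shared_values(terms):
    """Map (property, value) -> sorted terms, for values used by several terms."""
    users = defaultdict(set)
    for term, ann in terms.items():
        for prop in DEFINITIONAL:
            for value in ann.get(prop, []):
                users[(prop, value)].add(term)
    return {key: sorted(ts) for key, ts in users.items() if len(ts) > 1}


def missing_definitional(terms):
    """Terms with neither a text definition nor an InChI."""
    return sorted(
        t for t, ann in terms.items()
        if not any(ann.get(p) for p in DEFINITIONAL)
    )


terms = {
    "EX:1": {"chebi:inchi": ["InChI=1S/H2O/h1H2"]},
    "EX:2": {"chebi:inchi": ["InChI=1S/H2O/h1H2"]},  # same InChI as EX:1
    "EX:3": {"IAO:0000115": ["A made-up definition."]},
    "EX:4": {"rdfs:label": ["unnamed example"]},      # nothing definitional
}
```

With this sample data, `shared_values(terms)` reports the InChI shared by `EX:1` and `EX:2`, and `missing_definitional(terms)` reports `EX:4`.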

bpeters42 commented 2 years ago

It sounds like we agree on the way forward, and I like your proposed points. So let's focus on that?

The one thing I continue to disagree with is the distinction between database and ontology. It seems arbitrary to me, or maybe I just don't like the labels. But the way you state it here, you are essentially saying that if a given domain can have a definitional property that defines entities exactly without giving a textual definition, you consider it a 'database'. I would not use that label, but I do accept that those domains are different from others where we rely on textual definitions. And as I said above, I agree with the practical consequences of what you are proposing; I would just replace 'ontological' vs. 'databasy' with 'defined textually' vs. 'defined based on a non-textual classification scheme'.