ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

molasses is classified as both a mixture and a molecular entity (also: should these classes be in ENVO or CHEBI)? #3293

cmungall opened 7 years ago

cmungall commented 7 years ago

This seems odd:


I recommend keeping molasses and the like out of the molecular entity branch. The relationship to carbohydrate should be has_part or a sub-relation. In fact you have this already, indirectly:

[Term]
id: CHEBI:83163
name: molasses
namespace: chebi_ontology
def: "A mixture of sugar (fructose, glucose, sucrose) and carbohydrate components that is a by-product from sugar refinery." []
xref: Wikipedia:Molasses
is_a: CHEBI:16646  ! carbohydrate
is_a: CHEBI:60004  ! mixture
relationship: has_part CHEBI:17234 ! glucose
relationship: has_part CHEBI:17992 ! sucrose
relationship: has_part CHEBI:28757 ! fructose
relationship: has_role CHEBI:35617 ! flavouring agent
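To make the problem concrete, here is a minimal sketch that reads the is_a and has_part lines of a stanza like the one above and flags a class whose direct parents fall under both the mixture and molecular entity branches. The `BRANCH_OF` map is a hand-written assumption covering only the two parents in this stanza. A real check would walk the full ChEBI is_a graph.

```python
import re

# The molasses stanza quoted above, trimmed to the lines this sketch needs.
STANZA = """[Term]
id: CHEBI:83163
name: molasses
is_a: CHEBI:16646  ! carbohydrate
is_a: CHEBI:60004  ! mixture
relationship: has_part CHEBI:17234 ! glucose
relationship: has_part CHEBI:17992 ! sucrose
relationship: has_part CHEBI:28757 ! fructose
"""

# Assumption: hand-written map from direct parents to the top-level branch
# they sit under; not queried from ChEBI.
BRANCH_OF = {
    "CHEBI:16646": "molecular entity",  # carbohydrate
    "CHEBI:60004": "mixture",           # mixture
}


def is_a_parents(stanza: str) -> list[str]:
    """IDs on 'is_a:' lines; the trailing '! label' comment is ignored."""
    return re.findall(r"^is_a:\s*(\S+)", stanza, flags=re.MULTILINE)


def has_part_targets(stanza: str) -> list[str]:
    """IDs on 'relationship: has_part' lines."""
    return re.findall(r"^relationship:\s*has_part\s+(\S+)", stanza, flags=re.MULTILINE)


def is_dual_classified(stanza: str) -> bool:
    """True if the direct is_a parents span both the mixture and molecular entity branches."""
    branches = {BRANCH_OF[p] for p in is_a_parents(stanza) if p in BRANCH_OF}
    return {"mixture", "molecular entity"} <= branches


print(is_a_parents(STANZA))        # ['CHEBI:16646', 'CHEBI:60004']
print(has_part_targets(STANZA))    # ['CHEBI:17234', 'CHEBI:17992', 'CHEBI:28757']
print(is_dual_classified(STANZA))  # True
```

If the `is_a: CHEBI:16646` line is deleted, `is_dual_classified` returns False. The has_part links still record the sugar composition, so nothing about what molasses contains is lost.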

Additionally, we have a mixture branch in ENVO, and we would like to coordinate with you on where mixtures live; see: https://github.com/EnvironmentOntology/envo/issues/456

For mixtures that are foods, ENVO is ceding terms to the new food ontology, http://obofoundry.org/ontology/foodon.html, see also http://purl.obolibrary.org/obo/FOODON_03460156

What is CHEBI's long term plan with mixtures, and for foods particularly? Are these added on an as-needed basis? Can you import foodon instead? Note that foodon includes has_part links to CHEBI.

I'll be at Biocuration if anyone wants to discuss.

cc @pbuttigieg @mateolan @Public-Health-Bioinformatics

cmungall commented 7 years ago

FYI, these are other dual classifications; many of them are drugs.

G-Owen commented 7 years ago

Hi Chris, no, we shouldn't have classed molasses as is_a carbohydrate! Now corrected.

Thank you for the list of dual classifications. There are some clear errors (e.g. use of an is_a relationship instead of has_part), but quite a few arise from the use of terms that combine a structure with a role. For example, the racemic drug climbazole is described in the literature as a member of the class of conazole antifungal drugs, so we have a term 'conazole antifungal drug', which is classified as is_a azole and has_role antifungal drug. The problem is that we don't know which enantiomer of climbazole is the active antifungal drug (it could be either, or both), so we can't assign the term at the individual compound level. Instead, we assigned it to the racemate, which we know has antifungal properties, so that users searching for all of the conazole antifungal drugs wouldn't miss climbazole simply because it is used as a racemate!

The Mixtures branch of the ChEBI ontology was introduced so that we could cope with racemates, polymers, natural antibiotic isomers (milbemycin, etc.) and so on. I'm not sure why molasses was added to ChEBI; maybe it was used to feed some fungus that produced an interesting byproduct! We don't have the luxury of being able to add whatever we might like to ChEBI (all of our curation capacity is tied up filling user requests and submissions), so we only add mixtures to ChEBI if someone asks for an entry that is a mixture. We are quite happy to add mixtures that are of biological relevance, but we would not wish to become a repository for (say) the brand names and components of all commercially available perfumes!

Thank you for the information about FoodOn. Molasses apart (!), our users tend not to be concerned with foods but with chemical entities, so we have the various forms of the vitamins as well as classes of food additives (generally the EC classification) and some of the members of those classes.

Best regards, Gareth

G-Owen commented 7 years ago

Hi Chris, I should have added in my earlier reply that, unfortunately, producing the ChEBI ontology from the ChEBI database is a one-way process - it is created by export of appropriate fields from the ChEBI database. The ChEBI schema is such that we have no way of importing other ontologies (or branches of them) into ChEBI.

Kind regards, Gareth

cmungall commented 7 years ago

Thanks @G-Owen

In the OBO Foundry we prefer that there is a clear place where different classes live, based on their domain. As stated in EnvironmentOntology/envo#456, we'd be open to ceding all ENVO 'materials' to CHEBI. Here are some high-level ones:

We'd have to work out various details. We may want to axiomatize these in a way that is not possible with the CHEBI database structure, but we can do that externally. We would want to keep the hierarchy relatively intact.

However, I sense that while this isn't going down to the level of individual perfume products, this is not really something CHEBI wants to take on? In many cases the biological relevance is indirect (or not even present, as ENVO is the bridge for different domains).

So is the way forward for us to maintain equivalence axioms, create reports, and provide ways for OBO users to get non-redundant unions of the two ontologies (either by merging equivalence cliques or filtering)? It's not ideal from an OBO perspective but may be most pragmatic.
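Merging equivalence cliques means computing connected components over the equivalence pairs and then collapsing each component to a single representative. Below is a minimal union-find sketch of that idea. It assumes the equivalence axioms arrive as (id, id) pairs. The `X1`/`X2` identifiers are hypothetical placeholders, not real ENVO, FOODON or CHEBI classes, and preferring a CHEBI member as the representative is an illustrative choice rather than an agreed policy.

```python
from collections import defaultdict


def equivalence_cliques(pairs):
    """Group IDs linked (directly or transitively) by equivalence pairs, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for x in parent:
        groups[find(x)].append(x)
    return sorted(sorted(g) for g in groups.values())


def representative(clique, preference=("CHEBI", "ENVO")):
    """Pick the first member whose prefix appears earliest in `preference` (assumed policy)."""
    for prefix in preference:
        for curie in clique:
            if curie.split(":", 1)[0] == prefix:
                return curie
    return min(clique)


def merge_map(all_ids, pairs, preference=("CHEBI", "ENVO")):
    """Map every ID to its clique representative; IDs with no equivalences map to themselves."""
    rep = {curie: curie for curie in all_ids}
    for clique in equivalence_cliques(pairs):
        chosen = representative(clique, preference)
        for curie in clique:
            rep[curie] = chosen
    return rep


# Hypothetical placeholder IDs, not real classes.
pairs = [("ENVO:X1", "CHEBI:X1"), ("FOODON:X1", "ENVO:X1")]
all_ids = ["CHEBI:X1", "ENVO:X1", "FOODON:X1", "ENVO:X2"]
print(merge_map(all_ids, pairs))
# {'CHEBI:X1': 'CHEBI:X1', 'ENVO:X1': 'CHEBI:X1', 'FOODON:X1': 'CHEBI:X1', 'ENVO:X2': 'ENVO:X2'}
```

The filtering alternative would use the same cliques but drop the non-representative members instead of rewriting references to them.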

It's unfortunate that CHEBI cannot import the way other OBOs can. Is this something that could be changed in the future?

pbuttigieg commented 7 years ago

@cmungall @G-Owen Thanks for the discussion - it would be great to coordinate more tightly here.

As raised in https://github.com/EnvironmentOntology/envo/issues/456#issuecomment-272857566 and https://github.com/EnvironmentOntology/envo/issues/456#issuecomment-289825261, ENVO's main concern is that our material terms (including mixtures) should be somewhat open-ended in their composition. When we say "water" we mean "mostly CHEBI:water with some other stuff in there because ecosystems are rather messy".

As far as I can tell, CHEBI is more concerned with the more "pure" substances, even when dealing with mixtures. If this is the case, we may not need equivalence axioms, but some sort of mirroring of mixtures ("clean" mixtures in CHEBI, "wild" mixtures in ENVO) informing users how and when to opt for one or the other. Of course, we don't intend to duplicate CHEBI, but focus more on substances that are sampled by field scientists.

cmungall commented 7 years ago

@pbuttigieg what about this one?

 / CHEBI:24431 ! chemical entity
  is_a CHEBI:23367 ! molecular entity
   is_a CHEBI:33259 ! elemental molecular entity
    is_a CHEBI:33415 ! elemental carbon
     is_a CHEBI:82297 ! carbon black *** 
   is_a CHEBI:33579 ! main group molecular entity
    is_a CHEBI:33675 ! p-block molecular entity
     is_a CHEBI:33582 ! carbon group molecular entity
      is_a CHEBI:50860 ! organic molecular entity
       is_a CHEBI:33415 ! elemental carbon
        is_a CHEBI:82297 ! carbon black *** 
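The tree above shows carbon black twice because it has two is_a routes up to molecular entity. The sketch below uses only the edges copied from that tree, so it is not a full ChEBI extract. It enumerates both routes and confirms that mixture (CHEBI:60004) is not among carbon black's ancestors.

```python
# Child -> is_a parents, copied from the tree above (not a full ChEBI extract).
IS_A = {
    "CHEBI:82297": ["CHEBI:33415"],                 # carbon black -> elemental carbon
    "CHEBI:33415": ["CHEBI:33259", "CHEBI:50860"],  # elemental carbon -> elemental / organic molecular entity
    "CHEBI:33259": ["CHEBI:23367"],                 # elemental molecular entity -> molecular entity
    "CHEBI:50860": ["CHEBI:33582"],                 # organic molecular entity -> carbon group molecular entity
    "CHEBI:33582": ["CHEBI:33675"],                 # carbon group -> p-block molecular entity
    "CHEBI:33675": ["CHEBI:33579"],                 # p-block -> main group molecular entity
    "CHEBI:33579": ["CHEBI:23367"],                 # main group -> molecular entity
    "CHEBI:23367": ["CHEBI:24431"],                 # molecular entity -> chemical entity
}


def ancestors(term):
    """All transitive is_a ancestors of `term` (excluding the term itself)."""
    seen, stack = set(), list(IS_A.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IS_A.get(t, []))
    return seen


def paths(term, target):
    """Every is_a path from `term` up to `target`, as lists of IDs."""
    if term == target:
        return [[term]]
    return [[term] + rest for p in IS_A.get(term, []) for rest in paths(p, target)]


for route in paths("CHEBI:82297", "CHEBI:23367"):
    print(" -> ".join(route))
print("CHEBI:60004" in ancestors("CHEBI:82297"))  # False: no route to mixture
```

Both routes end at molecular entity, so in this fragment carbon black is classified purely as a molecular entity. Whether it should instead be a mixture is exactly the open question here.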

Would it make sense to have 'pure' in CHEBI and 'wild' in ENVO? Or maybe this is mis-classified in CHEBI and should be a mixture?

cmungall commented 1 year ago

It looks like the original issue has been addressed:


However, there are still 155 classes that have dual classification as mixtures and molecular entities. I strongly recommend that CHEBI adopt disjointness axioms and use ROBOT as part of the release process.

This can be demonstrated with an OAK query:

runoak -i sqlite:obo:chebi info .desc//p=i mixture .and .desc//p=i 'molecular entity'
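The OAK query above intersects the is_a descendants of mixture and of molecular entity. The sketch below reproduces that set logic over an in-memory child-to-parents map. The classes it finds are the ones a reasoner would report as unsatisfiable once the two classes are declared disjoint. The toy edges are hypothetical and reproduce only the pre-fix molasses case, with intermediate classes collapsed for brevity. They are not a ChEBI extract and will not reproduce the count of 155.

```python
from collections import defaultdict

MIXTURE = "CHEBI:60004"
MOLECULAR_ENTITY = "CHEBI:23367"

# Hypothetical toy extract (child -> is_a parents); intermediate classes collapsed.
IS_A = {
    "CHEBI:83163": ["CHEBI:16646", MIXTURE],  # molasses, as originally classified
    "CHEBI:16646": [MOLECULAR_ENTITY],         # carbohydrate (collapsed)
    "CHEBI:17234": ["CHEBI:16646"],            # glucose (collapsed)
}


def descendants(root, is_a):
    """All transitive is_a descendants of `root` (excluding root itself)."""
    children = defaultdict(list)
    for child, parents in is_a.items():
        for p in parents:
            children[p].append(child)
    seen, stack = set(), [root]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def disjointness_violations(is_a, a=MIXTURE, b=MOLECULAR_ENTITY):
    """Classes under both `a` and `b`; these would be unsatisfiable if a and b were disjoint."""
    return descendants(a, is_a) & descendants(b, is_a)


print(sorted(disjointness_violations(IS_A)))  # ['CHEBI:83163']
```

Dropping `CHEBI:16646` from molasses' parents empties the result, which matches the fix described earlier in the thread.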