ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

GlyTouCan-derived entries are confusing - WURCS string should not be treated as a synonym #4152

Closed by cmungall 2 years ago

cmungall commented 2 years ago

https://www.ebi.ac.uk/ols/ontologies/chebi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHEBI_146251


The WURCS string should be treated like InChI strings: don't overload the synonym field, use a new dedicated field.

(if you would like to collaborate on the IRIs for the properties I am working on a schema here: https://cmungall.github.io/chem-schema/wurcs_representation.html)

I also think the labels should be something other than "GlyTouCan IDENTIFIER", but I don't have any suggestions, other than not including these in ChEBI in the first place. It's not clear why a subset of 69 terms has been added.

Maybe provide some easy way for people to filter these computationally, e.g. a flag that says they have not been fully curated.
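Until such a flag exists, a consumer-side filter on the label is one possible workaround. The sketch below is only an illustration: it assumes the affected labels have the form "GlyTouCan " followed by an accession, and that accessions look like `G`, five digits, then two uppercase letters. Both the pattern and the helper name `glytoucan_named_terms` are assumptions, not anything ChEBI provides.

```python
import re
from typing import Dict, List

# Assumed label shape: "GlyTouCan G12345AB". The accession pattern
# (G + 5 digits + 2 uppercase letters) is an assumption for this sketch.
GLYTOUCAN_LABEL = re.compile(r"GlyTouCan\s+G\d{5}[A-Z]{2}")


def glytoucan_named_terms(labels: Dict[str, str]) -> List[str]:
    """Return the IDs of terms whose label is just a GlyTouCan accession."""
    return sorted(
        term_id
        for term_id, label in labels.items()
        if GLYTOUCAN_LABEL.fullmatch(label.strip())
    )
```

The input is a plain mapping from term ID to label, so it can be filled from whichever ChEBI export is at hand (OBO, OWL, or a flat file).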

amalik01 commented 2 years ago

Last year, the GlyGen database submitted around 8,000 glycans to ChEBI. Some of these were partially defined glycans, where only the composition is known and the stereochemistry and connectivity are unknown. Such compounds are difficult to name when the structure is not properly defined, so we decided to use the GlyTouCan identifiers as the ChEBI names for these structures.

The WURCS identifier was also provided as a synonym for these glycans, since our current infrastructure does not allow WURCS to be added to ChEBI as a separate field. This is something we will need to fix when we redevelop the ageing ChEBI infrastructure in the next few years.

cmungall commented 2 years ago

It should be a simple post-processing step on the OWL to fix the WURCS ID. I can contribute code if you like.
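A rough sketch of what that post-processing could look like, working on plain (subject, predicate, object) tuples rather than a specific RDF library. It assumes the WURCS values sit on the standard oboInOwl synonym properties and start with `WURCS=`. The target property IRI is a placeholder for illustration; the real IRI would need to be agreed on (for example via the schema linked above).

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

OIO = "http://www.geneontology.org/formats/oboInOwl#"
SYNONYM_PROPERTIES = {
    OIO + "hasExactSynonym",
    OIO + "hasRelatedSynonym",
    OIO + "hasNarrowSynonym",
    OIO + "hasBroadSynonym",
}

# Placeholder IRI for illustration only; not an agreed or published property.
WURCS_PROPERTY = "http://example.org/chem-schema/wurcs_representation"


def move_wurcs_synonyms(triples: Iterable[Triple]) -> List[Triple]:
    """Re-home synonym values that look like WURCS strings.

    Any synonym triple whose literal starts with "WURCS=" is rewritten to
    use WURCS_PROPERTY; all other triples pass through unchanged.
    """
    result: List[Triple] = []
    for subject, predicate, value in triples:
        if predicate in SYNONYM_PROPERTIES and value.startswith("WURCS="):
            result.append((subject, WURCS_PROPERTY, value))
        else:
            result.append((subject, predicate, value))
    return result
```

In a real OWL file the synonym may also carry axiom annotations (type, source), which would need the same rewrite; this sketch does not handle them.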