cmungall closed 2 years ago
Last year, the GlyGen database submitted around 8,000 glycans to ChEBI. Some of these were partially defined glycans, where only the composition is known and the stereochemistry and connectivity are unknown. Such compounds are difficult to name because the structure is not fully defined, so we decided to use the GlyTouCan identifiers as the ChEBI names for these structures.
The WURCS identifier was also provided as a synonym for these glycans, since our current infrastructure does not allow WURCS to be added to ChEBI. This is something we will need to fix when we redevelop the ageing ChEBI infrastructure in the next few years.
It should be a simple post-processing step on the OWL to fix the WURCS ID. I can contribute code if you like.
https://www.ebi.ac.uk/ols/ontologies/chebi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FCHEBI_146251
The WURCS string should be treated like InChI strings: don't overload synonym, use a new field.
(If you would like to collaborate on the IRIs for the properties, I am working on a schema here: https://cmungall.github.io/chem-schema/wurcs_representation.html)
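To make the post-processing idea concrete, here is a rough sketch rather than a finished tool. It assumes the release is RDF/XML, that WURCS strings sit in oboInOwl synonym elements, and that they can be recognised by the `WURCS=` prefix. The target property IRI (`http://example.org/chem-schema/wurcs_representation`) is a placeholder I made up, and the sample class and WURCS value are dummy data, not real ChEBI content.

```python
import xml.etree.ElementTree as ET

OIO = "http://www.geneontology.org/formats/oboInOwl#"

# Hypothetical target property: a placeholder IRI, not an agreed term.
WURCS_TAG = "{http://example.org/chem-schema/}wurcs_representation"

# Synonym properties that might currently carry a WURCS string (assumption).
SYNONYM_TAGS = {
    f"{{{OIO}}}{name}"
    for name in ("hasExactSynonym", "hasRelatedSynonym",
                 "hasBroadSynonym", "hasNarrowSynonym")
}

# Dummy input: the class IRI, label, and WURCS value are made up.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://example.org/EX_0000001">
    <rdfs:label>example glycan</rdfs:label>
    <oboInOwl:hasRelatedSynonym>WURCS=2.0/placeholder</oboInOwl:hasRelatedSynonym>
    <oboInOwl:hasRelatedSynonym>an ordinary synonym</oboInOwl:hasRelatedSynonym>
  </owl:Class>
</rdf:RDF>"""


def move_wurcs_synonyms(root):
    """Retag synonym elements whose text is a WURCS string.

    Returns the number of elements that were retagged. Only the element
    tag changes; the text and position in the tree are left alone.
    """
    moved = 0
    for elem in root.iter():
        text = (elem.text or "").strip()
        if elem.tag in SYNONYM_TAGS and text.startswith("WURCS="):
            elem.tag = WURCS_TAG
            moved += 1
    return moved


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    count = move_wurcs_synonyms(root)
    print(f"retagged {count} synonym element(s)")
    print(ET.tostring(root, encoding="unicode"))
```

This only rewrites the direct annotation. Any `owl:Axiom` blocks that annotate the same synonym (for example with a source or synonym type) would still point at the old property and would need the same treatment in a real pipeline.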
I also think the labels should be something other than "GlyTouCan IDENTIFIER", but I don't have any suggestions other than not including these in ChEBI in the first place. It's not clear why a subset of 69 terms has been added.
Maybe provide some easy way for people to filter these computationally, to flag that they have not been fully curated.
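Until there is an explicit flag, one stopgap for downstream users would be to filter on the label pattern. The sketch below assumes the labels are the literal word "GlyTouCan" followed by an accession shaped like `G`, five digits, and two uppercase letters; that shape is my assumption and should be checked against the actual release. The function names and the sample entries are made up for illustration.

```python
import re

# Assumed label shape: "GlyTouCan" + accession (G, 5 digits, 2 uppercase letters).
# Adjust the pattern if the real labels differ.
PLACEHOLDER_LABEL = re.compile(r"^GlyTouCan\s+G\d{5}[A-Z]{2}$")


def is_placeholder_label(label):
    """Return True if the label looks like a GlyTouCan placeholder name."""
    return bool(PLACEHOLDER_LABEL.match(label.strip()))


def split_by_label(labels_by_id):
    """Split a {term_id: label} mapping into (placeholder_ids, other_ids)."""
    placeholders, others = [], []
    for term_id, label in labels_by_id.items():
        target = placeholders if is_placeholder_label(label) else others
        target.append(term_id)
    return placeholders, others


if __name__ == "__main__":
    # Made-up entries, not real ChEBI terms.
    example = {
        "EX:1": "GlyTouCan G12345AB",
        "EX:2": "alpha-D-glucose",
    }
    print(split_by_label(example))
```

A label regex is fragile, so a dedicated subset or annotation on the terms themselves would still be the better long-term fix.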