ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

Some InChi are not unique - release 55 #456

Open muthuvenkat opened 15 years ago

muthuvenkat commented 15 years ago

Based on chebi.obo (release 55), 55 InChIs are not unique: each is linked to more than one ChEBI ID.

In some cases, there is a problem in the InChI computation. Ex:

CHEBI:30504 ; CHEBI:33787 ; CHEBI:30501 | InChI=1/Be.2H

GDP-6-deoxy-L-mannose (CHEBI:27886) and GDP-L-mannose (CHEBI:21164)

I also noticed compounds that are present in the OBO file but not available from the web site. Is this expected? Ex: CHEBI:38348 ; CHEBI:38349 ; CHEBI:38350

The complete list is attached to this ticket.

Regards, Anne

Reported by: morgat

muthuvenkat commented 15 years ago

InChi linked to multiple IDs - chebi.obo rel55

Original comment by: morgat

muthuvenkat commented 15 years ago

Hi Anne

Thanks for this. I've taken a quick look at your list, some of which are real problems and others of which are not.

The beryllium example that you cite had an error, and I see that Kirill made a change here earlier this morning. Some of the duplications with the other elements are due to ChEBI having separate entries for an element and for its existence as a monoatom in oxidation state zero. E.g. we have separate entries for He and He(0), both of which generate the same InChI. I don't think this is a problem (if Kirill sees this update, perhaps he could comment further?).

The GDP-6-deoxy-L-mannose/GDP-L-mannose problem was our error (both entries had the same structure) and I have now corrected this.

38348, 38349 and 38350 are all unchecked entries (which is why they are not directly available on the web) that have nevertheless been classified (which is why they are in the OBO file). Nothing wrong here.

Of the others on your list, 20486 and 15717 were indeed identical except for the presence of a pair of brackets in the names and I have now merged these.

Another area where duplicates are showing up is where the InChI generator doesn't always seem to properly recognise how we depict the lack of stereochemistry about double bonds. I shall look further into this.

Thanks very much for your input.

Cheers Marcus

Original comment by: mennis

adekker2 commented 8 years ago

inchi_multiple_id.txt.zip

cmungall commented 3 years ago

It looks like this problem is still not fixed? There are lots of duplicate InChIs.
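For anyone who wants to re-check a given release, here is a minimal sketch of how the duplicates could be listed from an OBO file. It is not the script used to produce the attached list. It assumes that each InChI appears as a double-quoted string starting with `InChI=` somewhere inside a `[Term]` stanza; the exact tag that carries it (shown here as a `synonym:` line) is an assumption and may differ between releases. The function name `duplicate_inchis`, the `SAMPLE` text and the `EX:0001` entry are made up for illustration; the three beryllium IDs are the ones quoted at the top of this ticket.

```python
import re
from collections import defaultdict

# Assumption: an InChI shows up as a quoted string beginning with "InChI=".
INCHI_RE = re.compile(r'"(InChI=[^"]+)"')


def duplicate_inchis(obo_lines):
    """Return {inchi: sorted term IDs} for every InChI found on more than one term."""
    ids_by_inchi = defaultdict(set)
    in_term = False
    current_id = None
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are inspected.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[3:].strip()
        elif in_term and current_id:
            match = INCHI_RE.search(line)
            if match:
                ids_by_inchi[match.group(1)].add(current_id)
    return {
        inchi: sorted(ids)
        for inchi, ids in ids_by_inchi.items()
        if len(ids) > 1
    }


# Hypothetical stanzas for illustration only (the tag layout is assumed).
SAMPLE = """
[Term]
id: CHEBI:30504
synonym: "InChI=1/Be.2H" RELATED InChI [ChEBI:]

[Term]
id: CHEBI:33787
synonym: "InChI=1/Be.2H" RELATED InChI [ChEBI:]

[Term]
id: CHEBI:30501
synonym: "InChI=1/Be.2H" RELATED InChI [ChEBI:]

[Term]
id: EX:0001
synonym: "InChI=1/PLACEHOLDER" RELATED InChI [ChEBI:]
"""

if __name__ == "__main__":
    for inchi, ids in duplicate_inchis(SAMPLE.splitlines()).items():
        print(inchi, " ; ".join(ids))
```

To run it against a real release, pass an open file handle instead of `SAMPLE.splitlines()`, e.g. `duplicate_inchis(open("chebi.obo", encoding="utf-8"))`. Note that a check like this will also flag the intentional cases Marcus describes above, such as an element and its oxidation-state-zero monoatom sharing one InChI, so the output still needs manual review.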