ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

Duplicate inchis for enantiomers #4080

Open cmungall opened 3 years ago

cmungall commented 3 years ago

As noted in #456, there are many cases of two ChEBI entries having the same InChI.

One way to tackle this is to break the problem down into cases where we can use the structure of the InChI. For example, these are cases where an enantiomer and its parent form share an InChI. In these cases it's clear the enantiomer and its parent should differ predictably, with the enantiomer's InChI appending the stereochemistry layer.

| class | name | parent | parent_name | config | inchi | parent_inchi |
|---|---|---|---|---|---|---|
| http://purl.obolibrary.org/obo/CHEBI_38349 | (7R)-7-[(2,4-diamino-6-methylpyrimidin-5-yl)methyl]-7,8-dicarba-nido-undecaborane(11) | http://purl.obolibrary.org/obo/CHEBI_38348 | 7-[(2,4-diamino-6-methylpyrimidin-5-yl)methyl]-7,8-dicarba-nido-undecaborane(11) | (7R) | `InChI=1S/C8H19B9N4/c1-3-4(5(18)21-6(19)20-3)2-8-7-9-11-10(8)14(8)12(7,8)13(7,9)15(9,11)16(10,11,14)17(12,13,14)15/h7,9-17H,2H2,1H3,(H4,18,19,20,21)` | `InChI=1S/C8H19B9N4/c1-3-4(5(18)21-6(19)20-3)2-8-7-9-11-10(8)14(8)12(7,8)13(7,9)15(9,11)16(10,11,14)17(12,13,14)15/h7,9-17H,2H2,1H3,(H4,18,19,20,21)` |
| http://purl.obolibrary.org/obo/CHEBI_38350 | (7S)-7-[(2,4-diamino-6-methylpyrimidin-5-yl)methyl]-7,8-dicarba-nido-undecaborane(11) | http://purl.obolibrary.org/obo/CHEBI_38348 | 7-[(2,4-diamino-6-methylpyrimidin-5-yl)methyl]-7,8-dicarba-nido-undecaborane(11) | (7S) | `InChI=1S/C8H19B9N4/c1-3-4(5(18)21-6(19)20-3)2-8-7-9-11-10(8)14(8)12(7,8)13(7,9)15(9,11)16(10,11,14)17(12,13,14)15/h7,9-17H,2H2,1H3,(H4,18,19,20,21)` | `InChI=1S/C8H19B9N4/c1-3-4(5(18)21-6(19)20-3)2-8-7-9-11-10(8)14(8)12(7,8)13(7,9)15(9,11)16(10,11,14)17(12,13,14)15/h7,9-17H,2H2,1H3,(H4,18,19,20,21)` |
| http://purl.obolibrary.org/obo/CHEBI_48930 | (2R,3S)-3-carboxy-2,3-dihydroxypropanoate | http://purl.obolibrary.org/obo/CHEBI_35400 | meso-tartrate(1-) | (2R,3S) | `InChI=1S/C4H6O6/c5-1(3(7)8)2(6)4(9)10/h1-2,5-6H,(H,7,8)(H,9,10)/p-1/t1-,2+` | `InChI=1S/C4H6O6/c5-1(3(7)8)2(6)4(9)10/h1-2,5-6H,(H,7,8)(H,9,10)/p-1/t1-,2+` |
| http://purl.obolibrary.org/obo/CHEBI_48931 | (2S,3R)-3-carboxy-2,3-dihydroxypropanoate | http://purl.obolibrary.org/obo/CHEBI_35400 | meso-tartrate(1-) | (2S,3R) | `InChI=1S/C4H6O6/c5-1(3(7)8)2(6)4(9)10/h1-2,5-6H,(H,7,8)(H,9,10)/p-1/t1-,2+` | `InChI=1S/C4H6O6/c5-1(3(7)8)2(6)4(9)10/h1-2,5-6H,(H,7,8)(H,9,10)/p-1/t1-,2+` |
| http://purl.obolibrary.org/obo/CHEBI_60088 | 3-[(5R,6S)-5,6-dihydroxycyclohexa-1,3-dienyl]propanoate | http://purl.obolibrary.org/obo/CHEBI_60087 | 3-(cis-5,6-dihydroxycyclohexa-1,3-dienyl)propanoate | 3 | `InChI=1S/C9H12O4/c10-7-3-1-2-6(9(7)13)4-5-8(11)12/h1-3,7,9-10,13H,4-5H2,(H,11,12)/p-1/t7-,9+/m1/s1` | `InChI=1S/C9H12O4/c10-7-3-1-2-6(9(7)13)4-5-8(11)12/h1-3,7,9-10,13H,4-5H2,(H,11,12)/p-1/t7-,9+/m1/s1` |
| http://purl.obolibrary.org/obo/CHEBI_138262 | (9S,10R)-epoxyoctadecanoic acid | http://purl.obolibrary.org/obo/CHEBI_82464 | cis-9,10-epoxyoctadecanoic acid | (9S,10R) | `InChI=1S/C18H34O3/c1-2-3-4-5-7-10-13-16-17(21-16)14-11-8-6-9-12-15-18(19)20/h16-17H,2-15H2,1H3,(H,19,20)/t16-,17+/m1/s1` | `InChI=1S/C18H34O3/c1-2-3-4-5-7-10-13-16-17(21-16)14-11-8-6-9-12-15-18(19)20/h16-17H,2-15H2,1H3,(H,19,20)/t16-,17+/m1/s1` |
| http://purl.obolibrary.org/obo/CHEBI_31018 | (+)-gallocatechin | http://purl.obolibrary.org/obo/CHEBI_68330 | gallocatechin | (+) | `InChI=1S/C15H14O7/c16-7-3-9(17)8-5-12(20)15(22-13(8)4-7)6-1-10(18)14(21)11(19)2-6/h1-4,12,15-21H,5H2/t12-,15+/m0/s1` | `InChI=1S/C15H14O7/c16-7-3-9(17)8-5-12(20)15(22-13(8)4-7)6-1-10(18)14(21)11(19)2-6/h1-4,12,15-21H,5H2/t12-,15+/m0/s1` |
| http://purl.obolibrary.org/obo/CHEBI_36410 | Delta-tris(1,10-phenanthroline)ruthenium(2+) | http://purl.obolibrary.org/obo/CHEBI_36409 | tris(1,10-phenanthroline)ruthenium(2+) | Delta | `InChI=1S/3C12H8N2.Ru/c3*1-3-9-5-6-10-4-2-8-14-12(10)11(9)13-7-1;/h3*1-8H;/q;;;+2` | `InChI=1S/3C12H8N2.Ru/c3*1-3-9-5-6-10-4-2-8-14-12(10)11(9)13-7-1;/h3*1-8H;/q;;;+2` |
| http://purl.obolibrary.org/obo/CHEBI_36411 | Lambda-tris(1,10-phenanthroline)ruthenium(2+) | http://purl.obolibrary.org/obo/CHEBI_36409 | tris(1,10-phenanthroline)ruthenium(2+) | Lambda | `InChI=1S/3C12H8N2.Ru/c3*1-3-9-5-6-10-4-2-8-14-12(10)11(9)13-7-1;/h3*1-8H;/q;;;+2` | `InChI=1S/3C12H8N2.Ru/c3*1-3-9-5-6-10-4-2-8-14-12(10)11(9)13-7-1;/h3*1-8H;/q;;;+2` |
| http://purl.obolibrary.org/obo/CHEBI_83816 | (10R,11R)-dihydroxy-10,11-dihydrocarbamazepine | http://purl.obolibrary.org/obo/CHEBI_83532 | 10,11-trans-dihydroxy-10,11-dihydrocarbamazepine | (10R,11R) | `InChI=1S/C15H14N2O3/c16-15(20)17-11-7-3-1-5-9(11)13(18)14(19)10-6-2-4-8-12(10)17/h1-8,13-14,18-19H,(H2,16,20)/t13-,14-/m1/s1` | `InChI=1S/C15H14N2O3/c16-15(20)17-11-7-3-1-5-9(11)13(18)14(19)10-6-2-4-8-12(10)17/h1-8,13-14,18-19H,(H2,16,20)/t13-,14-/m1/s1` |
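
The rows above fall into two groups: pairs whose shared InChI has no stereo layer at all, and pairs whose shared InChI already carries one. A minimal sketch of how that split could be computed from the strings alone follows. The helper names `inchi_layers` and `classify_pair` are hypothetical, and the parser assumes a standard InChI in which each layer prefix appears once.

```python
# Layer prefixes that carry stereo information in an InChI:
# b = double bond, t = tetrahedral, m = chirality marker, s = stereo type.
STEREO_PREFIXES = ("b", "t", "m", "s")


def inchi_layers(inchi: str) -> dict:
    """Split an InChI into {layer prefix: content}; the first occurrence of a prefix wins."""
    version, formula, *rest = inchi.removeprefix("InChI=").split("/")
    layers = {"version": version, "formula": formula}
    for segment in rest:
        if segment:
            layers.setdefault(segment[0], segment[1:])
    return layers


def classify_pair(child_inchi: str, parent_inchi: str) -> str:
    """Describe how a child (enantiomer) InChI relates to its parent's InChI."""
    if child_inchi != parent_inchi:
        return "distinct"
    layers = inchi_layers(child_inchi)
    if any(prefix in layers for prefix in STEREO_PREFIXES):
        return "shared, stereo layer present"
    return "shared, no stereo layer"


# Two InChIs copied from the table above.
TARTRATE = "InChI=1S/C4H6O6/c5-1(3(7)8)2(6)4(9)10/h1-2,5-6H,(H,7,8)(H,9,10)/p-1/t1-,2+"
BORANE = (
    "InChI=1S/C8H19B9N4/c1-3-4(5(18)21-6(19)20-3)2-8-7-9-11-10(8)14(8)12(7,8)"
    "13(7,9)15(9,11)16(10,11,14)17(12,13,14)15/h7,9-17H,2H2,1H3,(H4,18,19,20,21)"
)

print(classify_pair(TARTRATE, TARTRATE))
print(classify_pair(BORANE, BORANE))
```

Pairs in the second group are candidates for the "append the stereochemistry layer" fix; pairs in the first group already have stereo in the shared string and need a different explanation.
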
amalik01 commented 3 years ago

Hi Chris,

We are aware of this issue. The duplicate InChI arises from structures with relative configurations, where two configurations of the structure are possible. Since we are unable to show both structures on the same ChEBI entry, we show one of them for depiction purposes, but the IUPAC name has the word 'rel' in front of it so that the user knows that it is the relative configuration of the structure (which means that it's either the structure shown or its opposite configuration, which is not shown). In an ideal world, it would be good to show both structures on a single ChEBI entry and show the InChI of the relative configuration, but this is the best we can do at the moment with the technology we have; otherwise we would have to delete the structures of all the relative configurations in ChEBI.

K-r-ll commented 3 years ago

The particular problem of both Λ-tris(1,10-phenanthroline)ruthenium(2+) (CHEBI:36411) and its enantiomer Δ-tris(1,10-phenanthroline)ruthenium(2+) (CHEBI:36410) having the same InChI is due to the fact that the InChI algorithm does not yet differentiate between non-tetrahedral (in this case, octahedral) stereoisomers. Hopefully this will eventually be sorted out if/when the InChI “coordination layer” is implemented. See e.g. https://cheminf20.org/2020/10/18/coordination-inchi-for-inorganics-now-with-stereochemistry/

rwst commented 3 years ago

See also https://chemistry.stackexchange.com/questions/151072/can-cis-trans-isomers-have-same-inchi

cmungall commented 1 year ago

It looks like this was "closed as completed" but the problem still remains?

amalik01 commented 1 year ago

At present, this issue cannot be resolved since our curator tool is only able to generate standard InChIs. We will need to include stereo options in the curator tool to generate InChIs for absolute, relative, undefined and racemic stereocenters.
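
For context on the stereo options mentioned here: an InChI string itself records which kind of stereo it encodes. The version segment ends in `S` for a standard InChI, the `/s` layer is `1` for absolute, `2` for relative and `3` for racemic stereo, and undefined centers are marked with `?` in the `/t` layer. Below is a rough sketch of reading those flags back out of a string. The function name `stereo_summary` is hypothetical, and the second input is a hand-edited illustration, not output from the InChI software or the curator tool.

```python
import re

# Meaning of the /s layer value in an InChI string.
STEREO_TYPES = {"1": "absolute", "2": "relative", "3": "racemic"}


def stereo_summary(inchi: str) -> dict:
    """Report standard-ness, stereo type and undefined centers of an InChI string."""
    segments = inchi.removeprefix("InChI=").split("/")
    # Skip the version and formula segments; key the rest by their one-letter prefix.
    layers = {seg[0]: seg[1:] for seg in segments[2:] if seg}
    return {
        "standard": segments[0].endswith("S"),
        "stereo_type": STEREO_TYPES.get(layers.get("s"), "not stated"),
        # Atom numbers whose tetrahedral parity is given as "?".
        "undefined_centers": re.findall(r"(\d+)\?", layers.get("t", "")),
    }


# Standard InChI of CHEBI:60088, copied from this issue.
STANDARD = (
    "InChI=1S/C9H12O4/c10-7-3-1-2-6(9(7)13)4-5-8(11)12/"
    "h1-3,7,9-10,13H,4-5H2,(H,11,12)/p-1/t7-,9+/m1/s1"
)
# Hand-edited illustration of a non-standard string flagged as relative stereo.
ILLUSTRATIVE_RELATIVE = (
    "InChI=1/C9H12O4/c10-7-3-1-2-6(9(7)13)4-5-8(11)12/"
    "h1-3,7,9-10,13H,4-5H2,(H,11,12)/p-1/t7-,9+/s2"
)

print(stereo_summary(STANDARD))
print(stereo_summary(ILLUSTRATIVE_RELATIVE))
```

Because a standard InChI can only state absolute stereo, an entry named with 'rel' cannot be distinguished from its depicted enantiomer unless a non-standard string is generated.
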

cmungall commented 1 year ago

Some solutions:

  • include a mapping predicate that states the relationship between the CHEBI term and the INCHI (exact, broad, narrow, related)
  • simply exclude non-exact INCHI mappings, instead make sure there is an equivalent term, allowing the user to infer the precise relationship from any CHEBI term

cmungall commented 1 year ago

If none of these are possible and you don't want the issue clogging things up, I suggest you "close as not planned" rather than "close as completed".
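
The mapping-predicate option suggested above could look something like the sketch below. It uses the SKOS mapping properties (`skos:exactMatch`, `skos:broadMatch`, `skos:narrowMatch`) as the predicate vocabulary, which is an assumption, and the rule for choosing a predicate is a crude heuristic for illustration, not ChEBI policy: when an enantiomer and its parent share an InChI, the presence of the `/m` chirality marker is taken to mean the string describes the enantiomer, and its absence to mean the string describes the parent. It would misjudge some cases, such as meso compounds. The function names are hypothetical.

```python
def has_chirality_marker(inchi: str) -> bool:
    """True if the InChI has an /m layer, i.e. it pins down one enantiomer."""
    return any(seg.startswith("m") for seg in inchi.split("/")[2:])


def mapping_predicates(child_inchi: str, parent_inchi: str) -> tuple:
    """Return (child predicate, parent predicate) for their InChI mappings."""
    if child_inchi != parent_inchi:
        return ("skos:exactMatch", "skos:exactMatch")
    if has_chirality_marker(child_inchi):
        # The InChI is as specific as the child, so it is narrower than the parent.
        return ("skos:exactMatch", "skos:narrowMatch")
    # The InChI carries no enantiomer information, so it is broader than the child.
    return ("skos:broadMatch", "skos:exactMatch")


# Two rows from the table in this issue: (child id, parent id, shared InChI).
ROWS = [
    (
        "CHEBI:31018",
        "CHEBI:68330",
        "InChI=1S/C15H14O7/c16-7-3-9(17)8-5-12(20)15(22-13(8)4-7)6-1-10(18)14(21)"
        "11(19)2-6/h1-4,12,15-21H,5H2/t12-,15+/m0/s1",
    ),
    (
        "CHEBI:38349",
        "CHEBI:38348",
        "InChI=1S/C8H19B9N4/c1-3-4(5(18)21-6(19)20-3)2-8-7-9-11-10(8)14(8)12(7,8)"
        "13(7,9)15(9,11)16(10,11,14)17(12,13,14)15/h7,9-17H,2H2,1H3,(H4,18,19,20,21)",
    ),
]

for child, parent, inchi in ROWS:
    child_pred, parent_pred = mapping_predicates(inchi, inchi)
    # One tab-separated mapping line per term: subject, predicate, object.
    print(child, child_pred, inchi, sep="\t")
    print(parent, parent_pred, inchi, sep="\t")
```

With explicit predicates, a consumer can keep only the `skos:exactMatch` lines, which amounts to the second option of excluding non-exact mappings.
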