chembl / libRDChEBI


CHEBI:53020 #8

Open amalik01 opened 9 months ago

amalik01 commented 9 months ago

https://wwwdev.ebi.ac.uk/chebi/beta/CHEBI:53020

The calculated formula has the number 2 added in front of it, which needs fixing.

The calculated Mass, InChI and InChIKey also differ from the ones generated by ACD, which is unusual.

eloyfelix commented 9 months ago

For some reason the SGROUP is detected twice, which is probably why the molecular formula has the 2 in front and why the weights differ from the ACD calculations. I expect ACD shows smaller values?

we'll need to check on that

The InChI and InChIKey values are the ones provided by RDKit, which calls the InChI software directly, so that's much trickier.

amalik01 commented 9 months ago

The formula and mass generated by ACD are: image

The InChI and InChIKey are:

InChI=1S/C87H150N2O72P2.C20H34/c1-20(2)5-6-136-162(130,131)161-163(132,133)160-76-37(89-22(4)102)48(113)65(31(15-98)145-76)150-75-36(88-21(3)101)47(112)66(30(14-97)144-75)151-85-64(129)72(157-86-74(53(118)42(107)26(10-93)142-86)159-87-73(52(117)41(106)27(11-94)143-87)158-84-63(128)70(44(109)29(13-96)141-84)156-83-62(127)69(43(108)28(12-95)140-83)154-81-58(123)51(116)40(105)25(9-92)139-81)46(111)35(149-85)19-135-78-61(126)71(155-82-60(125)55(120)68(33(17-100)147-82)153-80-57(122)50(115)39(104)24(8-91)138-80)45(110)34(148-78)18-134-77-59(124)54(119)67(32(16-99)146-77)152-79-56(121)49(114)38(103)23(7-90)137-79;1-7-18(4)12-9-14-20(6)16-10-15-19(5)13-8-11-17(2)3/h20,23-87,90-100,103-129H,5-19H2,1-4H3,(H,88,101)(H,89,102)(H,130,131)(H,132,133);7,11,14-15H,8-10,12-13,16H2,1-6H3/b;18-7+,19-15+,20-14+/t23-,24-,25-,26-,27-,28-,29-,30-,31-,32-,33-,34-,35-,36-,37-,38-,39-,40-,41-,42-,43-,44-,45-,46-,47-,48-,49+,50+,51+,52+,53+,54-,55-,56+,57+,58-,59+,60+,61+,62-,63+,64+,65-,66-,67-,68-,69+,70+,71+,72+,73+,74+,75+,76-,77+,78+,79-,80-,81-,82-,83-,84-,85+,86-,87-;/m1./s1

DSXMJBFOHIPMOF-DSJWLFSWSA-N
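
When two tools disagree on an InChI, it can help to see which layers differ before blaming either one. The following is a minimal sketch in plain Python, not part of libRDChEBI; the helper names `inchi_layers` and `differing_layers` are hypothetical, and the two short InChI strings (ethanol and dimethyl ether) are stand-ins for the ACD and RDKit outputs.

```python
def inchi_layers(inchi):
    """Split an InChI string into a dict of {layer name: layer content}."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0]}
    for position, part in enumerate(parts[1:], start=1):
        if not part:
            continue
        # The formula layer has no prefix letter: it starts with an
        # element symbol (uppercase) or a numeric multiplier.
        if position == 1 and not part[0].islower():
            layers["formula"] = part
            continue
        key = part[0]
        # Keep repeated prefixes apart instead of overwriting them.
        count = 2
        while key in layers:
            key = f"{part[0]}#{count}"
            count += 1
        layers[key] = part[1:]
    return layers


def differing_layers(inchi_a, inchi_b):
    """Return the sorted names of layers whose content differs."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Stand-in inputs: same formula, different connectivity.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
dimethyl_ether = "InChI=1S/C2H6O/c1-3-2/h1-2H3"
print(differing_layers(ethanol, dimethyl_ether))  # ['c', 'h']
```

If only the stereo layers (`b`, `t`, `m`, `s`) differ, the structures share the same skeleton; a difference in `formula` or `c` points at the atoms or connectivity that were fed to the InChI software.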

eloyfelix commented 9 months ago

Do you know which version of InChI ACD is using?

amalik01 commented 9 months ago

I think it is version 1.05

image

eloyfelix commented 7 months ago

more potential molecules with issues: 37633 51133 51133 53334 53019 53020 53022 53023 53325 53398 28427 81539 81539 84166 141517 81539 53571 53498 53723 53742 59081 59085

All of them seem to have multiple nested SGROUPS (with SPL, the parent list). We should probably skip calculating them until we find a fix. Some of the structures may be incorrect as well.
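
One way to hold back the affected entries is to flag any structure whose SGROUPS carry a parent reference. This is only a sketch under assumptions: it works on plain dicts shaped like the property dicts RDKit reports for substance groups, it assumes a `PARENT` key is present only on nested groups, and the names `has_nested_sgroups` and `split_by_nesting` as well as the sample data are hypothetical.

```python
def has_nested_sgroups(props_list):
    """True if any SGroup property dict refers to a parent SGroup."""
    return any("PARENT" in props for props in props_list)


def split_by_nesting(records):
    """Split {id: [sgroup property dicts]} into (calculable, skipped) id lists."""
    calculable, skipped = [], []
    for entry_id, props_list in records.items():
        if has_nested_sgroups(props_list):
            skipped.append(entry_id)
        else:
            calculable.append(entry_id)
    return calculable, skipped


# Made-up illustration data, not taken from real ChEBI entries.
records = {
    "flat": [{"index": 1, "TYPE": "SRU"}],
    "nested": [{"index": 1, "TYPE": "SRU"}, {"index": 2, "TYPE": "SRU", "PARENT": 1}],
    "no_sgroups": [],
}
calculable, skipped = split_by_nesting(records)
print(calculable, skipped)
```

Skipped entries can then be reported for manual review, which also helps with spotting structures that are drawn incorrectly.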

eloyfelix commented 7 months ago

Using the index and PARENT properties in SGROUPS may give enough information to calculate them:

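from rdkit import Chem

# `mol` is an RDKit Mol with SGROUPS; print each group's properties (including index and PARENT)
for sg in Chem.GetMolSubstanceGroups(mol):
    print(sg.GetPropsAsDict())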
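
As a rough illustration of how those two properties could be combined, the sketch below works out how deeply each SGROUP is nested. It is not the fix that went into the code: it runs on plain dicts standing in for the printed property dicts, it assumes `PARENT` holds the `index` of the enclosing group, and `sgroup_depths` and the sample data are hypothetical.

```python
def sgroup_depths(props_list):
    """Map each SGroup index to its nesting depth (0 = top level)."""
    parent_of = {props["index"]: props.get("PARENT") for props in props_list}
    depths = {}
    for idx in parent_of:
        depth, current, seen = 0, idx, {idx}
        # Walk up the parent chain until a group without a parent is reached.
        while parent_of.get(current) is not None:
            current = parent_of[current]
            if current in seen:
                raise ValueError(f"cyclic PARENT reference at SGroup {current}")
            seen.add(current)
            depth += 1
        depths[idx] = depth
    return depths


# Made-up illustration data: group 3 sits inside group 2, which sits inside group 1.
example = [
    {"index": 1, "TYPE": "SRU"},
    {"index": 2, "TYPE": "SRU", "PARENT": 1},
    {"index": 3, "TYPE": "MUL", "PARENT": 2},
]
print(sgroup_depths(example))  # {1: 0, 2: 1, 3: 2}
```

Processing groups from the deepest level outwards would let each group be counted once, inside its parent, rather than once per level.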

eloyfelix commented 7 months ago

now fixed in the code