amalik01 opened 9 months ago
For some reason the S group is detected twice, so we'll need to look into it. It is probably the reason why the molecular formula has a 2 in front and why the weights also differ from the ACD calculations. I expect ACD to show smaller values?
We'll need to check on that.
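One way to confirm whether an S group really is being picked up twice is to compare the S groups by type and atom set. This is a pure-Python sketch, not a fix. It assumes each S group has already been reduced to a (type, atom indices) pair, for example via RDKit's sg.GetProp("TYPE") and sg.GetAtoms(). find_duplicate_sgroups is a hypothetical helper name.

```python
from collections import Counter


def find_duplicate_sgroups(sgroups):
    """Return the (type, atoms) keys that occur more than once.

    sgroups: iterable of (sgroup_type, atom_indices) pairs. Atom order is
    ignored, so the same S group listed with its atoms in a different order
    still counts as a duplicate.
    """
    keys = Counter((sg_type, frozenset(atoms)) for sg_type, atoms in sgroups)
    return [key for key, count in keys.items() if count > 1]


# Hypothetical input: the same SRU group reported twice, plus one unique group.
example = [("SRU", (3, 4, 5)), ("SRU", (5, 4, 3)), ("SUP", (10, 11))]
print(find_duplicate_sgroups(example))
```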
The InChI and InChIKey values are the ones provided by RDKit, which calls the InChI software directly, so that's much trickier.
The formula and mass generated by ACD are:
The InChI and InChIKey are:
InChI=1S/C87H150N2O72P2.C20H34/c1-20(2)5-6-136-162(130,131)161-163(132,133)160-76-37(89-22(4)102)48(113)65(31(15-98)145-76)150-75-36(88-21(3)101)47(112)66(30(14-97)144-75)151-85-64(129)72(157-86-74(53(118)42(107)26(10-93)142-86)159-87-73(52(117)41(106)27(11-94)143-87)158-84-63(128)70(44(109)29(13-96)141-84)156-83-62(127)69(43(108)28(12-95)140-83)154-81-58(123)51(116)40(105)25(9-92)139-81)46(111)35(149-85)19-135-78-61(126)71(155-82-60(125)55(120)68(33(17-100)147-82)153-80-57(122)50(115)39(104)24(8-91)138-80)45(110)34(148-78)18-134-77-59(124)54(119)67(32(16-99)146-77)152-79-56(121)49(114)38(103)23(7-90)137-79;1-7-18(4)12-9-14-20(6)16-10-15-19(5)13-8-11-17(2)3/h20,23-87,90-100,103-129H,5-19H2,1-4H3,(H,88,101)(H,89,102)(H,130,131)(H,132,133);7,11,14-15H,8-10,12-13,16H2,1-6H3/b;18-7+,19-15+,20-14+/t23-,24-,25-,26-,27-,28-,29-,30-,31-,32-,33-,34-,35-,36-,37-,38-,39-,40-,41-,42-,43-,44-,45-,46-,47-,48-,49+,50+,51+,52+,53+,54-,55-,56+,57+,58-,59+,60+,61+,62-,63+,64+,65-,66-,67-,68-,69+,70+,71+,72+,73+,74+,75+,76-,77+,78+,79-,80-,81-,82-,83-,84-,85+,86-,87-;/m1./s1
DSXMJBFOHIPMOF-DSJWLFSWSA-N
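Since RDKit's InChI comes straight from the InChI library, a mismatch with ACD's InChI is easiest to localise layer by layer: formula, /c (connectivity), /h (hydrogens), and /b, /t, /m, /s (stereo). The formula layer of the InChI above, C87H150N2O72P2.C20H34, also shows two dot-separated components. This is a minimal pure-Python sketch with hypothetical helper names. It covers standard-InChI layers like the ones shown here, not every non-standard variant.

```python
def inchi_layers(inchi):
    """Split a standard InChI into its version, formula and remaining layers.

    Returns a dict like {"version": "1S", "formula": "...", "c": "...", ...},
    keyed by each layer's prefix letter. Only the first occurrence of a letter
    is kept, so sublayers repeated after e.g. an isotopic /i layer are ignored.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return layers


def diff_inchi(a, b):
    """Sorted names of the layers whose contents differ between two InChIs."""
    la, lb = inchi_layers(a), inchi_layers(b)
    return sorted(k for k in la.keys() | lb.keys() if la.get(k) != lb.get(k))


# The formula layer of a multi-component InChI lists one entry per component.
print(inchi_layers("InChI=1S/ClH.Na/h1H;/q;+1/p-1")["formula"].split("."))  # ['ClH', 'Na']
```

Passing the RDKit InChI and the ACD InChI to diff_inchi would show whether they disagree in the formula, the connectivity, or only the stereo layers.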
Do you know which version of InChI ACD is using?
I think it is version 1.05.
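The InChIKey itself records only the InChI format version, not the exact software release. In a key, the 14-letter first block hashes the connectivity (main) layer. The second block has 8 letters hashing the remaining layers, followed by a flag letter (S standard, N non-standard) and a version letter (A for version 1). The final letter encodes protonation (N for none). So a key ending in ...SA-N, like the one above, cannot tell 1.05 apart from other 1.0x releases. This is a small pure-Python sketch that splits a key into those blocks. parse_inchikey is a hypothetical helper name.

```python
import re

# 14-letter connectivity hash, '-', 8-letter hash of the other layers,
# standard flag (S/N), version letter, '-', one protonation letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def parse_inchikey(key):
    """Split an InChIKey into its blocks; raise ValueError if malformed."""
    m = INCHIKEY_RE.match(key)
    if m is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    connectivity, other_layers, flag, version, protonation = m.groups()
    return {
        "connectivity": connectivity,
        "other_layers": other_layers,
        "standard": flag == "S",
        "version": version,  # 'A' = InChI version 1, whatever the 1.0x release
        "protonation": protonation,
    }


print(parse_inchikey("DSXMJBFOHIPMOF-DSJWLFSWSA-N"))
```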
More potential molecules with issues: 37633 51133 51133 53334 53019 53020 53022 53023 53325 53398 28427 81539 81539 84166 141517 81539 53571 53498 53723 53742 59081 59085
All of them seem to have multiple nested SGROUPs (with SPL, the parent list). We should perhaps not calculate properties for them until we can find a fix. Some of the structures may be incorrect as well.
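Until nested S groups are handled, one option is to skip property calculation for any molecule whose S groups reference a parent. This is a minimal sketch, not the project's code. It assumes the S-group properties have already been pulled into plain dicts (for example via sg.GetPropsAsDict()), and that a nested S group can be recognised by a "PARENT" key. ids_to_skip and its input shape are hypothetical. Collecting the IDs into a set also collapses repeats, such as 81539, which appears three times in the list above.

```python
def ids_to_skip(records):
    """Sorted, de-duplicated ChEBI IDs whose S groups are nested.

    records: iterable of (chebi_id, sgroup_props) pairs, where sgroup_props is
    a list of S-group property dicts. An S group is treated as nested if its
    dict has a "PARENT" entry (assumed to come from the SPL line).
    """
    return sorted(
        {chebi_id for chebi_id, sgroup_props in records
         if any("PARENT" in props for props in sgroup_props)}
    )


# Hypothetical data: 81539 is listed twice, 12345 has no nested S groups.
records = [
    (53020, [{"index": 1}, {"index": 2, "PARENT": 1}]),
    (81539, [{"index": 1}, {"index": 2, "PARENT": 1}]),
    (81539, [{"index": 1}, {"index": 2, "PARENT": 1}]),
    (12345, [{"index": 1}]),
]
print(ids_to_skip(records))
```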
Using the "index" and "PARENT" properties in the SGROUPs may give enough information to calculate them:
from rdkit import Chem
# mol: an RDKit Mol already read from the molfile (loading not shown)
for sg in Chem.GetMolSubstanceGroups(mol):
    print(sg.GetPropsAsDict())
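With those two properties the nesting can be rebuilt as a small tree, which also shows how deep it goes. This is a pure-Python sketch. It assumes the dicts printed above each carry an "index" key and, for nested S groups, a "PARENT" key holding the parent's index, and that the parent links contain no cycles. sgroup_tree and max_depth are hypothetical helper names.

```python
from collections import defaultdict


def sgroup_tree(props_list):
    """Group S groups under their parents using "index" and "PARENT".

    Returns (roots, children): roots are the indices with no parent, and
    children maps a parent index to the indices nested directly inside it.
    """
    children = defaultdict(list)
    roots = []
    for props in props_list:
        parent = props.get("PARENT")
        if parent is None:
            roots.append(props["index"])
        else:
            children[parent].append(props["index"])
    return roots, dict(children)


def max_depth(roots, children):
    """Deepest nesting level reachable from the roots (1 = no nesting)."""
    def walk(idx):
        return 1 + max((walk(c) for c in children.get(idx, [])), default=0)
    return max((walk(r) for r in roots), default=0)


# Hypothetical S groups: 3 sits inside 2, which sits inside 1; 4 stands alone.
props = [{"index": 1}, {"index": 2, "PARENT": 1}, {"index": 3, "PARENT": 2}, {"index": 4}]
roots, children = sgroup_tree(props)
print(roots, children, max_depth(roots, children))
```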
Now fixed in the code.
https://wwwdev.ebi.ac.uk/chebi/beta/CHEBI:53020
It seems to add the number 2 in front of the formula, which needs fixing.
The calculated mass, InChI and InChIKey differ from those generated by ACD, which is unusual.
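If the 2 in front of the formula acts as a component multiplier, as in dot-separated InChI-style formulas such as 2C20H34, then any mass computed from that formula is scaled along with it. This is a minimal pure-Python sketch under that assumption. It uses IUPAC abridged standard atomic weights for only the elements in the InChI above and ignores charges and isotopes. element_counts and average_mass are hypothetical names.

```python
import re

# IUPAC abridged standard atomic weights for the elements in this record.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}


def element_counts(formula):
    """Total element counts for a dot-separated formula such as "2C20H34.H2O".

    A leading integer on a component multiplies every count in it.
    """
    counts = {}
    for component in formula.split("."):
        m = re.match(r"(\d*)(.*)", component)
        multiplier = int(m.group(1)) if m.group(1) else 1
        for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", m.group(2)):
            counts[element] = counts.get(element, 0) + multiplier * (int(n) if n else 1)
    return counts


def average_mass(formula):
    """Average molecular mass from the standard atomic weights above."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in element_counts(formula).items())


print(average_mass("C20H34"), average_mass("2C20H34"))
```

A stray multiplier doubles the computed mass, which would be one way for these values to come out larger than ACD's.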