chembl / ChEMBL_Structure_Pipeline

ChEMBL database structure pipelines
MIT License

standardiser changes the structure even with excluded structures #6

Closed: eloyfelix closed this issue 5 years ago

eloyfelix commented 5 years ago

It looks like RDKit is changing the bismuth valence for this structure:


from rdkit import Chem

molblock = """
  SciTegic03151315062D

 35 33  0  0  0  0            999 V2000
    5.1714   -6.6800    0.0000 C   0  0
    4.4533   -7.0848    0.0000 C   0  0
    5.8848   -7.0848    0.0000 C   0  0
    4.7570   -5.9667    0.0000 C   0  0
    6.5932   -6.6800    0.0000 C   0  0
    4.4678   -7.8945    0.0000 C   0  0
    5.1570   -5.2486    0.0000 O   0  0
    5.2003   -8.2849    0.0000 O   0  0
    6.5932   -5.8558    0.0000 O   0  0
    5.5763   -5.9667    0.0000 O   0  0
    3.9328   -5.9715    0.0000 O   0  5
    3.7738   -8.3283    0.0000 O   0  5
    7.3113   -7.0848    0.0000 O   0  5
    7.1571   -8.3283    0.0000 Bi  0  1
    2.0387   -7.2005    0.0000 N   0  3
    1.3591   -6.7812    0.0000 C   0  0
    1.3591   -6.0149    0.0000 C   0  0
   -4.2943   -6.0389    0.0000 C   0  0
   -3.6292   -5.5956    0.0000 O   0  0
   -2.9303   -6.0389    0.0000 C   0  0
   -4.0340   -6.8535    0.0000 C   0  0
   -3.1905   -6.8535    0.0000 C   0  0
    2.7472   -6.8197    0.0000 O   0  5
    2.0580   -7.9813    0.0000 O   0  0
   -5.0172   -5.5956    0.0000 C   0  0
    2.0725   -5.5956    0.0000 N   0  0
    0.6121   -5.5956    0.0000 N   0  0
   -5.7257   -6.0389    0.0000 N   0  0
   -1.5037   -6.0149    0.0000 S   0  0
   -2.2218   -5.5956    0.0000 C   0  0
   -0.0819   -6.0149    0.0000 C   0  0
   -0.8097   -5.5956    0.0000 C   0  0
   -5.7257   -6.8535    0.0000 C   0  0
   -6.4438   -5.5956    0.0000 C   0  0
    2.7665   -6.0149    0.0000 C   0  0
  6  2  1  0
  7  4  2  0
  8  6  2  0
  9  5  2  0
 10  1  1  0
 11  4  1  0
 12  6  1  0
 13  5  1  0
  2  1  1  0
  3  1  1  0
  4  1  1  0
  5  3  1  0
 16 15  1  0
 17 16  2  3
 18 19  1  0
 19 20  1  0
 20 30  1  0
 21 22  1  0
 22 20  2  0
 23 15  1  0
 24 15  2  0
 25 18  1  0
 26 17  1  0
 27 17  1  0
 28 25  1  0
 29 32  1  0
 30 29  1  0
 31 27  1  0
 32 31  1  0
 33 28  1  0
 34 28  1  0
 35 26  1  0
 21 18  2  0
M  CHG  6  11  -1  12  -1  13  -1  14   3  15   1  23  -1
M  END
"""

print(Chem.MolBlockToInchi(molblock))
[13:41:14] WARNING: Charges were rearranged; Omitted undefined stereo; Proton(s) added/removed
InChI=1S/C13H22N4O3S.C6H8O7.Bi/c1-14-13(9-17(18)19)15-6-7-21-10-12-5-4-11(20-12)8-16(2)3;7-3(8)1-6(13,5(11)12)2-4(9)10;/h4-5,9,14-15H,6-8,10H2,1-3H3;13H,1-2H2,(H,7,8)(H,9,10)(H,11,12);/q;;+3/p-3

m = Chem.MolFromMolBlock(molblock, sanitize=False, removeHs=False)
# put back wedge bonds info
for b in m.GetBonds():
    if b.HasProp("_MolFileBondStereo"):
        val = b.GetProp("_MolFileBondStereo")
        if val == '1':
            b.SetBondDir(Chem.BondDir.BEGINWEDGE)
        elif val == '6':
            b.SetBondDir(Chem.BondDir.BEGINDASH)

print(Chem.MolBlockToInchi(Chem.MolToMolBlock(m)))
[13:41:14] WARNING: Charges were rearranged; Metal was disconnected; Omitted undefined stereo; Proton(s) added/removed
'InChI=1S/C13H22N4O3S.C6H8O7.Bi.6H/c1-14-13(9-17(18)19)15-6-7-21-10-12-5-4-11(20-12)8-16(2)3;7-3(8)1-6(13,5(11)12)2-4(9)10;;;;;;;/h4-5,9,14-15H,6-8,10H2,1-3H3;13H,1-2H2,(H,7,8)(H,9,10)(H,11,12);;;;;;;/q;;+3;;;;;;/p-3'

greglandrum commented 5 years ago

Dumb question: Where are you seeing the valence change? The disconnection in the InChI is done by the InChI algorithm and can’t be switched off; do you mean something else?
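
Because InChI always disconnects metals, a comparison that does not go through InChI can help locate a change like this. The following is a minimal sketch, not part of the pipeline or of RDKit: it assumes well-formed V2000 molblocks with the same atom order, and the names `parse_atoms` and `diff_atoms` are hypothetical. It reads the charge and valence fields of each atom line (plus any `M  CHG` lines, which are assumed here to supersede the atom block charges) and reports the atoms that differ.

```python
# Charge codes of the V2000 atom block "ccc" field (code 4 is a doublet radical).
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


def _field(line, start, end):
    """Read a fixed-width integer field; a missing or blank field counts as 0."""
    text = line[start:end].strip()
    return int(text) if text else 0


def parse_atoms(molblock):
    """Return one (symbol, formal_charge, valence_field) tuple per atom."""
    lines = molblock.split("\n")
    n_atoms = int(lines[3][0:3])  # the counts line is the 4th line
    atoms = []
    for line in lines[4:4 + n_atoms]:
        symbol = line[31:34].strip()
        charge = CHARGE_CODES.get(_field(line, 36, 39), 0)
        valence = _field(line, 48, 51)  # 0 means no explicit valence given
        atoms.append([symbol, charge, valence])

    # Assumption: if M  CHG lines are present they replace all atom block charges.
    chg_lines = [line for line in lines if line.startswith("M  CHG")]
    if chg_lines:
        for atom in atoms:
            atom[1] = 0
        for line in chg_lines:
            values = line[6:].split()  # entry count, then (atom number, charge) pairs
            for number, charge in zip(values[1::2], values[2::2]):
                atoms[int(number) - 1][1] = int(charge)
    return [tuple(atom) for atom in atoms]


def diff_atoms(before, after):
    """List (atom_number, before_tuple, after_tuple) for atoms that differ."""
    pairs = zip(parse_atoms(before), parse_atoms(after))
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]
```

For the structure in this issue, one could call `diff_atoms(molblock, Chem.MolToMolBlock(m))` and check whether atom 14 (Bi) appears in the result.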

eloyfelix commented 5 years ago

Sorry, I pasted the wrong output. I have just updated the original comment.

greglandrum commented 5 years ago

Fixed with: 025aefee98ff9