forlilab / Meeko

Interfacing RDKit and AutoDock
GNU Lesser General Public License v2.1

mk_export.py Runtime error #62

Closed eightmm closed 10 months ago

eightmm commented 11 months ago

Hi! I tried to convert a DLG file (the result file of AutoDock-GPU) to SDF using mk_export.py, and it failed with the following error.

[Screenshot, 2023-08-08 14-34-49]

Looking at the DLG and PDBQT files, I found that there are 4 hydrogen atoms.

[Screenshot, 2023-08-08 14-36-07]

Is this problem caused by RDKit not recognizing the hydrogens properly, or is there another reason?

Thanks!

diogomart commented 11 months ago

Hi! The explicit Hs in the REMARK SMILES line are unusual; my first guess is that the error is related to that. Could you post the input ligand PDBQT as text so I can take a look tomorrow?

eightmm commented 11 months ago

@diogomart

[Screenshot, 2023-08-08 15-27-06]

REMARK SMILES [H]c1c(N([H])[H])c([H])c2nc3c([H])c(N([H])[H])c([H])c([H])c3c([H])c2c1[H]
REMARK SMILES IDX 26 1 2 2 3 3 7 4 12 5 14 6 18 7 20 8 23 9 10 10 9 11 22 12
REMARK SMILES IDX 25 13 11 14 4 15 5 16 6 17 15 18 16 19 17 20
REMARK H PARENT
REMARK Flexibility Score: inf
ROOT
ATOM 1 C UNL 1 8.982 23.182 49.516 1.00 0.00 0.012 A
ATOM 2 C UNL 1 9.670 23.649 48.475 1.00 0.00 0.026 A
ATOM 3 C UNL 1 10.665 22.851 47.794 1.00 0.00 0.034 A
ATOM 4 C UNL 1 10.902 21.573 48.264 1.00 0.00 0.054 A
ATOM 5 C UNL 1 9.939 17.864 51.143 1.00 0.00 0.054 A
ATOM 6 C UNL 1 9.231 17.253 52.163 1.00 0.00 0.034 A
ATOM 7 C UNL 1 8.222 18.030 52.852 1.00 0.00 0.026 A
ATOM 8 C UNL 1 7.968 19.299 52.528 1.00 0.00 0.012 A
ATOM 9 C UNL 1 8.475 21.272 51.058 1.00 0.00 0.020 A
ATOM 10 N UNL 1 10.385 19.776 49.768 1.00 0.00 -0.248 NA
ATOM 11 C UNL 1 10.196 21.070 49.336 1.00 0.00 0.073 A
ATOM 12 C UNL 1 8.690 19.965 51.464 1.00 0.00 0.001 A
ATOM 13 C UNL 1 9.190 21.846 50.020 1.00 0.00 0.001 A
ATOM 14 C UNL 1 9.691 19.173 50.793 1.00 0.00 0.073 A
ENDROOT
BRANCH 3 15
ATOM 15 N UNL 1 11.274 23.326 46.684 1.00 0.00 -0.399 N
ATOM 16 H UNL 1 11.077 24.283 46.358 1.00 0.00 0.156 HD
ATOM 17 H UNL 1 11.935 22.731 46.164 1.00 0.00 0.156 HD
ENDBRANCH 3 15
BRANCH 6 18
ATOM 18 N UNL 1 9.453 15.962 52.502 1.00 0.00 -0.399 N
ATOM 19 H UNL 1 10.173 15.416 52.007 1.00 0.00 0.156 HD
ATOM 20 H UNL 1 8.902 15.525 53.255 1.00 0.00 0.156 HD
ENDBRANCH 6 18
TORSDOF 2

This is the input PDBQT file.

[Screenshot, 2023-08-08 15-27-51]

The PDBQT file above was produced by converting this SDF file with mk_prepare_ligand.py.

Thanks!

diogomart commented 11 months ago

Hi again, could you post the SDF file as a block of code? It needs to be formatted as code so that the spaces are preserved. To format text as code, use triple backticks before and after the code block. https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-code

eightmm commented 11 months ago

1bcu_ligand

Created by X-TOOL on Fri Sep 26 17:34:14 2014
 27 29  0  0  0  0  0  0  0  0999 V2000
    8.9820   23.1820   49.5160  C 0  0  0  2  0  3
    9.6700   23.6490   48.4750  C 0  0  0  2  0  3
   10.6650   22.8510   47.7940  C 0  0  0  1  0  3
   10.9020   21.5730   48.2640  C 0  0  0  2  0  3
    9.9390   17.8640   51.1430  C 0  0  0  2  0  3
    9.2310   17.2530   52.1630  C 0  0  0  1  0  3
    8.2220   18.0300   52.8520  C 0  0  0  2  0  3
    7.9680   19.2990   52.5280  C 0  0  0  2  0  3
    8.4750   21.2720   51.0580  C 0  0  0  2  0  3
   10.3850   19.7760   49.7680  N 0  0  0  1  0  2
   10.1960   21.0700   49.3360  C 0  0  0  1  0  3
    8.6900   19.9650   51.4640  C 0  0  0  1  0  3
    9.1900   21.8460   50.0200  C 0  0  0  1  0  3
    9.6910   19.1730   50.7930  C 0  0  0  1  0  3
   11.2740   23.3260   46.6840  N 0  0  0  3  0  3
    9.4530   15.9620   52.5020  N 0  0  0  3  0  3
    8.2489   23.8205   49.9955  H 0  0  0  1  0  1
    9.4780   24.6579   48.1280  H 0  0  0  1  0  1
   11.6536   20.9575   47.7829  H 0  0  0  1  0  1
   10.7001   17.3042   50.6115  H 0  0  0  1  0  1
    7.6572   17.5677   53.6535  H 0  0  0  1  0  1
    7.2050   19.8431   53.0728  H 0  0  0  1  0  1
    7.7239   21.8637   51.5686  H 0  0  0  1  0  1
   11.0771   24.2830   46.3581  H 0  0  0  1  0  1
   11.9348   22.7308   46.1644  H 0  0  0  1  0  1
   10.1726   15.4160   52.0071  H 0  0  0  1  0  1
    8.9024   15.5253   53.2550  H 0  0  0  1  0  1
  1 13  4  0  0  1
  1  2  4  0  0  1
  2  3  4  0  0  1
  3 15  1  0  0  2
  3  4  4  0  0  1
  4 11  4  0  0  1
 11 10  4  0  0  1
 11 13  4  0  0  1
 13  9  4  0  0  1
  9 12  4  0  0  1
 12  8  4  0  0  1
 12 14  4  0  0  1
 14  5  4  0  0  1
 14 10  4  0  0  1
  5  6  4  0  0  1
  6 16  1  0  0  2
  6  7  4  0  0  1
  7  8  4  0  0  1
  1 17  1  0  0  2
  2 18  1  0  0  2
  4 19  1  0  0  2
  5 20  1  0  0  2
  7 21  1  0  0  2
  8 22  1  0  0  2
  9 23  1  0  0  2
 15 24  1  0  0  2
 15 25  1  0  0  2
 16 26  1  0  0  2
 16 27  1  0  0  2
M  END
> <MOLECULAR_FORMULA>
C13H11N3

> <MOLECULAR_WEIGHT>
209.2

> <NUM_HB_ATOMS>
3

> <NUM_ROTOR>
0

> <XLOGP2>
1.99

$$$$

Here!

diogomart commented 11 months ago

It looks like the hydrogen count field of the MOL block is set (page 13 of the specification), and RDKit does not remove those hydrogens. I don't know whether this is the expected RDKit behavior, but Meeko should check that the hydrogens were actually removed, so I think this needs fixing on our end.

diogomart commented 10 months ago

So, it turns out that the input SDF has the HCount field set for the hydrogens, for example for the last atom:

    8.9024   15.5253   53.2550  H 0  0  0  1  0  1
                                           ^
                                           |
                                           hcount

According to the specification, hcount = 1 sets the number of implicit Hs to zero. But because the HCount field is specified (i.e., has a non-zero value), RDKit adds a query to the atom and, by default, does not remove it.
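To make the encoding concrete, here is a minimal sketch of how the field can be decoded. It assumes the fixed-column V2000 atom-line layout from the CTfile specification, where the hydrogen count field (`hhh`) occupies columns 43-45 and stores "count + 1", with 0 meaning "not specified". The helper name `decode_hcount` is hypothetical, and the sample line is rebuilt with standard column widths rather than copied byte for byte from the posted file.

```python
def decode_hcount(atom_line):
    """Decode the hhh (hydrogen count) field of a V2000 atom line.

    Returns None when the field is 0 or missing (no constraint). Otherwise
    returns the field value minus one, i.e. how many hydrogens are allowed
    in addition to the explicitly drawn ones.
    """
    field = atom_line[42:45].strip()  # columns 43-45, 1-based
    value = int(field) if field else 0
    return None if value == 0 else value - 1


# Same values as the last atom of the SDF posted above, laid out with the
# standard column widths: x, y, z, symbol, mass diff, charge, stereo parity,
# hcount, stereo care, valence.
line = "%10.4f%10.4f%10.4f %-3s%2d%3d%3d%3d%3d%3d" % (
    8.9024, 15.5253, 53.2550, "H", 0, 0, 0, 1, 0, 1)

print(decode_hcount(line))  # prints 0
```

A result of `0` means the field is set and allows no additional hydrogens. A result of `None` would mean the field was left unspecified, which is the case where no hcount query would be expected on the atom.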

Based on the header it looks like the SD file was written by X-TOOL. Do you know why the hcount fields were set?

eightmm commented 10 months ago

Sorry, I don't know. I just tested it with the files in the PDBbind dataset.

diogomart commented 10 months ago

Thanks! The easy fix on Meeko's end is to tell RDKit to remove Hs even if they have hcount queries, but I was concerned that those queries were added on purpose. Glad to hear that they weren't :-)
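For readers who hit the same problem in their own scripts, the idea can be sketched as follows. This assumes RDKit's `Chem.RemoveHsParameters` and its `removeWithQuery` flag. The helper names are hypothetical, and this is not necessarily how the Meeko commit implements the fix.

```python
def remove_hs_including_queries(mol):
    """Return a copy of mol with explicit Hs removed, query Hs included."""
    # Imported here so the module can be loaded where RDKit is not installed.
    from rdkit import Chem

    params = Chem.RemoveHsParameters()
    params.removeWithQuery = True  # the default (False) keeps Hs that carry a query
    return Chem.RemoveHs(mol, params)


def count_explicit_hs(mol):
    """Count hydrogens that are present as real atoms in the molecule graph."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 1)
```

Calling `count_explicit_hs` on the result is a cheap way to confirm that the hydrogens were actually removed, which is the check discussed earlier in this thread. Note that `removeWithQuery` only lifts the query restriction; hydrogens kept for other reasons (for example isotope labels) are still governed by the other `RemoveHsParameters` flags.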

diogomart commented 10 months ago

Fixed in ba15b7dc41684405eeebd4b66d6e4a655a9178f2. Thanks for reporting this!