forlilab / Meeko

Interfacing RDKit and AutoDock
GNU Lesser General Public License v2.1

mk_prepare_ligand.py can change the total charge of the processed molecule. #63

Closed xavgit closed 10 months ago

xavgit commented 11 months ago

Hi, I downloaded some molecules from ZINC20 in MOL2 format and then used mk_prepare_ligand.py to convert them to PDBQT. For a high percentage of them, mk_prepare_ligand.py takes as input a molecule with a total charge of -2 (I checked with Open Babel's GetTotalCharge()) and returns a PDBQT ligand with a TOT CHARGE of +/-0.00, as reported with the -v option.

Why does the same molecule have a different total charge in the two formats?

Can these differences in the total charge of the ligands affect the docking results?

For example, for ZINC000001644610.mol2 Open Babel evaluates a total charge of -2, whereas mk_prepare_ligand.py prints the following output when it processes the same molecule:

Processing ./ZINC000001644610.mol2 file
Molecule setup

==============[ ATOMS ]===================================================
idx  |          coords            | charge |ign| atype    | connections
-----+----------------------------+--------+---+----------+--------------- . . .
   0 |    0.002   -0.004    0.002 | -0.258 | 0 | OA       | [1]
   1 |   -0.014    1.214    0.009 |  0.062 | 0 | N        | [0, 2, 3]
   2 |    1.032    1.837    0.002 | -0.258 | 0 | OA       | [1]
   3 |   -1.306    1.936    0.019 |  0.271 | 0 | A        | [1, 29, 4]
   4 |   -1.322    3.320    0.021 |  0.020 | 0 | A        | [3, 5, 30]
   5 |   -2.522    4.006    0.030 | -0.006 | 0 | A        | [4, 6, 31]
   6 |   -3.719    3.311    0.038 |  0.204 | 0 | A        | [5, 7, 8]
   7 |   -4.899    3.984    0.048 | -0.288 | 0 | OA       | [6]
   8 |   -3.709    1.909    0.036 |  0.117 | 0 | A        | [6, 9, 29]
   9 |   -4.885    1.212    0.043 | -0.252 | 0 | NA       | [8, 10]
  10 |   -5.000    0.135    0.768 |  0.035 | 0 | C        | [9, 11, 32]
  11 |   -6.266   -0.614    0.776 | -0.007 | 0 | A        | [10, 28, 12]
  12 |   -7.345   -0.173    0.001 | -0.053 | 0 | A        | [11, 13, 33]
  13 |   -8.520   -0.869    0.008 | -0.053 | 0 | A        | [12, 14, 34]
  14 |   -8.645   -2.023    0.791 | -0.007 | 0 | A        | [13, 15, 27]
  15 |   -9.911   -2.772    0.799 |  0.035 | 0 | C        | [14, 16, 35]
  16 |  -10.028   -3.847    1.527 | -0.252 | 0 | NA       | [15, 17]
  17 |  -11.204   -4.543    1.535 |  0.117 | 0 | A        | [16, 25, 18]
  18 |  -11.919   -4.727    0.349 |  0.046 | 0 | A        | [17, 19, 36]
  19 |  -13.105   -5.431    0.364 |  0.271 | 0 | A        | [18, 20, 23]
  20 |  -13.862   -5.626   -0.893 |  0.062 | 0 | N        | [19, 21, 22]
  21 |  -13.436   -5.167   -1.938 | -0.258 | 0 | OA       | [20]
  22 |  -14.910   -6.247   -0.883 | -0.258 | 0 | OA       | [20]
  23 |  -13.591   -5.955    1.550 |  0.020 | 0 | A        | [19, 24, 37]
  24 |  -12.893   -5.778    2.730 | -0.006 | 0 | A        | [23, 25, 38]
  25 |  -11.698   -5.080    2.732 |  0.204 | 0 | A        | [17, 24, 26]
  26 |  -11.011   -4.907    3.892 | -0.288 | 0 | OA       | [25]
  27 |   -7.566   -2.465    1.565 | -0.053 | 0 | A        | [14, 28, 39]
  28 |   -6.389   -1.771    1.554 | -0.053 | 0 | A        | [11, 27, 40]
  29 |   -2.490    1.228    0.021 |  0.046 | 0 | A        | [3, 8, 41]
  30 |   -0.391    3.868    0.015 |  0.070 | 0 | H        | [4]
  31 |   -2.526    5.086    0.032 |  0.067 | 0 | H        | [5]
  32 |   -4.166   -0.207    1.363 |  0.085 | 0 | H        | [10]
  33 |   -7.248    0.717   -0.603 |  0.063 | 0 | H        | [12]
  34 |   -9.353   -0.529   -0.589 |  0.063 | 0 | H        | [13]
  35 |  -10.744   -2.432    0.201 |  0.085 | 0 | H        | [15]
  36 |  -11.543   -4.320   -0.577 |  0.072 | 0 | H        | [18]
  37 |  -14.521   -6.504    1.552 |  0.070 | 0 | H        | [23]
  38 |  -13.279   -6.190    3.651 |  0.067 | 0 | H        | [24]
  39 |   -7.662   -3.355    2.168 |  0.063 | 0 | H        | [27]
  40 |   -5.557   -2.112    2.152 |  0.063 | 0 | H        | [28]
  41 |   -2.474    0.148    0.018 |  0.072 | 0 | H        | [29]
-----+----------------------------+--------+---+----------+--------------- . . .
TOT CHARGE: 0.000

Thanks.

Saverio

PS: I have also posted this problem on autodock@scripps.edu, but I think this place is more appropriate and the question is stated more clearly here. Sorry if posting in both places was a mistake.

ZINC000001644610.pdbqt.txt ZINC000001644610.mol2.txt

rwxayheee commented 11 months ago

Hi,

I found the atom type assignments are incorrect for some atoms in the MOL2 file:

The atom types for most carbon atoms in ZINC000001644610 should be C.ar, representing aromatic carbons. The compound is correctly aromatized in the generated pdbqt file, probably because the bond section specifies the aromaticity of the bonds.

The atom types for the phenolate oxygens should be O.3. Typing them "O.2" is likely the cause of the total charge = 0 in this case. I suspect similar issues with the carboxylate-containing compounds you posted on the autodock mailing list.

Attached is a MOL2 file for this compound directly downloaded from: https://zinc20.docking.org/substances/ZINC000001644610/

527368906.mol2.txt

For this MOL2 file, meeko (RDKit) will report a total charge = -2.

Hope this is mildly helpful..!

diogomart commented 11 months ago

Here are the relevant bits of my comments from the mailing list:

For this MOL2:

```
@<TRIPOS>MOLECULE
ZINC000034235761
 47 48 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1    -0.2907    1.4244    0.5537 C.2   1 ZINC0000342357611 -0.2000
      2 C2    -0.1576    0.1283    0.4179 C.2   1 ZINC0000342357611 -0.1100
      3 C3     1.2174   -0.4846    0.3469 C.3   1 ZINC0000342357611 -0.0800
      4 H4     1.9727    0.2974    0.4245 H     1 ZINC0000342357611  0.1200
      5 C5     1.3822   -1.2330   -0.9828 C.3   1 ZINC0000342357611  0.2500
      6 H6     1.3035   -0.5302   -1.8123 H     1 ZINC0000342357611  0.1100
      7 O7     2.6567   -1.8784   -1.0176 O.3   1 ZINC0000342357611 -0.3400
      8 C8     3.0025   -2.4135   -2.2967 C.3   1 ZINC0000342357611  0.2200
      9 H9     2.1763   -3.0160   -2.6741 H     1 ZINC0000342357611  0.0600
     10 O10    3.2657   -1.3429   -3.2058 O.3   1 ZINC0000342357611 -0.3700
     11 C11    3.5980   -1.7691   -4.5287 C.3   1 ZINC0000342357611  0.1100
     12 H12    2.7797   -2.3633   -4.9354 H     1 ZINC0000342357611  0.0800
     13 C13    3.8296   -0.5446   -5.4163 C.3   1 ZINC0000342357611  0.0900
     14 O14    2.6081    0.1848   -5.5499 O.3   1 ZINC0000342357611 -0.5600
     15 C15    4.8721   -2.6168   -4.4853 C.3   1 ZINC0000342357611  0.1000
     16 H16    5.1026   -2.9795   -5.4870 H     1 ZINC0000342357611  0.0700
     17 O17    5.9579   -1.8226   -4.0034 O.3   1 ZINC0000342357611 -0.5300
     18 C18    4.6518   -3.8079   -3.5477 C.3   1 ZINC0000342357611  0.0800
     19 H19    3.8568   -4.4406   -3.9425 H     1 ZINC0000342357611  0.0800
     20 O20    5.8592   -4.5654   -3.4454 O.3   1 ZINC0000342357611 -0.5500
     21 C21    4.2525   -3.2873   -2.1635 C.3   1 ZINC0000342357611  0.0700
     22 H22    4.0400   -4.1295   -1.5050 H     1 ZINC0000342357611  0.0700
     23 O23    5.3220   -2.5124   -1.6179 O.3   1 ZINC0000342357611 -0.5300
     24 O24    0.3260   -2.2268   -1.0809 O.2   1 ZINC0000342357611 -0.3500
     25 C25    0.1105   -3.0092   -0.0123 C.2   1 ZINC0000342357611  0.0300
     26 C26    0.5736   -2.7193    1.2097 C.2   1 ZINC0000342357611 -0.2400
     27 C27    0.2814   -3.5985    2.2741 C.3   1 ZINC0000342357611  0.5200
     28 O28   -0.4838   -4.5394    2.0991 O.2   1 ZINC0000342357611 -0.7200
     29 O29    0.7928   -3.4210    3.3734 O.2   1 ZINC0000342357611 -0.6800
     30 C30    1.3947   -1.4893    1.4899 C.3   1 ZINC0000342357611  0.0200
     31 H31    2.4461   -1.7653    1.5712 H     1 ZINC0000342357611  0.0700
     32 C32    0.9312   -0.8552    2.8029 C.3   1 ZINC0000342357611 -0.1900
     33 C33    1.8338    0.3010    3.1485 C.3   1 ZINC0000342357611  0.5000
     34 O34    2.7237    0.6299    2.3826 O.2   1 ZINC0000342357611 -0.7100
     35 O35    1.6743    0.9076    4.1940 O.2   1 ZINC0000342357611 -0.7200
     36 H36    0.5848    2.0537    0.6148 H     1 ZINC0000342357611  0.1000
     37 H37   -1.2761    1.8641    0.6001 H     1 ZINC0000342357611  0.0900
     38 H38   -1.0331   -0.5011    0.3569 H     1 ZINC0000342357611  0.1000
     39 H39    4.5869    0.0949   -4.9627 H     1 ZINC0000342357611  0.0700
     40 H40    4.1691   -0.8680   -6.4003 H     1 ZINC0000342357611  0.0600
     41 H41    2.6807    0.9761   -6.1010 H     1 ZINC0000342357611  0.3800
     42 H42    6.7988   -2.2970   -3.9498 H     1 ZINC0000342357611  0.3800
     43 H43    5.7918   -5.3366   -2.8659 H     1 ZINC0000342357611  0.3900
     44 H44    5.1376   -2.1505   -0.7404 H     1 ZINC0000342357611  0.3900
     45 H45   -0.4635   -3.9145   -0.1442 H     1 ZINC0000342357611  0.1600
     46 H46   -0.0921   -0.4967    2.6919 H     1 ZINC0000342357611  0.0400
     47 H47    0.9715   -1.5983    3.5993 H     1 ZINC0000342357611  0.1100
@<TRIPOS>UNITY_ATOM_ATTR
28 1
charge -1
34 1
charge -1
@<TRIPOS>BOND
     1     1     2    2
     2     1    36    1
     3     1    37    1
     4     2     3    1
     5     2    38    1
     6     3     4    1
     7     3    30    1
     8     3     5    1
     9     5     6    1
    10     5     7    1
    11     5    24    1
    12     7     8    1
    13     8     9    1
    14     8    21    1
    15     8    10    1
    16    10    11    1
    17    11    12    1
    18    11    13    1
    19    11    15    1
    20    13    14    1
    21    13    39    1
    22    13    40    1
    23    14    41    1
    24    15    16    1
    25    15    17    1
    26    15    18    1
    27    17    42    1
    28    18    19    1
    29    18    20    1
    30    18    21    1
    31    20    43    1
    32    21    22    1
    33    21    23    1
    34    23    44    1
    35    24    25    1
    36    25    26    2
    37    25    45    1
    38    26    27    1
    39    26    30    1
    40    27    28    1
    41    27    29    2
    42    30    31    1
    43    30    32    1
    44    32    33    1
    45    32    46    1
    46    32    47    1
    47    33    34    1
    48    33    35    2
```

- Reading with RDKit: charge = 0 (as you reported).
- Reading with OpenBabel: charge = -2.
- Converting to MOL with OpenBabel and then reading with RDKit: charge = -2.
- Converting to MOL2 with OpenBabel (i.e. re-writing it) and reading with RDKit: error because a non-ring atom is marked aromatic.
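To see what a MOL2 file itself declares, independently of any toolkit, the partial charges in the atom records and the formal charges in the UNITY_ATOM_ATTR section can be summed directly. The following is only a minimal sketch using the Python standard library: the function name `summarize_mol2_charges` and the toy formate record are made up for illustration, and the parser assumes standard `@<TRIPOS>` record headers and nine whitespace-separated columns per atom line.

```python
def summarize_mol2_charges(mol2_text):
    """Return (sum of partial charges, sum of UNITY formal charges,
    {atom_id: atom_type} for atoms that carry a UNITY formal charge)."""
    section = None
    partial_sum = 0.0
    atom_types = {}
    formal_charges = {}
    current_atom = None
    for raw_line in mol2_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("@<TRIPOS>"):
            # A new record type starts, e.g. ATOM, BOND, UNITY_ATOM_ATTR.
            section = line[len("@<TRIPOS>"):]
            current_atom = None
            continue
        fields = line.split()
        if section == "ATOM" and len(fields) >= 9:
            # Columns: id, name, x, y, z, type, subst_id, subst_name, charge
            atom_id = int(fields[0])
            atom_types[atom_id] = fields[5]
            partial_sum += float(fields[8])
        elif section == "UNITY_ATOM_ATTR" and len(fields) == 2:
            if fields[0] == "charge" and current_atom is not None:
                formal_charges[current_atom] = int(fields[1])
            elif fields[0].isdigit():
                # "<atom_id> <number_of_attributes>" opens an attribute group.
                current_atom = int(fields[0])
    charged_types = {i: atom_types.get(i) for i in formal_charges}
    return partial_sum, sum(formal_charges.values()), charged_types


# Hypothetical toy record (formate), not taken from ZINC.
EXAMPLE_MOL2 = """\
@<TRIPOS>MOLECULE
formate
 4 3 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1    0.0000    0.0000    0.0000 C.2   1 LIG1  0.5000
      2 O1    1.2000    0.0000    0.0000 O.2   1 LIG1 -0.7000
      3 O2   -0.6000    1.0000    0.0000 O.2   1 LIG1 -0.8000
      4 H1   -0.6000   -1.0000    0.0000 H     1 LIG1  0.0000
@<TRIPOS>UNITY_ATOM_ATTR
3 1
charge -1
@<TRIPOS>BOND
     1     1     2    2
     2     1     3    1
     3     1     4    1
"""

partial, formal, charged = summarize_mol2_charges(EXAMPLE_MOL2)
print(f"partial charge sum: {partial:.3f}")
print(f"formal charge sum:  {formal}")
print(f"types of formally charged atoms: {charged}")
```

If the formal charge sum is nonzero but the charged atoms carry a neutral-looking type such as O.2, a reader that infers charge from atom types alone may report a different total than one that uses the attribute section.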

I don't really know what to do from here. An option that comes to mind is to download 2D SDFs from ZINC and use external software, such as molconvert from ChemAxon, to calculate the 3D coordinates and protonate.

xavgit commented 11 months ago

Hi, thanks for the suggestions. I cannot find the 2D SDFs in the download section of the tranches. Can you kindly indicate where they are?

Thanks.

Saverio

rwxayheee commented 11 months ago

Hi @xavgit,

2D SDFs can be downloaded individually from ZINC. According to the file headers, they seem to have been generated with RDKit. Using RDKit functions in Python, you can create a molecule from the SMILES (which you can get from the tranches), compute 2D coordinates, and write an SDF file as output. But depending on what external software you plan to use in the next steps, it might be fine to use the SMILES strings directly as input.
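The SMILES-to-SDF route described above could look roughly like this. It is a sketch, not necessarily how ZINC produces its own files: the helper name `smiles_to_2d_sdf_record` and the acetate SMILES are placeholders chosen for illustration, and RDKit is assumed to be installed.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_2d_sdf_record(smiles, name=""):
    """Build one SDF record with 2D coordinates from a SMILES string.

    Returns (sdf_record_text, total_formal_charge).
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    mol.SetProp("_Name", name)      # title line of the MOL block
    AllChem.Compute2DCoords(mol)    # 2D depiction coordinates
    formal_charge = Chem.GetFormalCharge(mol)
    # An SDF record is a MOL block followed by the "$$$$" delimiter.
    record = Chem.MolToMolBlock(mol) + "$$$$\n"
    return record, formal_charge


# Placeholder input: acetate, written with an explicit negative charge.
record, charge = smiles_to_2d_sdf_record("CC(=O)[O-]", name="acetate")
print("formal charge:", charge)
with open("acetate_2d.sdf", "w") as handle:
    handle.write(record)
```

Because SMILES and SDF store formal charges explicitly on atoms, the total charge does not depend on how a reader interprets MOL2 atom types. The 3D coordinates and protonation state still have to come from a later step.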

diogomart commented 10 months ago

Here are a few interesting discussions regarding the ambiguity of the MOL2 format and the fact that RDKit expects atom types as written by Corina.

https://sourceforge.net/p/rdkit/mailman/message/37668451/
https://github.com/rdkit/rdkit/discussions/4061
https://sourceforge.net/p/rdkit/mailman/message/37374678/

xavgit commented 10 months ago

Thanks for the links.

Saverio