Closed xavgit closed 10 months ago
Hi,
I found the atom type assignments are incorrect for some atoms in the MOL2 file:
The atom types for most carbon atoms in ZINC000001644610 should be C.ar, representing aromatic carbons. The compound is correctly aromatized in the generated pdbqt file, probably because the bond section specifies the aromaticity of the bonds.
The atom types for the phenolate oxygens should be O.3. Typing them as "O.2" is likely the cause of the total charge = 0 in this case. I suspect similar issues with the carboxylate-containing compounds you posted on the AutoDock mailing list.
Attached is a MOL2 file for this compound directly downloaded from: https://zinc20.docking.org/substances/ZINC000001644610/
For this MOL2 file, meeko (RDKit) will report a total charge = -2.
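To see which atom types a MOL2 file actually declares, the ATOM records can be tallied with a few lines of plain Python. This is only a minimal sketch: the function name `summarize_mol2_atoms` and the three-atom fragment are hypothetical (the fragment is not taken from the ZINC file), and it assumes the standard Tripos column order with the atom type in the sixth column and the partial charge in the ninth.

```python
from collections import Counter

def summarize_mol2_atoms(mol2_text):
    """Return (atom-type counts, summed partial charges) for the first ATOM section."""
    types = Counter()
    charge_sum = 0.0
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            if in_atoms:
                break  # the first ATOM section has ended
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atoms or not stripped:
            continue
        fields = stripped.split()
        if len(fields) < 6:
            continue  # not a complete atom record
        types[fields[5]] += 1  # assumed column: SYBYL atom type
        if len(fields) >= 9:
            charge_sum += float(fields[8])  # assumed column: partial charge
    return types, charge_sum

# Hypothetical toy fragment, for illustration only.
SAMPLE_MOL2 = """@<TRIPOS>MOLECULE
toy_fragment
 3 2 1 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1          0.0000    0.0000    0.0000 C.2       1 LIG        0.1000
      2 C2          1.4000    0.0000    0.0000 C.2       1 LIG        0.1500
      3 O1          2.1000    1.2000    0.0000 O.2       1 LIG       -0.7500
@<TRIPOS>BOND
     1     1     2   ar
     2     2     3    1
"""

atom_types, partial_charge_sum = summarize_mol2_atoms(SAMPLE_MOL2)
print(dict(atom_types), round(partial_charge_sum, 3))
```

Running the same function on the downloaded file should show whether the ring carbons are typed C.2 rather than C.ar and whether the phenolate oxygens are typed O.2.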
Hope this is mildly helpful..!
Here are the relevant bits of my comments from the mailing list:
Reading with RDKit: charge = 0 (as you reported).
Reading with OpenBabel: charge = -2.
Converting to MOL with OpenBabel and then reading with RDKit: charge = -2.
Converting to MOL2 with OpenBabel (i.e. re-writing it) and reading with RDKit: error because a non-ring atom is marked aromatic.
I don't really know what to do from here. An option that comes to mind is to download 2D SDFs from ZINC and use external software to calculate the 3D coordinates and protonate, such as molconvert from ChemAxon.
Hi, thanks for the suggestions. I cannot find the 2D SDFs in the download section of the tranches. Can you kindly indicate where they are?
Thanks.
Saverio
Hi @xavgit,
2D SDFs can be downloaded individually from ZINC. Judging by the file headers, they appear to have been generated with RDKit. Using RDKit in Python, you can create a molecule from the SMILES string (which you can get from the tranches), compute 2D coordinates, and write an SDF file as output. Depending on which external software you plan to use in the next steps, it might be fine to use the SMILES strings directly as input.
Here are a few interesting discussions about the ambiguity of the MOL2 format and the fact that RDKit expects atom types as written by Corina.
https://sourceforge.net/p/rdkit/mailman/message/37668451/
https://github.com/rdkit/rdkit/discussions/4061
https://sourceforge.net/p/rdkit/mailman/message/37374678/
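The SMILES-to-SDF route described above could look roughly like this. It is a sketch, not a tested pipeline for ZINC: the helper name `smiles_to_2d_sdf` is made up, and the phenolate SMILES is a stand-in example rather than the SMILES of ZINC000001644610. The SDF is written to an in-memory buffer here; passing a file name to `Chem.SDWriter` would write to disk instead.

```python
from io import StringIO

from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_to_2d_sdf(smiles, name):
    """Build a molecule from SMILES, add 2D coordinates, return (mol, SDF text)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    mol.SetProp("_Name", name)      # becomes the title line of the SDF record
    AllChem.Compute2DCoords(mol)    # 2D depiction coordinates, not a 3D conformer
    buffer = StringIO()
    writer = Chem.SDWriter(buffer)
    writer.write(mol)
    writer.close()
    return mol, buffer.getvalue()

# Stand-in input: a phenolate anion, chosen only because it carries a formal charge.
mol, sdf_text = smiles_to_2d_sdf("[O-]c1ccccc1", "phenolate_example")
print(Chem.GetFormalCharge(mol))
print(sdf_text)
```

Because the formal charge is stated explicitly in the SMILES, it does not depend on MOL2 atom typing. The 3D coordinates and protonation state still have to come from a later step.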
Thanks for the links.
Saverio
Hi, I downloaded some molecules from ZINC20 in MOL2 format and then used mk_prepare_ligand.py to convert them to the PDBQT format. For a high percentage of them, mk_prepare_ligand.py takes as input a molecule with a total charge of -2 (I checked with OpenBabel's GetTotalCharge()) and returns a converted PDBQT ligand with a TOT CHARGE of +/-0.00, as reported with the -v option.
Why does the same molecule get different total charges in different formats?
Can these differences in the total charge of the ligands affect the docking results?
For example, for ZINC000001644610.mol2 OpenBabel reports a total charge of -2, whereas mk_prepare_ligand.py prints the following output when it processes the same molecule:
Processing ./ZINC000001644610.mol2 file
Molecule setup

==============[ ATOMS ]===================================================
 idx |           coords           | charge |ign| atype    | connections
-----+----------------------------+--------+---+----------+---------------
. . .
   0 |    0.002   -0.004    0.002 | -0.258 | 0 | OA       | [1]
   1 |   -0.014    1.214    0.009 |  0.062 | 0 | N        | [0, 2, 3]
   2 |    1.032    1.837    0.002 | -0.258 | 0 | OA       | [1]
   3 |   -1.306    1.936    0.019 |  0.271 | 0 | A        | [1, 29, 4]
   4 |   -1.322    3.320    0.021 |  0.020 | 0 | A        | [3, 5, 30]
   5 |   -2.522    4.006    0.030 | -0.006 | 0 | A        | [4, 6, 31]
   6 |   -3.719    3.311    0.038 |  0.204 | 0 | A        | [5, 7, 8]
   7 |   -4.899    3.984    0.048 | -0.288 | 0 | OA       | [6]
   8 |   -3.709    1.909    0.036 |  0.117 | 0 | A        | [6, 9, 29]
   9 |   -4.885    1.212    0.043 | -0.252 | 0 | NA       | [8, 10]
  10 |   -5.000    0.135    0.768 |  0.035 | 0 | C        | [9, 11, 32]
  11 |   -6.266   -0.614    0.776 | -0.007 | 0 | A        | [10, 28, 12]
  12 |   -7.345   -0.173    0.001 | -0.053 | 0 | A        | [11, 13, 33]
  13 |   -8.520   -0.869    0.008 | -0.053 | 0 | A        | [12, 14, 34]
  14 |   -8.645   -2.023    0.791 | -0.007 | 0 | A        | [13, 15, 27]
  15 |   -9.911   -2.772    0.799 |  0.035 | 0 | C        | [14, 16, 35]
  16 |  -10.028   -3.847    1.527 | -0.252 | 0 | NA       | [15, 17]
  17 |  -11.204   -4.543    1.535 |  0.117 | 0 | A        | [16, 25, 18]
  18 |  -11.919   -4.727    0.349 |  0.046 | 0 | A        | [17, 19, 36]
  19 |  -13.105   -5.431    0.364 |  0.271 | 0 | A        | [18, 20, 23]
  20 |  -13.862   -5.626   -0.893 |  0.062 | 0 | N        | [19, 21, 22]
  21 |  -13.436   -5.167   -1.938 | -0.258 | 0 | OA       | [20]
  22 |  -14.910   -6.247   -0.883 | -0.258 | 0 | OA       | [20]
  23 |  -13.591   -5.955    1.550 |  0.020 | 0 | A        | [19, 24, 37]
  24 |  -12.893   -5.778    2.730 | -0.006 | 0 | A        | [23, 25, 38]
  25 |  -11.698   -5.080    2.732 |  0.204 | 0 | A        | [17, 24, 26]
  26 |  -11.011   -4.907    3.892 | -0.288 | 0 | OA       | [25]
  27 |   -7.566   -2.465    1.565 | -0.053 | 0 | A        | [14, 28, 39]
  28 |   -6.389   -1.771    1.554 | -0.053 | 0 | A        | [11, 27, 40]
  29 |   -2.490    1.228    0.021 |  0.046 | 0 | A        | [3, 8, 41]
  30 |   -0.391    3.868    0.015 |  0.070 | 0 | H        | [4]
  31 |   -2.526    5.086    0.032 |  0.067 | 0 | H        | [5]
  32 |   -4.166   -0.207    1.363 |  0.085 | 0 | H        | [10]
  33 |   -7.248    0.717   -0.603 |  0.063 | 0 | H        | [12]
  34 |   -9.353   -0.529   -0.589 |  0.063 | 0 | H        | [13]
  35 |  -10.744   -2.432    0.201 |  0.085 | 0 | H        | [15]
  36 |  -11.543   -4.320   -0.577 |  0.072 | 0 | H        | [18]
  37 |  -14.521   -6.504    1.552 |  0.070 | 0 | H        | [23]
  38 |  -13.279   -6.190    3.651 |  0.067 | 0 | H        | [24]
  39 |   -7.662   -3.355    2.168 |  0.063 | 0 | H        | [27]
  40 |   -5.557   -2.112    2.152 |  0.063 | 0 | H        | [28]
  41 |   -2.474    0.148    0.018 |  0.072 | 0 | H        | [29]
-----+----------------------------+--------+---+----------+---------------
. . .
TOT CHARGE: 0.000
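The reported TOT CHARGE can be cross-checked by summing the charge column of the atom table. The sketch below is a hypothetical helper, not part of Meeko: it assumes the verbose output has been saved as text with one atom per line and the pipe-separated columns shown above. The three sample rows are the first nitro group (atoms 0 to 2) from the table.

```python
import re

# One atom row: idx | x y z | charge | ign | atype | connections
ROW = re.compile(
    r"^\s*(\d+)\s*\|([^|]*)\|\s*(-?\d+\.\d+)\s*\|\s*\d+\s*\|\s*(\w+)\s*\|"
)

def sum_setup_charges(table_text):
    """Return (number of atom rows, summed charge rounded to 3 decimals)."""
    count = 0
    total = 0.0
    for line in table_text.splitlines():
        match = ROW.match(line)
        if match:  # header and separator lines do not match and are skipped
            count += 1
            total += float(match.group(3))
    return count, round(total, 3)

# Rows 0-2 copied from the table above (the first nitro group).
SAMPLE_ROWS = """
 idx |           coords           | charge |ign| atype    | connections
-----+----------------------------+--------+---+----------+---------------
   0 |    0.002   -0.004    0.002 | -0.258 | 0 | OA       | [1]
   1 |   -0.014    1.214    0.009 |  0.062 | 0 | N        | [0, 2, 3]
   2 |    1.032    1.837    0.002 | -0.258 | 0 | OA       | [1]
"""

n_rows, charge = sum_setup_charges(SAMPLE_ROWS)
print(n_rows, charge)
```

Applied to all 42 rows, the charges add up to the printed TOT CHARGE of 0.000, so the table is internally consistent and the missing -2 is already absent from the per-atom charges.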
Thanks.
Saverio
PS: I have also posted this problem to autodock@scripps.edu, but I think this place is more appropriate and the question is stated more clearly here. Sorry if posting it here is a mistake.
ZINC000001644610.pdbqt.txt ZINC000001644610.mol2.txt