forlilab / Meeko

Interfacing RDKit and AutoDock
GNU Lesser General Public License v2.1
169 stars 41 forks source link

NaNs from meeko.MoleculePreparation for HEM downloaded from PDBeChem #105

Closed hrp1000 closed 1 month ago

hrp1000 commented 2 months ago

Hi

I downloaded an SDF file for HEM from PDBeChem, and want to convert it to PDBQT, so I followed the guide for the API here.

wget https://www.ebi.ac.uk/pdbe/static/files/pdbechem_v2/HEM_model.sdf

Ran this script:

!/usr/bin/env python3

from meeko import MoleculePreparation, PDBQTWriterLegacy, LinkedRDKitChorizo from rdkit import Chem with open("HEM_model.sdf", "r") as filein: hem = filein.read() mol = Chem.MolFromMolBlock(hem,removeHs=False) preparator = MoleculePreparation(merge_these_atom_types=[]) prepared_mol = preparator.prepare(mol) prepared_mol[0].show() ligand_coordinates_pdbqt = PDBQTWriterLegacy.write_string(prepared_mol[0])

The output of the "prepared_mol[0].show()" line starts -

[16:19:30] Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D. Molecule setup

==============[ ATOMS ]=================================================== idx | coords | charge |ign| atype | connections -----+----------------------------+--------+---+----------+--------------- . . . 0 | 2.748 -19.531 39.896 | nan | 0 | C | [4, 31, 74] 1 | 3.258 -17.744 35.477 | nan | 0 | C | [7, 14, 43] 2 | 1.703 -21.900 33.637 | nan | 0 | C | [17, 21, 44] 3 | 1.149 -23.677 38.059 | nan | 0 | C | [24, 28, 45] 4 | 3.031 -18.673 38.872 | nan | 0 | A | [0, 5, 38]

Note the nans for the charge. The string in tuple ligand_coordinates_pdbqt[0] is empty, and that in ligand_coordinates_pdbqt[2] has "atom number 0 has non finite charge, mol name: HEM - Model conformer, charge: nan\natom number 1 has non finite charge, mol name: HEM..." ad nauseam.

So I guess I'm doing something wrong - but what? If I download "FAD_model.sdf" from the same site and run the same code (with "HEM_model.sdf" replaced by "FAD_model.sdf") the charges look okay and I get a valid PDBQT with the same script.

diogomart commented 2 months ago

Hi, Meeko uses RDKit to compute Gasteiger charges, and there are no Gasteiger parameters for Iron. Not entirely sure how to fix this. We'll try to figure something out and post back here if we find a solution.

hrp1000 commented 2 months ago

Ah, okay. So if I removed the iron then presumably the charges wouldn't be NaNs? I could always put it back later with a charge I obtain from somewhere else...

diogomart commented 2 months ago

Yes, that would work. The workaround for meeko would be exactly that: remove iron, calculate charges, add iron back with user defined charge or a reasonable default (+2 for iron in heme I think).

hrp1000 commented 1 month ago

Hi Diogo

I've had a look at this now, and can manipulate an input PDB file (from https://www.ebi.ac.uk/pdbe/static/files/pdbechem_v2/HEM_model.pdb, much easier for me than trying to manipulate SDF files) to remove the Fe and get the rdkit stuff to run and produce a PDBQT file (although I do get errors about implicit hydrogens). I'll shortly be able to put it back again into a PDBQT file for Autodoc Vina.

Now the question arises - how do I get a list of the atoms that do not have Gasteiger charges available (or alternatively, a list of the atoms with appropriate valency/hybridisation/whatever that do have the Gasteiger charges available?). So, for example, if I have chlorophyll A, that has a corrin ring with bound Mg, but there are, of course, many other possible metals in ligands bound by proteins.

rwxayheee commented 1 month ago

Hi @hrp1000 You could take a peek at what atom types RDKit supports for Gasteiger charges calculation from here

/*! \brief Gasteiger partial charge parameters
 */
std::string defaultParamData =
    "H       *      7.17    6.24    -0.56 \n \
C       sp3     7.98    9.18    1.88 \n \
C       sp2     8.79    9.32    1.51 \n \
C       sp      10.39   9.45    0.73 \n \
N       sp3     11.54   10.82   1.36 \n \
N       sp2     12.87   11.15   0.85 \n \
N       sp      15.68   11.7    -0.27 \n \
O       sp3     14.18   12.92   1.39 \n \
O       sp2     17.07   13.79   0.47 \n \
F       sp3     14.66   13.85   2.31 \n \
Cl      sp3     11.00   9.69    1.35 \n \
Br      sp3     10.08   8.47    1.16 \n \
I       sp3     9.9     7.96    0.96 \n \
S       sp3     10.14   9.13    1.38 \n \
S       so      10.14   9.13    1.38 \n \
S       so2     12.00   10.81   1.20 \n \
S       sp2     10.88   9.49    1.33 \n \
P       sp3     8.90    8.24    0.96 \n \
X       *       0.00    0.00    0.00 \n \
";

/*! \brief additional Gasteiger partial charge parameters
  generated using the calculation in PyBabel:
  http://mgltools.scripps.edu/api/PyBabel/PyBabel.gasteiger-pysrc.html

 */
std::string additionalParamData =
    "P       sp2     9.665   8.530   0.735 \n \
Si      sp3     7.300   6.567   0.657 \n \
Si      sp2     7.905   6.748   0.443 \n \
Si      sp      9.065   7.027  -0.002 \n \
B       sp3     5.980   6.820   1.605 \n \
B       sp2     6.420   6.807   1.322 \n \
Be      sp3     3.845   6.755   3.165 \n \
Be      sp2     4.005   6.725   3.035 \n \
Mg      sp2     3.565   5.572   2.197 \n \
Mg      sp3     3.300   5.587   2.447 \n \
Mg      sp      4.040   5.472   1.823 \n \
Al      sp3     5.375   4.953   0.867 \n \
Al      sp2     5.795   5.020   0.695 \n \
";
hrp1000 commented 1 month ago

Hi Amy -

This is exactly what I need - many thanks!