chemosim-lab / ProLIF

Interaction Fingerprints for protein-ligand complexes and more
https://prolif.readthedocs.io
Apache License 2.0
373 stars 71 forks source link

AtomValenceException when running prolif.Molecule.from_mda(u) #61

Closed noahharrison64 closed 2 years ago

noahharrison64 commented 2 years ago

Hi Cédric,

I've been encountering an issue when creating a prolif molecule from a PDB file. I run u = mda.Universe(filename) prolif.Molecule.from_mda(u) And the debug shows the error arises when using the RDKitConverter from MDAnalysis. I noticed you contributed to this module so thought it'd be fine posting here.


File ~/anaconda3/envs/opencadd/lib/python3.10/site-packages/prolif/molecule.py:118, in Molecule.from_mda(cls, obj, selection, **kwargs)

--> 118 mol = ag.convert_to.rdkit(**kwargs)

File ~/anaconda3/envs/opencadd/lib/python3.10/site-packages/MDAnalysis/converters/RDKit.py:339, in RDKitConverter.convert(self, obj, cache, NoImplicit, max_iter, force)

--> 339     mol = atomgroup_to_mol(ag, **kwargs)

File ~/anaconda3/envs/opencadd/lib/python3.10/site-packages/MDAnalysis/converters/RDKit.py:469, in atomgroup_to_mol(ag, NoImplicit, max_iter, force)

--> 469     mol = _standardize_patterns(mol, max_iter)

File ~/anaconda3/envs/opencadd/lib/python3.10/site-packages/MDAnalysis/converters/RDKit.py:754, in _standardize_patterns(mol, max_iter)
--> 754 for reactant in Chem.GetMolFrags(mol, asMols=True):

**AtomValenceException: Explicit valence for atom # 337 H, 2, is greater than permitted**

I see no issues with this atom or its valences. This problem has arisen multiple times with multiple different PDBs, and I'm sure the atoms are fine. For example, atom 337 belongs to Ile1071. Here is the atom info from the PDB file.

ATOM    325  N   ILE A1071      32.361  12.513  46.645  1.00  0.00           N
ATOM    326  CA  ILE A1071      32.941  11.350  47.305  1.00  0.00           C
ATOM    327  C   ILE A1071      33.166  10.245  46.277  1.00  0.00           C
ATOM    328  O   ILE A1071      32.234   9.843  45.570  1.00  0.00           O
ATOM    329  CB  ILE A1071      32.043  10.844  48.457  1.00  0.00           C
ATOM    330  CG1 ILE A1071      32.033  11.865  49.603  1.00  0.00           C
ATOM    331  CG2 ILE A1071      32.508   9.468  48.955  1.00  0.00           C
ATOM    332  CD1 ILE A1071      30.916  11.667  50.605  1.00  0.00           C
ATOM    333  H   ILE A1071      31.386  12.504  46.425  1.00  0.00           H
ATOM    334  HA  ILE A1071      33.827  11.612  47.687  1.00  0.00           H
ATOM    335  HB  ILE A1071      31.110  10.753  48.110  1.00  0.00           H
ATOM    336 HG12 ILE A1071      31.940  12.779  49.207  1.00  0.00           H
ATOM    337 HG13 ILE A1071      32.905  11.799  50.089  1.00  0.00           H
ATOM    338 HG21 ILE A1071      31.912   9.164  49.698  1.00  0.00           H
ATOM    339 HG22 ILE A1071      32.466   8.810  48.203  1.00  0.00           H
ATOM    340 HG23 ILE A1071      33.449   9.534  49.288  1.00  0.00           H
ATOM    341 HD11 ILE A1071      30.978  12.367  51.316  1.00  0.00           H
ATOM    342 HD12 ILE A1071      30.034  11.741  50.140  1.00  0.00           H
ATOM    343 HD13 ILE A1071      30.998  10.761  51.021  1.00  0.00           H

Furthermore, looking at this residue in 3D doesn't identify any errors. image

Do you have any suggestions as to what might be causing this issue and how it might be fixed?

Many thanks, Noah

cbouy commented 2 years ago

Hi Noah,

This is due to the code that guesses bonds from coordinates in MDAnalysis. You can see which atoms are causing the problem with the following snippet:

u = mda.Universe("prot.pdb", guess_bonds=True)
for atom in u.atoms:
    # find hydrogens with more than one bond
    if atom.element == "H" and len(atom.bonds) > 1:
        neighbours = set([i for i in atom.bonds.indices.flatten() if i != atom.index])
        for i in neighbours:
            neighbour = u.atoms[i]
            # find neighbour to H from different residue
            if neighbour.resid != atom.resid:
                print(f"{atom.resname}{atom.resid}:{atom.name}",
                      f"{neighbour.resname}{neighbour.resid}:{neighbour.name}")

Then visualize the problematic residues with PyMOL to make sure there's no actual clash.

If there's none you'll have to modify the van der Waals radii used by MDAnalysis to determine which atoms are connected by covalent bonds:

# radii used by MDAnalysis
print(mda.topology.guessers.tables.vdwradii)
# modify the radii for problematic atom types
u = mda.Universe("prot.pdb", guess_bonds=True, vdwradii={"H": 0.95, "O": ...})

It's a bit of a tedious process but unless you can directly have bonds in your PDB file, that's the only way I can think of.

Good luck! Cedric

noahharrison64 commented 2 years ago

Hi Cédric,

Thanks for pointing this out. The error now makes more sense and has provided some means to fixing this issue.

I'm using prolif for docking solutions, so as I'm only interested in static structures of the ligand in the binding site I might restrict my atom selection to just binding site residues before creating a prolif mol. Hopefully this will reduce the impact of the issue.

Thanks, Noah