Error voxelizing protein with 'GLY'

zarif101 commented 3 years ago

I'm trying to voxelize proteins from the PDBBIND dataset. Everything works successfully for several, but for some .pdb files, I get the following error:

Found atoms with resnames ['GLY'] in the Molecule which can cause issues with the voxelization. Please make sure to only pass protein atoms and metals.

I was wondering why GLY within .pdb files does not work for voxelization, given that it's a valid residue. Also, how can I clean or fix a Molecule object that contains GLY, so I can voxelize it? I'm happy to provide more information or code. Thanks!

stefdoerr commented 3 years ago

Hi, did you follow the voxelization tutorial? https://software.acellera.com/docs/latest/moleculekit/tutorials/voxelization_tutorial.html If you don't use prepareProteinForAtomtyping, you can end up with atoms that are not recognized as protein even though they belong to a standard amino acid, and then you get that error.

zarif101 commented 3 years ago

Yes, my code is based on the tutorial. The call to prepareProteinForAtomtyping itself seems to be the line that throws the error. Here is a picture of my code snippet showing the error.

Also, here is a link to a PDB file that triggers the error. It is the same one used in the image above: https://drive.google.com/file/d/1ZCDrlc6_ep75zJpXrktKo9zONmgLj4bk/view?usp=sharing

Could this be a bug with the prepareProteinForAtomtyping function?

stefdoerr commented 3 years ago

There is an issue in that PDB. There is a GLY 420 nitrogen just floating in space next to ILE 419. It's missing any other atoms of that residue.

If you don't care about that residue I would suggest just removing it from the molecule with: mol.remove("resname GLY and resid 420")

zarif101 commented 3 years ago

Thank you very much, that file no longer raises the error. Is there a way to automate removing free-floating GLY atoms from all of the PDBs that raise the error, or do I have to manually find the problem in every file?

stefdoerr commented 3 years ago

You can use something like this which will only keep the protein atoms, the water and the "metals" (which includes some ions).

from moleculekit.tools.atomtyper import metal_atypes
mol.filter("protein or water or element {}".format(" ".join(metal_atypes)))
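If you would rather find the incomplete residues than filter everything, here is a minimal sketch of one way to do it. It is not part of moleculekit: the function names are made up for illustration, and it assumes that a residue is "broken" when its ATOM records lack any of the backbone atoms N, CA or C. It reads the fixed-column PDB text directly and builds a selection string in the same form as the manual fix above. Insertion codes are ignored.

```python
from collections import defaultdict

# Assumption: a usable protein residue has at least these backbone atoms.
BACKBONE = {"N", "CA", "C"}


def find_incomplete_residues(pdb_lines):
    """Return sorted (chain, resid, resname) tuples for ATOM residues
    that are missing one or more backbone atoms."""
    atoms = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, ...
        name = line[12:16].strip()     # atom name, columns 13-16
        resname = line[17:20].strip()  # residue name, columns 18-20
        chain = line[21].strip()       # chain ID, column 22
        resid = int(line[22:26])       # residue number, columns 23-26
        atoms[(chain, resid, resname)].add(name)
    return sorted(key for key, names in atoms.items() if not BACKBONE <= names)


def removal_selection(residues):
    """Build an atom selection string covering the given residues
    (hypothetical helper, returns '' when nothing needs removing)."""
    parts = []
    for chain, resid, resname in residues:
        sel = f"resname {resname} and resid {resid}"
        if chain:
            sel += f" and chain {chain}"
        parts.append(f"({sel})")
    return " or ".join(parts)


# Toy input: a complete ILE 419 followed by a GLY 420 that only has its nitrogen.
example = [
    "ATOM      1  N   ILE A 419      10.000  10.000  10.000",
    "ATOM      2  CA  ILE A 419      11.000  10.000  10.000",
    "ATOM      3  C   ILE A 419      12.000  10.000  10.000",
    "ATOM      4  O   ILE A 419      12.500  11.000  10.000",
    "ATOM      5  N   GLY A 420      13.000  10.000  10.000",
]

bad = find_incomplete_residues(example)
selection = removal_selection(bad)
print(selection)
```

For a real file you could pass `open(path)` as `pdb_lines` and, when the returned string is not empty, hand it to `mol.remove(selection)` before calling prepareProteinForAtomtyping. Whether removing the residue is acceptable depends on how much you care about that part of the structure.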

cuijinli commented 1 year ago

You can use something like this which will only keep the protein atoms, the water and the "metals" (which includes some ions).
from moleculekit.tools.atomtyper import metal_atypes
mol.filter("protein or water or element {}".format(" ".join(metal_atypes)))

This is my test, and I also have a problem.

from moleculekit.molecule import Molecule
from moleculekit.tools.voxeldescriptors import getVoxelDescriptors
from moleculekit.tools.atomtyper import prepareProteinForAtomtyping

mol = Molecule('./3v94.pdb')
mol.filter('protein')
mol = prepareProteinForAtomtyping(mol)
features, center, N = getVoxelDescriptors(mol, buffer=8)
print(features)

I downloaded a protein from the PDB and used it with moleculekit, but the following message appeared:

*** Open Babel Warning in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is /tmp/tmpw26el0s5.pdb)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

Can you help me to solve this?

stefdoerr commented 1 year ago

You can ignore that warning from Open Babel. The features array does have non-zero values in it:

In [5]: features.sum()
Out[5]: 620407.9102250071
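The printout looks like it is all zeros because NumPy abbreviates large arrays and only shows the first and last few rows and columns. The sketch below reproduces that effect with a synthetic array, not real voxel features; the shape and the values are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for a features array: 2000 rows, 8 columns,
# with non-zero values only in the middle rows.
features = np.zeros((2000, 8))
features[900:1100, :] = 0.5

# NumPy summarizes arrays with more than 1000 elements by default, so the
# printed text only contains the (all-zero) edges of the array.
printed = str(features)
print(printed)

# Aggregate checks look at every element, not just the printed edges.
total = features.sum()
nonzero = np.count_nonzero(features)
print(total, nonzero)
```

Checking `features.sum()` or `np.count_nonzero(features)` is therefore a more reliable sanity check than reading the printed array.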

Acellera / moleculekit

Error voxelizing protein with 'GLY' #73