Acellera / moleculekit

MoleculeKit: Your favorite molecule manipulation kit
Other
200 stars 37 forks source link

Protein atoms regarded as nonprotein atoms by atomselect() #107

Closed computbiolgeek closed 2 years ago

computbiolgeek commented 2 years ago

Dear MoleculeKit developer,

I'd like to report an odd issue I came across with MoleculeKit. I was rencetly using MoleculeKit to process some protein PDB files. For some reason, MoleculeKit thought that some protein atoms were non-protein atoms. The following is the code I ran with the attached PDB file (test_pdb.txt, renamed to test_pdb.txt because github didn't like a file with pdb suffix).

from moleculekit.molecule import Molecule
from moleculekit.tools.atomtyper import prepareProteinForAtomtyping
mol = Molecule('test.pdb')
atoms = mol.get('name')
protsel = mol.atomselect('protein')
atoms[~protsel]

which outputs

array(['CE1', 'NE2', 'HE1', 'HE2'], dtype=object)

Which buffled me because these atoms are protein atoms apparently. So, I then went on to run these two lines of code to find out what residue(s) these atoms belong.

resids = mol.get('resid')
resids[~protsel]

which outputs

array([515, 515, 515, 515])

Residue 515 happens to be a HIS. Can you please help look into what might have caused this? Thanks a lot!

stefdoerr commented 2 years ago

Hi, this unfortunately has to do with how VMD does atom selections. It guesses bonds based on atom distances and elements and anything that is not bonded to the protein is regarded as non-protein. In your case there might be some larger-than-normal distance in the atoms of that histidine, or maybe it's lacking backbone N, CA, C atoms (which also makes VMD guess stuff as non-protein). If you manage to build it with amber.build you should be able to do correct atom-selections using the prmtop since the bonds will be exact.

computbiolgeek commented 2 years ago

Hi Stefen, thanks for the quick reply. HIS515 in this structure is indeed corrupted. I should have looked at it first.