MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 652 forks source link

Recommended practice of finding bonded hydrogen from hydrogen bond donor #2521

Open xiki-tempula opened 4 years ago

xiki-tempula commented 4 years ago

Currently, in the water bridge analysis module, the user needs to give the name for the hydrogen bond donor heavy atom and the hydrogens were found via a simple distance search. I'm interested in using the bonded information if possible for this step. However, I'm a man of gro and xtc and I'm not quite sure of how much information a topology file can provide.

I have checked @p-j-smith 's implementation and found that it is done by finding the bonded atom associated with the hydrogens. However, in water bridge analysis, the name of hydrogen bond donor heavy atom is provided instead of the hydrogen bond donor hydrogens. Thus, I need to find the hydrogen atoms associated with the atom and exclude the non-hydrogen atoms.

I can think of two implementations.

hydrogens = [atom for atom in heavy_atom.bonded_atoms if atom.name[0] == 'H']

or

hydrogens = [atom for atom in heavy_atom.bonded_atoms if atom.type == 'H']

Technically, the two implementations should be the same. Since I'm not very familiar with the universe where topology information is available and the type of files where the user might supply, I'm wondering if one implementation might be more robust and less error-prone than the other?

This issue is related to #2120.

lilyminium commented 4 years ago

Checking names is probably better.

>>> import MDAnalysis as mda
>>> from MDAnalysis.tests.datafiles import *
>>> u = mda.Universe(TPR)
>>> u.atoms.types
array(['opls_287', 'opls_290', 'opls_290', ..., 'opls_407', 'opls_407',
       'opls_407'], dtype=object)
>>> u.atoms.names
array(['N', 'H1', 'H2', ..., 'NA', 'NA', 'NA'], dtype=object)

You could also check the mass and charge like HydrogenBondAnalysis does (#2516). That is more agnostic than checking names/types if your format has that info available.

xiki-tempula commented 4 years ago

@lilyminium Thank you very much for the advice. I will use name instead. I mainly use charmm and amber so type usually works for me.

I'm not too worried about the mass and charge as this module operates in a fashion consistent with the old hydrogen bond analysis, which only checks the name. Thus, unless the user supplies the name for the hydrogen bond donor, in which case the user should know what they are doing, the default hydrogen bond donor atom name is consistent with the pdb amino acid name convention.

I will try to see if I can import some function from the new HydrogenBondAnalysis to do the job

richardjgowers commented 4 years ago

Just to throw a spanner in the works, I'd check via mass, then we offload the problem of identifying hydrogens to guess_masses :)

orbeckst commented 4 years ago

Well, if we correctly identified elements then I would use these.

xiki-tempula commented 4 years ago

I will probably work on this a bit later, once we have the element attribute in place. In PR #2519, the hydrogen bond donor lookup is done through a dictionary looking after the first frame. Thus, this new implementation should offer at most a 10s gain for a 100k atom system.