Hydrogen positions converting PDB to XYZ

TinkerTools / tinker

Tinker: Software Tools for Molecular Design

https://dasher.wustl.edu/tinker/

Hydrogen positions converting PDB to XYZ #95

Closed peastman closed 3 years ago

peastman commented 3 years ago

I have a PDB file I need to convert to XYZ so I can use it with Tinker. (I'm working on amoebabio18 support in OpenMM, so I need to be able to compute forces and energies with both programs to compare.) When I run pdbxyz on it, I find that all the heavy atoms get translated correctly, but all the hydrogens end up with entirely new positions. Is there some way I can get it to translate the file accurately?

jayponder commented 3 years ago

Hi Peter, Tinker will use hydrogen atoms directly from the PDB file when possible; otherwise it will build the hydrogens in "ideal" positions. If your hydrogens have new positions, that tells me that Tinker did not understand your naming convention for the hydrogens. We try to handle several of the common hydrogen naming conventions, including the proper PDB naming convention that many programs ignore :) If you can post or email the PDB file, I'll take a look.

peastman commented 3 years ago

Thanks! The file is attached. I'm not sure where it originally came from, but you're right that the hydrogen names don't look standard. Let me fix them and see if that solves the problem.

alanine-dipeptide-implicit.pdb.txt

peastman commented 3 years ago

That fixed it. Thanks!

jayponder commented 3 years ago

Yeah, it looks like it was having trouble with the names of the hydrogens on the ACE and NME capping groups. The hydrogens on the ALA residue were fine. I'll add that capping group hydrogen atom naming convention to Tinker's allowed set of names. Let me know if you run across anything else.

jayponder commented 3 years ago

Done. Pushed to Tinker "release" branch on GitHub...

peastman commented 3 years ago

In case it's useful to you, here's the file where I keep track of all the strange, nonstandard atom and residue names I've come across. OpenMM uses it to translate them to standard ones when loading PDB files.

https://github.com/openmm/openmm/blob/master/wrappers/python/openmm/app/data/pdbNames.xml

jayponder commented 3 years ago

Yes, Tinker has exactly the same thing... a routine called PDBFIX that gets called every time Tinker reads a raw PDB file, and whose job is to untangle residue and atom names and other weird stuff people and programs put into PDB files :) I'll take a look at yours and add anything we are missing as appropriate. Thanks.
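
For readers unfamiliar with this kind of name translation, the idea can be sketched as a simple lookup applied to each atom record. This is a minimal Python sketch only: the table entries and the `normalize_names` function are hypothetical illustrations, and they are not taken from Tinker's PDBFIX routine or from OpenMM's pdbNames.xml.

```python
# Hypothetical alias tables, for illustration only. The real tables in
# Tinker (PDBFIX) and OpenMM (pdbNames.xml) are far larger and differ in detail.
RESIDUE_ALIASES = {
    "WAT": "HOH",  # a common nonstandard name for water
}

# Keys are (residue name, nonstandard atom name); "*" matches any residue.
ATOM_ALIASES = {
    ("*", "HN"): "H",       # older name for the backbone amide hydrogen
    ("ILE", "CD"): "CD1",   # isoleucine delta carbon written without its index
}


def normalize_names(resname, atomname):
    """Return (residue, atom) names after applying the alias tables."""
    res = RESIDUE_ALIASES.get(resname, resname)
    # A residue-specific alias takes priority over a wildcard alias.
    atom = ATOM_ALIASES.get((res, atomname),
                            ATOM_ALIASES.get(("*", atomname), atomname))
    return res, atom


if __name__ == "__main__":
    for pair in [("ALA", "HN"), ("ILE", "CD"), ("WAT", "O"), ("ALA", "CA")]:
        print(pair, "->", normalize_names(*pair))
```

Names with no entry in either table pass through unchanged, so a program that reads them can still fall back to rebuilding the atom when it does not recognize the name.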

peastman commented 3 years ago

Here's another case I ran into. In this file, atom 24 (H) doesn't get converted correctly. But its name is completely standard.

bpti2.pdb.txt

peastman commented 2 years ago

And here's another. This has two hydrogens whose positions get changed, one in the first residue and one in the last.

dna.pdb.txt

jayponder commented 2 years ago

Actually, in the BPTI example you posted, atom 24 should be "H1" and not "H". The "H" name is reserved for single amide hydrogens at residues other than the first residue of a chain. The ammonium-type hydrogens at the first residue should be "H1", "H2" and "H3", or just "H1" and "H2" if the first residue is PRO. I've never seen "H" used in that context... Do you know where that file originated?

I'll see if I can fix this specific case though...
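
The naming rule described above can be checked before running a conversion. The following is a minimal Python sketch, assuming standard fixed-column PDB ATOM records and treating the first residue seen for each chain ID as the N-terminus. The function name `find_nterminal_h` is hypothetical; it is not part of Tinker or OpenMM, and it does not handle TER records or reused chain IDs.

```python
def find_nterminal_h(pdb_lines):
    """Return serial numbers of atoms named 'H' in the first residue of each chain.

    Assumption: the first residue encountered for a chain ID is its N-terminal
    residue, where the hydrogens on N are expected to be H1/H2/H3 (or H1/H2
    for PRO) rather than H.
    """
    first_residue = {}  # chain ID -> residue identifier of the first residue seen
    flagged = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        serial = int(line[6:11])      # columns 7-11: atom serial number
        name = line[12:16].strip()    # columns 13-16: atom name
        chain = line[21]              # column 22: chain ID
        res_id = line[22:27]          # columns 23-27: residue number + insertion code
        first = first_residue.setdefault(chain, res_id)
        if res_id == first and name == "H":
            flagged.append(serial)
    return flagged


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for serial in find_nterminal_h(handle):
            print(f"atom {serial}: 'H' in the first residue of its chain")
```

Any atom this reports is a candidate for renaming to "H1" before conversion, following the convention given in the comment above.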

jayponder commented 2 years ago

I've fixed both the BPTI and DNA examples you posted just above. The fix is pushed to the GitHub release branch.

Thanks! This has been very helpful. By all means send any other cases you find... Hopefully this may be most, if not all, of them.