nucleic acids residue charge is non-integer

dadaix commented 3 years ago

when running pdb2pqr30 (version 3.4.0) on my nucleic acid structure I get

CRITICAL:main.py:768:main_driver:Residue DG5 1 charge is non-integer: -0.3079 CRITICAL:main.py:769:main_driver:Giving up (when using --ff=AMBER)

test.log

sobolevnrm commented 3 years ago

Are you able to share the PDB for the nucleic acid structure that caused the problem?

dadaix commented 3 years ago

I cannot share the molecule I am working on currently. However, I also try to run it on an already published structure (PDB ID 6sx6) and I get the same error.

In case I use --ff=PARSE I get the same error but for DA 5 resiude.

stefdoerr commented 3 years ago

I've seen this occur as well but on AMBER waters. The bug (at least in my case) is that AMBER defines waters as:

HETATM 3415  OW  WAT A 809      17.388  20.916  27.945  1.00 29.88       1   O   
HETATM 3416  HW  WAT A 809      17.570  21.290  28.854  1.00  0.00       1   H   
HETATM 3417  HW  WAT A 809      18.115  21.249  27.345  1.00  0.00       1   H

See how the two hydrogen atoms have the same name. pdb2pqr seems to have an issue when creating the template DefinitionResidue where if it has two atoms in the definition file with the same name it will overwrite the first atom:

  <residue>
    <name>WAT</name>
    <atom>
      <name>O</name>
      <altname>OH2</altname>
      <altname>OW</altname>
      <altname>OH</altname>
      <x>3.240</x>
      <y>55.829</y>
      <z>19.243</z>
      <bond>H1</bond>
      <bond>H2</bond>
    </atom>
    <atom>
      <name>H1</name>
      <altname>HW</altname>
      <altname>HH1</altname>
      <altname>1H</altname>
      <x>2.865</x>
      <y>56.756</y>
      <z>19.243</z>
      <bond>O</bond>
    </atom>
    <atom>
      <name>H2</name>
      <altname>HW</altname>
      <altname>HH2</altname>
      <altname>2H</altname>
      <x>4.240</x>
      <y>55.829</y>
      <z>19.243</z>
      <bond>O</bond>
    </atom>
  </residue>

results in this object

WAT
Atoms: 
    O: 3.240 55.829 19.243 H1 H2
    H1: 2.865 56.756 19.243 O
    H2: 4.240 55.829 19.243 O
Dihedrals: 
Alternate naming map: 
    {'OH2': 'O', 'OW': 'O', 'OH': 'O', 'HW': 'H2', 'HH1': 'H1', '1H': 'H1', 'HH2': 'H2', '2H': 'H2'}

Here you can see how "HW" will always get mapped to "H2" while in reality the first HW should be mapped to H1 and the second HW to H2. Then it seems to drop the second hydrogen in the water, resulting in a water residue with an oxygen and a single hydrogen and then the charge does not sum up to 0 anymore as it should and you get the error. Something similar might happen in your nucleic acids.

sobolevnrm commented 3 years ago

Thanks, @stefdoerr !

poonmk commented 3 years ago

I would like to restart this issue because it appears to remain unresolved. My attempts to run DNA from any source generates the error: WARNING:Tetrahedral hydrogen reconstruction not available for nucleic acids. Some hydrogens may be missing (if so, this is a bug). This followed by the aforementioned non-integer error. For a simple example, try 1bna. Thanks for your help in advance with getting around this problem.

sobolevnrm commented 3 years ago

Thank you very much for providing this example. We are attempting to determine how difficult this will be to fix.

sobolevnrm commented 3 years ago

I've looked into this some more and it appears that PDB2PQR lacks the code to rebuild tetrahedral hydrogens for nucleic acids (that is the source of the warning message).

In particular, the Amino class has this method: https://github.com/Electrostatics/pdb2pqr/blob/9b71590bd845da0de8d540ee1ffeee714fb4a9e3/pdb2pqr/aa.py#L142 However, the Nucleic class does not: https://github.com/Electrostatics/pdb2pqr/blob/9b71590bd845da0de8d540ee1ffeee714fb4a9e3/pdb2pqr/na.py#L12

Therefore, PDB2PQR fails to generate the hydrogens for nucleic acid residues and generates the warning mentioned above in the process: https://github.com/Electrostatics/pdb2pqr/blob/f6aff8c2b7d5d4fd597ea216884a42859449bbc9/pdb2pqr/biomolecule.py#L360

This will be a very difficult bug to fix because the logic to generate the atom positions is complicated. In fact, it does not appear that PDB2PQR has supported this ever (or at least for a very long time) based on a review of the old repository: https://github.com/Electrostatics/apbs-pdb2pqr/blob/d8a4bee4b64df38cb5b856a3fe86e5586fe81cbe/pdb2pqr/pdb2pqr/na.py#L11

In short, this is a bug we should fix at some point but it will take a long time.

intendo commented 3 years ago

Is this something a grad student could research and mock up or do we have an algorithm for generating the atom positions?

sobolevnrm commented 3 years ago

I think the templates already exist in the dat directory; we just need to generate a nucleic acid version of the amino acid code we already have.

poonmk commented 3 years ago

Thanks for the update. I hope you are able to make progress soon.

sobolevnrm commented 2 years ago

@poonmk Can you please confirm that this issue is fixed in the most recent versions of PDB2PQR?

poonmk commented 2 years ago

Thanks for the update. I did a test with 1bna. PDB2PQR no longer aborts now, though it still throws a tsunami of WARNING:Tetrahedral hydrogen reconstruction not available for nucleic acids. Some hydrogens may be missing (if so, this is a bug). The generated input for APBS runs, and yields results that are reasonable (by casual inspection). Should we be concerned about the warnings?

sobolevnrm commented 2 years ago

I think you can ignore the warnings for now. I'll investigate in a separate issue. Thanks!

khelmy55 commented 2 years ago

i have got the same issue with a protein, PDB: 4yki i pre-treated it with pdb4amber and tleap, ran pdb2pqr from the command line and here is what i got:

CRITICAL:Biomolecue charge is non-integer: -32.34
CRITICAL:Giving up.

sobolevnrm commented 2 years ago

We can explore this in the new issue you created. Thank you.

Electrostatics / pdb2pqr

nucleic acids residue charge is non-integer #186