Hydrogen addition problems with non-experimental input structures

dargen3 commented 2 years ago

Hello, I would like to report a strange behavior of the software tool pdb2pqr30.

I am using pdb2pqr30 to protonate a structure with a command:

pdb2pqr30 --log-level DEBUG --with-ph 7.2 AF-P0DSE4-F1-model_v2.pdb AF-P0DSE4-F1-model_v2_protonated.pqr

The version of pdb2pqr30 is: 3.4.1
OS is Ubuntu 21.10

But the structure appears to be protonated incorrectly.

Please help me if I am doing something wrong. If it is a software error, can it be fixed, please? Alternatively, is there any estimate of when it might be fixed? The PDB file can be downloaded from https://alphafold.ebi.ac.uk/entry/P0DSE4

Thank you. Regards, Schindler

sobolevnrm commented 2 years ago

Can you please tell us what specific aspect of protonation state is incorrect?

intendo commented 2 years ago

Is it the 3 bonds on the lower left atom that is red or the atoms that are not connected in the lower middle of the image?

dargen3 commented 2 years ago

Wow, this is a really fast reaction! :) It should be a standard serine, so the carbon (grey) should have 2 hydrogens instead of 3. Moreover, the distance between the carbon and one hydrogen is too small (0.65 Å).

[Image: Screenshot from 2022-02-18 21-18-40]
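The hydrogen count and the short C-H distance reported here can also be checked numerically, without relying on how a viewer draws bonds. The following is a minimal sketch, not part of PDB2PQR. It assumes whitespace-delimited PQR records whose last five fields are x, y, z, charge, and radius, and that hydrogen atom names start with H. The function names, the 1.3 Å cutoff, and the sample coordinates are hypothetical and chosen only for illustration.

```python
import math


def parse_pqr_atoms(text):
    """Parse ATOM/HETATM records from whitespace-delimited PQR text."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[0] not in ("ATOM", "HETATM"):
            continue
        # Assumption: the last five fields are x, y, z, charge, radius.
        # The chain ID is optional, so coordinates are indexed from the end.
        x, y, z = (float(v) for v in fields[-5:-2])
        atoms.append(
            {
                "serial": int(fields[1]),
                "name": fields[2],
                "resname": fields[3],
                "xyz": (x, y, z),
            }
        )
    return atoms


def hydrogens_near(atoms, center_serial, max_dist=1.3):
    """Return (name, distance) for each hydrogen within max_dist of an atom."""
    center = next(a for a in atoms if a["serial"] == center_serial)
    found = []
    for atom in atoms:
        # Assumption: hydrogen names start with "H" (HA, HB2, HB3, ...).
        if atom["name"].startswith("H"):
            d = math.dist(center["xyz"], atom["xyz"])
            if d <= max_dist:
                found.append((atom["name"], round(d, 2)))
    return found


# Synthetic coordinates for illustration only, not taken from the real file.
SAMPLE = """\
ATOM      1  CB  SER A  12       0.000   0.000   0.000  0.1300 2.0000
ATOM      2  HB2 SER A  12       1.090   0.000   0.000  0.0400 0.0000
ATOM      3  HB3 SER A  12       0.000   1.090   0.000  0.0400 0.0000
ATOM      4  HA  SER A  12       0.000   0.000   0.650  0.1000 0.0000
ATOM      5  OG  SER A  12      -1.430   0.000   0.000 -0.6600 1.7700
"""

if __name__ == "__main__":
    atoms = parse_pqr_atoms(SAMPLE)
    print(hydrogens_near(atoms, center_serial=1))
```

A hydrogen that shows up in this list at well under 1 Å from a carbon is more likely a misplaced atom from a neighbouring position than an extra bonded hydrogen, which is worth ruling out before counting it.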

sobolevnrm commented 2 years ago

This seems like it could more likely be a visualization issue rather than an issue with PDB2PQR. The PQR files that PDB2PQR produces have no information about bonding. However, the visualization above is drawing odd bonds between atoms--this sometimes happens in programs such as VMD when the bonding information is inferred from the radii in the PQR file (rather than the built-in radii used for PDB files).

When I visualize the results of the calculation above with PyMOL, I get the following images which do not show the bonding issues in the issue above.

[Images: issue_304, issue_304a]
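To illustrate why bonds inferred from radii can differ between a PDB file and a PQR file, here is a small sketch of one kind of distance heuristic. It is an assumption for illustration, not the documented behavior of Avogadro, VMD, or PyMOL: two atoms are treated as bonded when their distance is below a scale factor times the sum of their radii. The function name, the 0.6 scale factor, and all radii and coordinates are hypothetical.

```python
import math


def infer_bond(xyz_a, radius_a, xyz_b, radius_b, scale=0.6):
    """Guess a bond when the distance is below scale * (r_a + r_b).

    This is an illustrative heuristic only; real viewers may use
    different rules or cutoffs.
    """
    return math.dist(xyz_a, xyz_b) < scale * (radius_a + radius_b)


# A carbon and a hydrogen that are 2.0 A apart, i.e. not bonded.
carbon_xyz = (0.0, 0.0, 0.0)
hydrogen_xyz = (2.0, 0.0, 0.0)

# Hypothetical "built-in" radii: threshold is 0.6 * (1.5 + 1.0) = 1.5 A.
with_builtin_radii = infer_bond(carbon_xyz, 1.5, hydrogen_xyz, 1.0)

# Hypothetical larger radii read from a file: threshold is 0.6 * (2.0 + 1.5) = 2.1 A.
with_file_radii = infer_bond(carbon_xyz, 2.0, hydrogen_xyz, 1.5)

if __name__ == "__main__":
    print("bond with built-in radii:", with_builtin_radii)
    print("bond with file radii:", with_file_radii)
```

The same pair of coordinates yields a different bonding guess once the radii change, which is why odd bonds in a viewer do not by themselves show that the coordinates are wrong.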

dargen3 commented 2 years ago

OK, the missing bonds are probably an error in Avogadro. But the unresolved problem is that the serine carbon has 3 hydrogens instead of 2. You can see it in your PyMOL picture too.

dargen3 commented 2 years ago

Any progress here, please?

sobolevnrm commented 2 years ago

No, I haven't had time to work on this. Sorry.

dargen3 commented 2 years ago

Do you have any idea whether you'll have time to do that? We are planning to use pdb2pqr for a large AlphaFold2-related project, and I don't know whether to wait for the bug fix or find another tool.

sobolevnrm commented 2 years ago

I will try to work on it this weekend. Sorry.

sobolevnrm commented 2 years ago

> But the unresolved problem is that the serine carbon has 3 hydrogens instead of 2. You can see it in your PyMOL picture too.

This is not an extra atom -- it's the alpha carbon hydrogen rotated the wrong way. There's a problem with the input structure that is affecting hydrogen optimization; e.g., see the error messages generated by PDB2PQR:

2022-04-02 08:03:53,582 DEBUG:debump.py:200:find_residue_conflicts:SER A 12 HA is too close to SER A 12 CB
2022-04-02 08:03:53,583 DEBUG:debump.py:161:debump_biomolecule:Starting to debump SER A 12...
2022-04-02 08:03:53,584 DEBUG:debump.py:162:debump_biomolecule:Debumping cutoffs: 2.0 for heavy-heavy, 1.5 for hydrogen-heavy, and 1.0 for hydrogen-hydrogen.
2022-04-02 08:03:53,584 WARNING:debump.py:172:debump_biomolecule:WARNING: Unable to debump SER A 12

I've never seen a debumping issue like this before with experimentally derived structures, which is why I suspect it is a problem with the input file. This will take a while to debug.
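When many structures are processed, messages like the ones quoted in this comment can be collected automatically. The sketch below is a hedged example, not part of PDB2PQR: the regular expressions are written against the four log lines shown in this thread only, and they assume residue identifiers of the form "SER A 12" (residue name, chain, number). Other PDB2PQR versions or residues without a chain ID may be logged differently. The function name is hypothetical.

```python
import re

# Patterns based solely on the log lines quoted in this thread.
CONFLICT_RE = re.compile(
    r"find_residue_conflicts:(\w+ \w+ -?\d+) (\S+) is too close to "
    r"(\w+ \w+ -?\d+) (\S+)"
)
UNABLE_RE = re.compile(r"WARNING: Unable to debump (\w+ \w+ -?\d+)")


def summarize_debump_log(lines):
    """Collect reported atom conflicts and residues that could not be debumped."""
    conflicts = []
    failed = []
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            conflicts.append(match.groups())
        match = UNABLE_RE.search(line)
        if match:
            failed.append(match.group(1))
    return conflicts, failed


# The log lines from this thread, used as sample input.
SAMPLE_LOG = [
    "2022-04-02 08:03:53,582 DEBUG:debump.py:200:find_residue_conflicts:"
    "SER A 12 HA is too close to SER A 12 CB",
    "2022-04-02 08:03:53,583 DEBUG:debump.py:161:debump_biomolecule:"
    "Starting to debump SER A 12...",
    "2022-04-02 08:03:53,584 DEBUG:debump.py:162:debump_biomolecule:"
    "Debumping cutoffs: 2.0 for heavy-heavy, 1.5 for hydrogen-heavy, "
    "and 1.0 for hydrogen-hydrogen.",
    "2022-04-02 08:03:53,584 WARNING:debump.py:172:debump_biomolecule:"
    "WARNING: Unable to debump SER A 12",
]

if __name__ == "__main__":
    conflicts, failed = summarize_debump_log(SAMPLE_LOG)
    print("conflicts:", conflicts)
    print("could not debump:", failed)
```

Running this over the DEBUG log of each structure gives a list of residues to inspect by hand, which may help when deciding which files to attach to a report.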

dargen3 commented 2 years ago

> This is not an extra atom -- it's the alpha carbon hydrogen rotated the wrong way. There's a problem with the input structure that is affecting hydrogen optimization; e.g., see the error messages generated by PDB2PQR:

OK, that is my mistake. You are right. Thank you for the error messages.

> I've never seen a debumping issue like this before with experimentally derived structures, which is why I suspect it is a problem with the input file. This will take a while to debug.

Do you want more problematic structures for debugging? Please let me know if you find a bug in the structure so I can inform the AlphaFold developers.

sobolevnrm commented 2 years ago

If you can share some additional structures here, that would be helpful. Thank you.

HankewieDanke commented 1 year ago

Just for documentation purposes, I have encountered this bug with other neural-network predictors similar to AlphaFold as well (ABodyBuilder2, specifically the nanobody predictor). Sadly, I cannot share the generated structures.

dargen3 commented 1 year ago

I am sending UniProt codes for 10 more problematic structures from AlphaFold DB.

UniProt, pH, problematic atom index
P56641, 12.3, 13
A0A1Z1CH22, 10.9, 23
Q3SAF8, 10.4, 405
J3QJY3, 13.7, 467
B3H610, 8.8, 354
Q38F30, 11.1, 391
F6YG85, 3.0, 82
P0DSE4, 11.4, 190
J3QJY3, 3.9, 475
B3H610, 3.3, 357

All structures can be downloaded from https://alphafold.ebi.ac.uk/files/AF-{UniProt}-F1-model_v4.pdb

All structures were protonated by command:

pdb2pqr30 --log-level DEBUG --noopt --titration-state-method propka --with-ph <ph> --pdb-output <pdb_output> <pdb_input> <pdb_output>
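For anyone trying to reproduce these cases, the list, the URL pattern, and the command can be combined into one script. This is a sketch under stated assumptions rather than the exact procedure used here: it only builds the download URLs and argument lists and does not run anything. It assumes the final positional argument of pdb2pqr30 is the PQR output path, and the local file names and function names are hypothetical.

```python
import shlex

# (UniProt accession, pH, problematic atom index) as listed in this thread.
CASES = [
    ("P56641", 12.3, 13),
    ("A0A1Z1CH22", 10.9, 23),
    ("Q3SAF8", 10.4, 405),
    ("J3QJY3", 13.7, 467),
    ("B3H610", 8.8, 354),
    ("Q38F30", 11.1, 391),
    ("F6YG85", 3.0, 82),
    ("P0DSE4", 11.4, 190),
    ("J3QJY3", 3.9, 475),
    ("B3H610", 3.3, 357),
]

URL_TEMPLATE = "https://alphafold.ebi.ac.uk/files/AF-{uniprot}-F1-model_v4.pdb"


def build_command(pdb_input, ph, pdb_output, pqr_output):
    """Build the argument list using only the flags quoted in this thread."""
    return [
        "pdb2pqr30",
        "--log-level", "DEBUG",
        "--noopt",
        "--titration-state-method", "propka",
        "--with-ph", str(ph),
        "--pdb-output", pdb_output,
        pdb_input,
        pqr_output,  # assumption: the last positional argument is the PQR file
    ]


def plan_runs(cases=CASES):
    """Return (url, command) pairs; file names here are hypothetical."""
    runs = []
    for uniprot, ph, _atom_index in cases:
        stem = f"AF-{uniprot}-F1-model_v4"
        # The pH is part of the output name because some accessions
        # appear twice in the list with different pH values.
        command = build_command(
            f"{stem}.pdb", ph, f"{stem}_pH{ph}.pdb", f"{stem}_pH{ph}.pqr"
        )
        runs.append((URL_TEMPLATE.format(uniprot=uniprot), command))
    return runs


if __name__ == "__main__":
    for url, command in plan_runs():
        print(url)
        print(shlex.join(command))
```

The printed commands can be reviewed before running them, for example by passing each argument list to `subprocess.run` once the input files have been downloaded.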

sobolevnrm commented 1 year ago

> OK, the missing bonds are probably an error in Avogadro. But the unresolved problem is that the serine carbon has 3 hydrogens instead of 2. You can see it in your PyMOL picture too.

Can you please share the PQR files for this or the other structures that are having problems?

dargen3 commented 1 year ago

structures.zip

The zip file contains the 10 PQR files mentioned above. If you need me to send more, let me know.

dargen3 commented 1 year ago

Hello,

can I expect some progress, please? Should I send more structures with errors? I plan to use pdb2pqr in a publication on predicted structures that I will write during the fall.

sobolevnrm commented 1 year ago

This code is only supported by volunteer effort. Progress is based on the time those volunteers have available.

kekasz commented 1 year ago

Hello! I too have been using pdb2pqr30 (v3.6.1) to protonate proteins. Having found this issue thread, I now see that I am not the only one encountering this behaviour.

Here is the command I used:

pdb2pqr30 --noopt --nodebump --pdb-output <pdb-output> <output> <input> --titration-state-method propka --with-ph 7.2

As you can see, all proteins' pH is set to 7.2.

Here are a few PQR files as examples, each with the problematic atom number at the end of its name: proteins.zip

If you need more PQR files, I am happy to provide them.

I would greatly appreciate your effort to address this malfunction, as I would really like to use your tool specifically for a paper that should be finished shortly.

Electrostatics / pdb2pqr

Hydrogen addition problems with non-experimental input structures #304