[BugReport] Titration state missing for N-ter/C-ter HIS

shaoqx commented 2 years ago

For a system that has HIS on N-ter or C-ter, PDB2PQR won't change their name to the corresponding HID/HIE. After several tests I found:

propka3 do not give pKa of N-ter HIS but do have that for C-ter HIS related issue
PDB2PQR will still optimize the H-bond network and give a protonation state (delta-H or epsilon-H) of N/C-ter HIS but the name is not changed correspondingly.

Additional Question:

Since no pKa was given PDB2PQR can't change the charge level of the protonation state in the N-ter sidechain. But can it do so for the C-ter?
Can this case be applied to general C/N-ter residues?

(Testing pdb attached share.zip )

stefdoerr commented 2 years ago

I can confirm that the issue is not only present for histidines but also lysines. Comparing the outputs of an older version you can see that the LYS was indeed a LYS (left with HZ3 atom) The newer version removes the HZ3 but does not rename to LYN as it should (if that is indeed the right protonation state). teLeap then proceeds to add the "missing" HZ3 atom since the name is LYS.

I think this is quite an important regression especially since HIS residues with HD1 atoms refuse to build with teLeap. LYN/LYS seems to silently reprotonate.

stefdoerr commented 2 years ago

I investigated a bit more the issue. The real issue is that propka predicts a 3.4 pKa for the terminal LYS. Which according to the chemists sitting next to me is very wrong :smile: . It is probably reporting the pKa of the c-terminal oxygens instead of the pKa of the lysine NZ nitrogen.

What happens then is that pdb2pqr deprotonates the lysine (removes the HZ3 atom) and renames it to CLYN. CLYN does not exist in AMBER and also gets renamed back to LYS by pdb2pqr (https://github.com/Electrostatics/pdb2pqr/blob/master/pdb2pqr/biomolecule.py#L760-L763)

I'll have to dig into propka I guess

stefdoerr commented 2 years ago

Actually no, the issue is in how pdb2pqr parses propka data. propka reports pka per group:

OrderedDict([('res_num', 13), ('ins_code', ' '), ('res_name', 'LYS'), ('chain_id', 'A'), ('group_label', 'LYS  13 A'), ('group_type', 'LYS'), ('pKa', 10.051129545410106), ('model_pKa', 10.5), ('buried', 0.0), ('coupled_group', None)]), 
OrderedDict([('res_num', 13), ('ins_code', ' '), ('res_name', 'LYS'), ('chain_id', 'A'), ('group_label', 'C-   13 A'), ('group_type', 'COO'), ('pKa', 3.394290453645872), ('model_pKa', 3.2), ('buried', 0.0), ('coupled_group', None)])

It reports two pKas for the terminal LYS, one for the sidechain and one for the COO group. The way that pdb2pqr parses that data, it overwrites the sidechain pKa with the COO pKa

ipdb> print({f"{row['res_name']} {row['res_num']} {row['chain_id']}": row["pKa"] for row in rows})
{'LYS 13 A': 3.394290453645872}

If you tell it to ignore the COO groups it works fine

ipdb> print({f"{row['res_name']} {row['res_num']} {row['chain_id']}": row["pKa"] for row in rows if row["group_type"] != "COO"})
{'LYS 13 A': 10.051129545410106}

I'm trying to think up of a more general solution though. Might make sense to instead check if "group_type" matches "res_name" maybe?

shaoqx commented 2 years ago

Ohh I see. This is why propka only have the N-ter pKa missing but pdb2pqr error naming both N-ter and C-ter. Cool!

sobolevnrm commented 2 years ago

Great find, @stefdoerr -- thank you!

stefdoerr commented 2 years ago

Actually seems like we only fixed half the problem of this issue. I solved the wrong pka parsing but even with correct pka parsing the terminal aminoacids are not renamed correctly. There are cases with histidine where it can indeed change to another protonation state at the terminal.

For example this terminal Histidine is protonated as HID but name stays HIS which causes tleap to throw an error. bugged_HIS209.zip

Electrostatics / pdb2pqr

[BugReport] Titration state missing for N-ter/C-ter HIS #245