Closed shaoqx closed 2 years ago
I can confirm that the issue is not only present for histidines but also lysines.
Comparing the outputs of an older version you can see that the LYS was indeed a LYS (left with HZ3 atom)
The newer version removes the HZ3 but does not rename to LYN as it should (if that is indeed the right protonation state).
teLeap
then proceeds to add the "missing" HZ3 atom since the name is LYS.
I think this is quite an important regression especially since HIS residues with HD1 atoms refuse to build with teLeap
. LYN/LYS seems to silently reprotonate.
I investigated a bit more the issue. The real issue is that propka
predicts a 3.4 pKa for the terminal LYS. Which according to the chemists sitting next to me is very wrong :smile: . It is probably reporting the pKa of the c-terminal oxygens instead of the pKa of the lysine NZ nitrogen.
What happens then is that pdb2pqr
deprotonates the lysine (removes the HZ3 atom) and renames it to CLYN
. CLYN
does not exist in AMBER and also gets renamed back to LYS by pdb2pqr
(https://github.com/Electrostatics/pdb2pqr/blob/master/pdb2pqr/biomolecule.py#L760-L763)
I'll have to dig into propka
I guess
Actually no, the issue is in how pdb2pqr parses propka data. propka reports pka per group:
OrderedDict([('res_num', 13), ('ins_code', ' '), ('res_name', 'LYS'), ('chain_id', 'A'), ('group_label', 'LYS 13 A'), ('group_type', 'LYS'), ('pKa', 10.051129545410106), ('model_pKa', 10.5), ('buried', 0.0), ('coupled_group', None)]),
OrderedDict([('res_num', 13), ('ins_code', ' '), ('res_name', 'LYS'), ('chain_id', 'A'), ('group_label', 'C- 13 A'), ('group_type', 'COO'), ('pKa', 3.394290453645872), ('model_pKa', 3.2), ('buried', 0.0), ('coupled_group', None)])
It reports two pKas for the terminal LYS, one for the sidechain and one for the COO group. The way that pdb2pqr parses that data, it overwrites the sidechain pKa with the COO pKa
ipdb> print({f"{row['res_name']} {row['res_num']} {row['chain_id']}": row["pKa"] for row in rows})
{'LYS 13 A': 3.394290453645872}
If you tell it to ignore the COO groups it works fine
ipdb> print({f"{row['res_name']} {row['res_num']} {row['chain_id']}": row["pKa"] for row in rows if row["group_type"] != "COO"})
{'LYS 13 A': 10.051129545410106}
I'm trying to think up of a more general solution though. Might make sense to instead check if "group_type" matches "res_name" maybe?
Ohh I see. This is why propka only have the N-ter pKa missing but pdb2pqr error naming both N-ter and C-ter. Cool!
Great find, @stefdoerr -- thank you!
Actually seems like we only fixed half the problem of this issue. I solved the wrong pka parsing but even with correct pka parsing the terminal aminoacids are not renamed correctly. There are cases with histidine where it can indeed change to another protonation state at the terminal.
For example this terminal Histidine is protonated as HID but name stays HIS which causes tleap to throw an error. bugged_HIS209.zip
For a system that has HIS on N-ter or C-ter, PDB2PQR won't change their name to the corresponding HID/HIE. After several tests I found:
Additional Question:
(Testing pdb attached share.zip )