I recently ran into a serious bug when protonating a protein-ligand complex: the ligand region of the protonated complex comes out scrambled.
Testing shows that the bug does not occur when the original ligand has no hydrogens. When the original ligand already contains hydrogens, the bug is likely to occur: in the output PDB file, atom names no longer line up with atom types, as in this excerpt:
COMPND .../bin/EnzyHTP/test/preparation/data/ligand_test_4WI.pdb
AUTHOR GENERATED BY OPEN BABEL 3.1.0
ATOM 1 C5 4WI B 307 37.258 63.990 48.016 1.00 0.00 C
ATOM 2 H1 4WI B 307 37.779 64.177 46.592 1.00 0.00 C
ATOM 3 H2 4WI B 307 38.256 62.831 46.314 1.00 0.00 N
ATOM 4 C6 4WI B 307 38.594 62.128 47.436 1.00 0.00 C
ATOM 5 H3 4WI B 307 39.053 61.008 47.452 1.00 0.00 O
ATOM 6 H4 4WI B 307 38.170 62.938 48.648 1.00 0.00 C
ATOM 7 N1 4WI B 307 37.442 62.096 49.719 1.00 0.00 C
ATOM 8 H31 4WI B 307 37.057 62.954 50.940 1.00 0.00 C
...
I looked into preparation.protonate.pybel_protonate_pdb_ligand, the function that protonates a ligand with pybel. When it calls preparation.protonate._fix_pybel_output, it passes the path of the original ligand file as the ref_name_path parameter:
# PYBEL interface
def pybel_protonate_pdb_ligand(in_path: str, out_path: str, ph: float = 7.0) -> str:
    """
    This is a wrapper of ...
    Args:
        in_path: path of input ligand pdb for protonation
        out_path: path of output result protonated ligand pdb
        ph: target pH
    Return:
        (write file to out_path)
        return the {out_path} from input
    """
    int_path = fs.get_valid_temp_name(out_path.removesuffix(".pdb") + "_badname.pdb")
    pybel.ob.obErrorLog.SetOutputLevel(0)
    mol = next(pybel.readfile("pdb", in_path))
    mol.OBMol.AddHydrogens(False, True, ph)
    mol.write("pdb", int_path, overwrite=True)
    # fix atom label and residue name
    _fix_pybel_output(int_path, out_path, in_path)
    fs.clean_temp_file_n_dir([int_path])
    return out_path
When pybel performs the protonation step mol.OBMol.AddHydrogens(False, True, ph) on a ligand that already contains hydrogens, it first removes all existing hydrogens and then adds hydrogens again. The new hydrogens are written after the heavy atoms in the PDB file, and their names are not deduplicated (in some earlier versions this applied to all atom names). The result is written to an intermediate _badname.pdb file:
COMPND .../bin/EnzyHTP/test/preparation/data/ligand_test_4WI.pdb
AUTHOR GENERATED BY OPEN BABEL 3.1.0
ATOM 1 C5 4WI B 307 37.258 63.990 48.016 1.00 0.00 C
ATOM 2 C6 4WI B 307 37.779 64.177 46.592 1.00 0.00 C
ATOM 3 N1 4WI B 307 38.256 62.831 46.314 1.00 0.00 N
ATOM 4 C 4WI B 307 38.594 62.128 47.436 1.00 0.00 C
ATOM 5 O 4WI B 307 39.053 61.008 47.452 1.00 0.00 O
...
ATOM 33 F1 4WI B 307 47.032 65.486 51.289 1.00 0.00 F
ATOM 34 F2 4WI B 307 46.806 65.611 49.127 1.00 0.00 F
ATOM 35 F 4WI B 307 45.771 64.061 50.247 1.00 0.00 F
ATOM 36 H 4WI B 307 36.245 63.647 48.002 1.00 0.00 H
ATOM 37 H 4WI B 307 37.268 64.907 48.567 1.00 0.00 H
ATOM 38 H 4WI B 307 36.999 64.468 45.920 1.00 0.00 H
ATOM 39 H 4WI B 307 38.515 64.946 46.480 1.00 0.00 H
ATOM 40 H 4WI B 307 38.328 62.462 45.404 1.00 0.00 H
...
The _fix_pybel_output function exists to give every atom a unique name. When None is passed as ref_name_path, atoms are named in a default order. Otherwise, the atom names are taken from the original ligand file, based on a mapping between the atoms in the original file and the atoms in the new file (_badname.pdb).
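To make the failure mode concrete, here is a minimal sketch. It assumes the names are paired with atoms by position, which is consistent with the broken output shown at the top but is not confirmed from the _fix_pybel_output source. The atom names and elements are copied from that excerpt; map_names_by_position and guess_element are hypothetical helpers written only for this illustration.

```python
def map_names_by_position(ref_names, new_elements):
    """Pair the i-th reference atom name with the i-th atom of the new file.

    Hypothetical stand-in for the name mapping; the real logic may differ.
    """
    return list(zip(ref_names, new_elements))


def guess_element(atom_name):
    """Crude element guess: drop trailing digits from the name (C5 -> C)."""
    return atom_name.rstrip("0123456789")


# Names as they appear in the broken output, i.e. the order presumably found
# in the original ligand file, with hydrogens interleaved.
ref_names = ["C5", "H1", "H2", "C6", "H3", "H4", "N1", "H31"]

# Elements of the first eight atoms in the protonated file, where pybel
# writes all heavy atoms before any hydrogen.
new_elements = ["C", "C", "N", "C", "O", "C", "C", "C"]

pairs = map_names_by_position(ref_names, new_elements)
mismatched = [(name, elem) for name, elem in pairs if guess_element(name) != elem]

for name, elem in mismatched:
    print(f"name {name} was assigned to a {elem} atom")
```

Under this assumption, only C5 and C6 land on atoms of the matching element; every hydrogen name in the reference ends up on a heavy atom.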
I propose two candidate solutions:
The newer version of pybel (openbabel=3.1.1=py39h421517d_8 in conda) does not change heavy-atom names when protonating the ligand. I therefore tested calling _fix_pybel_output with ref_name_path set to None, and the protonated output was normal whether or not the original ligand contained hydrogens. So, when this Pybel version is installed as the dependency, we could stop using the original ligand file as the template for fixing atom names.
A second strategy is worth considering if compatibility and the unpredictability of Pybel's behavior in future versions are a concern:
Require every ligand that contains hydrogens to go through a hydrogen-removal step before Pybel protonates it. This bypasses the mismatch that occurs when _fix_pybel_output uses an original ligand file containing hydrogens as the template for fixing atom names.
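As a rough illustration of the second strategy, the sketch below drops hydrogen records from PDB text before the file is handed to pybel. It is not existing EnzyHTP code: strip_hydrogens is a hypothetical helper, and it assumes fixed-width PDB lines with the element symbol in columns 77-78. It does not renumber atom serials or clean up CONECT records, which a real implementation would probably need to handle.

```python
def strip_hydrogens(pdb_lines):
    """Return the PDB lines with hydrogen ATOM/HETATM records removed.

    Assumes fixed-width PDB records where columns 77-78 hold the element
    symbol. All other record types are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip().upper()
            if element == "H":
                continue  # skip hydrogens that are already present
        kept.append(line)
    return kept


def strip_hydrogens_in_file(in_path, out_path):
    """Write a copy of in_path without hydrogen records to out_path."""
    with open(in_path) as handle:
        lines = handle.read().splitlines()
    with open(out_path, "w") as handle:
        handle.write("\n".join(strip_hydrogens(lines)) + "\n")
    return out_path
```

With something like this in place, the hydrogen-free copy could serve both as the input to pybel and as the reference passed as ref_name_path, so the reference would contain heavy atoms only.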