biotite-dev / hydride

Adding hydrogens to molecular models
https://hydride.biotite-python.org/
BSD 3-Clause "New" or "Revised" License

Two hydrogen atoms with the same atom name appear in the same residue #12

Open padix-key opened 1 month ago

padix-key commented 1 month ago

In case a residue to be hydrogenated has an additional heavy atom or a missing heavy atom, AtomNameLibrary.generate_hydrogen_names() does not ensure unique hydrogen atom names.

Example:

import biotite.structure.info as info
import hydride

atoms = info.residue("ALA")
# Remove a heavy atom to force an unusual hydrogenation at its position instead
atoms = atoms[atoms.atom_name != "OXT"]

atoms = atoms[atoms.element != "H"]
hydrogenated_atoms, _ = hydride.add_hydrogen(atoms)
print(hydrogenated_atoms)

Output:

            0  ALA N      N        -0.970    0.490    1.500
            0  ALA CA     C         0.260    0.420    0.690
            0  ALA C      C        -0.090    0.020   -0.720
            0  ALA O      O        -1.060   -0.680   -0.920
            0  ALA CB     C         1.200   -0.620    1.300
            0  ALA H      H        -0.969    1.384    1.958
            0  ALA H2     H        -1.744    0.507    0.845
            0  ALA HA     H         0.741    1.396    0.720
            0  ALA H      H         0.544    0.376   -1.518    # This atom replaces `OXT` and has the duplicate name
            0  ALA HB1    H         1.482   -0.342    2.316
            0  ALA HB2    H         2.099   -0.657    0.682
            0  ALA HB3    H         0.735   -1.599    1.314

This bug should only appear rarely, as residues with missing or additional heavy atoms usually do not make sense in the first place. Still, hydrogen atom names should be unique within a residue.
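To detect this kind of problem in any hydrogenated output, one can count atom names per residue. The following is a minimal plain-Python sketch. The function name `find_duplicate_atom_names` and its `(residue_key, atom_name)` input format are hypothetical and are not part of hydride or biotite.

```python
from collections import Counter, defaultdict


def find_duplicate_atom_names(atoms):
    """Map each residue key to the sorted atom names occurring more than once.

    `atoms` is an iterable of (residue_key, atom_name) pairs; the residue key
    is any hashable value identifying a residue (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for residue_key, atom_name in atoms:
        counts[residue_key][atom_name] += 1
    duplicates = {}
    for residue_key, counter in counts.items():
        repeated = sorted(name for name, n in counter.items() if n > 1)
        if repeated:
            duplicates[residue_key] = repeated
    return duplicates


# Atom names of the hydrogenated ALA residue (res_id 0) from the output above
ala = [
    ((0, "ALA"), name)
    for name in ["N", "CA", "C", "O", "CB", "H", "H2",
                 "HA", "H", "HB1", "HB2", "HB3"]
]
print(find_duplicate_atom_names(ala))  # {(0, 'ALA'): ['H']}
```

With a biotite `AtomArray`, such pairs could be built from its `chain_id`, `res_id`, `res_name` and `atom_name` annotations, e.g. `zip(zip(atoms.chain_id, atoms.res_id, atoms.res_name), atoms.atom_name)`.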

dargen3 commented 2 weeks ago

Hello,

thank you for hydride! It's nice to work with! I would like to report that I've encountered the same problem with a heteroresidue. I protonated the structure 4AOC with the command:

hydride --infile 4aoc.cif --outfile 4aoc_protonated.cif -v

and it resulted in:

HETATM C C1 . A1Q B 2 1131 . 1131 A1Q B C1 ? -6.216 22.324 -7.128 1 8972
HETATM O O7 . A1Q B 2 1131 . 1131 A1Q B O7 ? -1.874 23.118 -8.327 1 8973
HETATM O O1 . A1Q B 2 1131 . 1131 A1Q B O1 ? -5.669 21.024 -6.804 1 8974
HETATM C C2 . A1Q B 2 1131 . 1131 A1Q B C2 ? -7.511 22.139 -7.944 1 8975
HETATM O O2 . A1Q B 2 1131 . 1131 A1Q B O2 ? -8.323 23.32 -7.857 1 8976
HETATM C C3 . A1Q B 2 1131 . 1131 A1Q B C3 ? -7.281 21.847 -9.428 1 8977
HETATM O O3 . A1Q B 2 1131 . 1131 A1Q B O3 ? -8.484 21.936 -10.214 1 8978
HETATM C C4 . A1Q B 2 1131 . 1131 A1Q B C4 ? -6.282 22.863 -9.946 1 8979
HETATM O O4 . A1Q B 2 1131 . 1131 A1Q B O4 ? -6.068 22.711 -11.359 1 8980
HETATM C C5 . A1Q B 2 1131 . 1131 A1Q B C5 ? -5.041 22.632 -9.113 1 8981
HETATM O O5 . A1Q B 2 1131 . 1131 A1Q B O5 ? -5.268 23.164 -7.815 1 8982
HETATM C C6 . A1Q B 2 1131 . 1131 A1Q B C6 ? -3.803 23.213 -9.74 1 8983
HETATM O O6 . A1Q B 2 1131 . 1131 A1Q B O6 ? -3.701 24.613 -9.428 1 8984
HETATM C C7 . A1Q B 2 1131 . 1131 A1Q B C7 ? -2.609 22.364 -9.287 1 8985
HETATM C C8 . A1Q B 2 1131 . 1131 A1Q B C8 ? -4.444 21.05 -6.054 1 8986
HETATM H H1 . A1Q B 2 1131 . 1131 A1Q B H1 ? -6.4262753 22.830936 -6.179991 1 8987
HETATM H H7 . A1Q B 2 1131 . 1131 A1Q B H7 ? -1.3125027 22.492907 -7.8362956 1 8988
HETATM H H2 . A1Q B 2 1131 . 1131 A1Q B H2 ? -8.026652 21.265627 -7.5301847 1 8989
HETATM H H2 . A1Q B 2 1131 . 1131 A1Q B H2 ? -8.438941 23.513788 -6.910527 1 8990
HETATM H H3 . A1Q B 2 1131 . 1131 A1Q B H3 ? -6.827121 20.853497 -9.51085 1 8991
HETATM H H3 . A1Q B 2 1131 . 1131 A1Q B H3 ? -9.228026 21.71413 -9.627479 1 8992
HETATM H H4 . A1Q B 2 1131 . 1131 A1Q B H4 ? -6.6487293 23.869776 -9.7183275 1 8993
HETATM H H4 . A1Q B 2 1131 . 1131 A1Q B H4 ? -5.9327846 23.59531 -11.741785 1 8994
HETATM H H5 . A1Q B 2 1131 . 1131 A1Q B H5 ? -4.873869 21.55034 -9.068252 1 8995
HETATM H H6 . A1Q B 2 1131 . 1131 A1Q B H6 ? -3.878452 23.07668 -10.824271 1 8996
HETATM H H6 . A1Q B 2 1131 . 1131 A1Q B H6 ? -4.5722694 25.021593 -9.572047 1 8997
HETATM H H7 . A1Q B 2 1131 . 1131 A1Q B H7 ? -1.9559722 22.1669 -10.145719 1 8998
HETATM H H7A . A1Q B 2 1131 . 1131 A1Q B H7A ? -2.9551985 21.40397 -8.8914585 1 8999
HETATM H H8 . A1Q B 2 1131 . 1131 A1Q B H8 ? -3.6677785 21.56298 -6.619452 1 9000
HETATM H H8A . A1Q B 2 1131 . 1131 A1Q B H8A ? -4.130931 20.024744 -5.859006 1 9001
HETATM H H8B . A1Q B 2 1131 . 1131 A1Q B H8B ? -4.5953755 21.559626 -5.10281 1 9002

It's one of the first proteins with a heteroresidue that I protonated, so this is probably a more common problem.

padix-key commented 2 weeks ago

I can confirm the problem, thanks. The reason is that the residue contains both C2 and O2, and the hydrogen atoms bonded to each of them are both named H2 (the same happens for C3/O3, C4/O4, C6/O6 and C7/O7). So you are probably right that this problem may appear quite commonly. To fix this, AtomNameLibrary.generate_hydrogen_names() needs to be updated to blacklist names that are already in use.
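The blacklisting idea can be sketched as follows. This is a hypothetical, self-contained illustration, not hydride's actual `AtomNameLibrary` code. It assumes a simple naming scheme in which a hydrogen's base name is `H` plus the heavy atom's name with its element symbol stripped, followed by `A`, `B`, ... for further hydrogens on the same heavy atom. The names `assign_hydrogen_names`, `hydrogen_name_candidates` and the `avoid_used` parameter are invented for this sketch. With `avoid_used=False` it yields the same duplicated names as the 4AOC output; with the default, every candidate already taken in the residue (including heavy atom names) is skipped.

```python
import string


def hydrogen_name_candidates(heavy_atom_name, element):
    """Yield H<suffix>, H<suffix>A, H<suffix>B, ... (hypothetical scheme)."""
    suffix = heavy_atom_name[len(element):]
    base = "H" + suffix
    yield base
    for letter in string.ascii_uppercase:
        yield base + letter


def assign_hydrogen_names(heavy_atoms, hydrogen_counts, avoid_used=True):
    """Name the hydrogens of one residue.

    heavy_atoms: list of (atom_name, element) tuples in residue order.
    hydrogen_counts: number of hydrogens bonded to each heavy atom.
    """
    # Heavy atom names are reserved as well, so no hydrogen can shadow them
    used = {name for name, _ in heavy_atoms}
    names = []
    for (atom_name, element), count in zip(heavy_atoms, hydrogen_counts):
        candidates = hydrogen_name_candidates(atom_name, element)
        for _ in range(count):
            if avoid_used:
                # Blacklist: skip every candidate already taken in this residue
                name = next(c for c in candidates if c not in used)
            else:
                # Naive: names are only unique per heavy atom, not per residue
                name = next(candidates)
            used.add(name)
            names.append(name)
    return names


# Heavy atoms of A1Q and their hydrogen counts, taken from the 4AOC output
heavy = [("C1", "C"), ("O7", "O"), ("O1", "O"), ("C2", "C"), ("O2", "O"),
         ("C3", "C"), ("O3", "O"), ("C4", "C"), ("O4", "O"), ("C5", "C"),
         ("O5", "O"), ("C6", "C"), ("O6", "O"), ("C7", "C"), ("C8", "C")]
counts = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 2, 3]

print(assign_hydrogen_names(heavy, counts))
# ['H1', 'H7', 'H2', 'H2A', 'H3', 'H3A', 'H4', 'H4A', 'H5', 'H6', 'H6A',
#  'H7A', 'H7B', 'H8', 'H8A', 'H8B']
```

Because candidates are drawn in heavy atom order, whichever heavy atom comes first keeps the plain name (here C2 keeps H2 and O2 receives H2A); a real fix might choose a different priority, for example preferring carbon atoms.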

dargen3 commented 1 week ago

Hello,

I would like to respectfully ask whether a fix for this issue and for [#14] is planned, and if so, in what timeframe. I would like to use hydride for a project related to the PDB database, and they are asking me for a date when it will be ready. If you don't have enough time for that, I can try to help with this issue. Thank you for your reply!

padix-key commented 1 week ago

I assume I will fix them within the next two weeks. Is this sufficient?

dargen3 commented 1 week ago

That's perfectly sufficient. Thank you!