biotite-dev / hydride

Adding hydrogens to molecular models
https://hydride.biotite-python.org/
BSD 3-Clause "New" or "Revised" License
34 stars 5 forks source link

Wrong hydrogen atoms placed with ACE group #6

Closed JoStoe closed 1 year ago

JoStoe commented 1 year ago

Dear hydride developers!

I want to thank you for developing this great tool. I like it a lot.

I noticed a problem when adding hydrogen atoms to a polypeptide containing ACE and NME groups when using a *.pdb file-format as input.

Issue: The protonated output is, next to ACE group, chemically wrong.

I used the additional flag "--charges 7.0", but the issue also exists without the flag. Do you have any ideas on how to work around this issue?

Best regards!

Input files: ace-ala7-nme_exH.zip

padix-key commented 1 year ago

Hi, thanks for reporting this.

At first the --charges options should not have an effect here, since this option (and the underlying estimate_amino_acid_charges()) assigns charges only to canonical amino acids. Probably this behavior should be clarified in the documentation. To set protonation for other residues, the charge need to be set manually. Otherwise a neutral charge is assumed.

Regarding the issue of wrongly assigned hydrogen atoms, I will have a look into this problem.

JoStoe commented 1 year ago

Thank you for your feedback!

Regarding the problem with the protonation, I want to share additional information that I found out.

In the .zip that I provided, there is also a .mol file of the same test-case as the .pdb. If I add the hydrogen atoms to the structure, loaded from the .mol, the carbonyl group of the ACE residue gets reduced to an alcohol. I found out, that RDkit wrote a bond-order of 1 into the .mol file by mistake, so the result from the Hydride package with the (wrong) .mol file seems to be correct. Also, if I change the bond-order to 2, the protonation yields the expected result.

With the PDB-file, I expect that the problem comes from bonding estimation. I assume that the Bond between Residue 1 (ACE) and 2 (ALA) is missed. This leads to additional Hydrogens, where instead the N-C bond should be.

Thank you for your support.

padix-key commented 1 year ago

PDB files unfortunately do not provide information about bond order. Since this information is vital for hydrogen placement, Hydride uses Biotite's connect_via_residue_names() to estimate these bonds and ditches the bonds from the PDB file. Although this is quite easy for intra-residue bonds using information from the Chemical Component Dictionary (CCD), this is difficult for bonds between residues: The algorithm creates bonds between two consecutive residues, if they are either peptide- or nucleotide-linking (again using the CCD). However, NME and ACE are both non-polymer, so the algorithm expects no bond between them and the peptide chain.

However, a bond between NME and the peptide chain is still formed. I investigated this and found a small bug in connect_via_residue_names(), so thank you for that!

Now to the most relevant part: How to fix this problem for you application. Via the CLI this unfortunately impossible at the moment, if you want to use a PDB file as input. However, if you would like to use Python instead, the following approach should fix the problem for the current Hydride version.

import biotite.structure.io.pdb as pdb
import biotite.structure as struc
import hydride

atoms = pdb.PDBFile.read("ace-ala7-nme_exH.pdb").get_structure(
    extra_fields=["charge"], model=1, include_bonds=True
)

# Expect that all ANY bonds are actually SINGLE bonds
bond_array = atoms.bonds.as_array()
bond_array[:,2] = struc.BondType.SINGLE
atoms.bonds = struc.BondList(atoms.array_length(), bond_array)
# Update bond order information
# with the reliable values from 'connect_via_residue_names()'
atoms.bonds = struc.connect_via_residue_names(atoms).merge(atoms.bonds)

# Actual hydrogen addition
atoms, _ = hydride.add_hydrogen(atoms)
atoms.coord = hydride.relax_hydrogen(atoms)

# Write atoms into new PDB file
pdb_file = pdb.PDBFile()
pdb.set_structure(pdb_file, atoms)
pdb_file.write("out.pdb")

### OPTIONAL: Visualization ###
import ammolite
ammolite.launch_interactive_pymol()
pymol_object = ammolite.PyMOLObject.from_structure(atoms)
pymol_object.show_as("sticks")

out

padix-key commented 1 year ago

The workaround posted above should be obsolete in the upcoming Hydride version due to the changes in #8.

JoStoe commented 1 year ago

The output looks very promising and correct now. Thank you for your help!