polo9719 opened 8 months ago
FYI, I added this preprocessing script to fix the issue:
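```python
import argparse

from Bio.PDB import PDBIO, PDBParser

# Define a mapping based on your table: Amber protonation-state names -> standard PDB names
residue_renaming_map = {
    'HID': 'HIS',  # histidine, proton on ND1
    'HIE': 'HIS',  # histidine, proton on NE2
    'HIP': 'HIS',  # histidine, both ring nitrogens protonated
    'GLH': 'GLU',  # protonated (neutral) glutamate
    'ASH': 'ASP',  # protonated (neutral) aspartate
    'CYM': 'CYS',  # deprotonated cysteine
    'CYX': 'CYS',  # disulfide-bonded cysteine
    'LYN': 'LYS',  # neutral lysine
}

STANDARD_RESIDUES = {
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
}
# Names a stripped terminal letter is allowed to produce
KNOWN_RESIDUES = STANDARD_RESIDUES | set(residue_renaming_map)


def standard_name(resname):
    """Return the standard residue name for a (possibly Amber-specific) name."""
    name = resname.strip()
    # Handle N-terminal and C-terminal residues (e.g. NALA, CALA, or a trailing C).
    # Only strip when a known residue name remains, so ligands and ions whose
    # names merely start with N or end with C (NAG, NA, ...) are left alone.
    if len(name) == 4:
        if name[0] in 'NC' and name[1:] in KNOWN_RESIDUES:
            name = name[1:]
        elif name.endswith('C') and name[:-1] in KNOWN_RESIDUES:
            name = name[:-1]
    return residue_renaming_map.get(name, name)


def rename_residues(input_filename, output_filename):
    parser = PDBParser()
    structure = parser.get_structure("structure", input_filename)
    for model in structure:
        for chain in model:
            for residue in chain:
                # Get the standard residue name if it needs to be renamed
                old_name = residue.get_resname()
                new_name = standard_name(old_name)
                if new_name != old_name.strip():
                    residue.resname = new_name
    io = PDBIO()
    io.set_structure(structure)
    io.save(output_filename)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Rename Amber residue variants to standard PDB residue names.")
    parser.add_argument("input_file", type=str, help="Amber-style input PDB file")
    parser.add_argument("output_file", type=str, help="path for the renamed PDB file")
    args = parser.parse_args()
    rename_residues(args.input_file, args.output_file)
```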
Thank you, you solved my problem!!! A thousand thanks!
Amber can give histidine residues different names depending on which protons are present: HID, HIE, or HIP instead of HIS.
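To make the naming rule concrete, here is a small sketch that picks the Amber name from the ring hydrogens present. It assumes standard Amber atom names (HD1 bonded to ND1, HE2 bonded to NE2); `amber_histidine_name` is a hypothetical helper, not part of Amber or DiffDock.

```python
def amber_histidine_name(atom_names):
    """Return the Amber name for a histidine given its atom names (hypothetical helper).

    Assumes Amber hydrogen naming: HD1 sits on ND1, HE2 sits on NE2.
    """
    names = {name.strip() for name in atom_names}
    has_hd1 = "HD1" in names
    has_he2 = "HE2" in names
    if has_hd1 and has_he2:
        return "HIP"  # both ring nitrogens protonated (positively charged)
    if has_hd1:
        return "HID"  # proton on ND1 only
    if has_he2:
        return "HIE"  # proton on NE2 only
    return "HIS"  # no ring hydrogens, e.g. a heavy-atom-only structure


print(amber_histidine_name(["N", "CA", "ND1", "HD1", "NE2"]))  # HID
print(amber_histidine_name(["ND1", "NE2", "HE2"]))  # HIE
print(amber_histidine_name(["HD1", "HE2"]))  # HIP
```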
These variant names cause a problem when featurizing the protein in DiffDock, because those residues are mapped to the one-letter code `X` instead of `H`: https://github.com/gcorso/DiffDock/blob/d3791a885e504ea7d7c3587951e259e338e4808b/datasets/constants.py#L3
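A toy version of the lookup reproduces the failure: any three-letter name missing from the table falls through to `X`. `THREE_TO_ONE`, `AMBER_ALIASES` and `one_letter` are hypothetical, trimmed-down stand-ins, not the actual contents of DiffDock's `constants.py`; the `use_aliases` flag shows one way the variants could be folded back onto their parent residue.

```python
# Hypothetical, trimmed three-letter -> one-letter table (not DiffDock's actual constants)
THREE_TO_ONE = {"ALA": "A", "ASP": "D", "CYS": "C", "GLU": "E", "HIS": "H", "LYS": "K"}

# Amber protonation-state variants mapped to their parent residue
AMBER_ALIASES = {
    "HID": "HIS", "HIE": "HIS", "HIP": "HIS",
    "GLH": "GLU", "ASH": "ASP",
    "CYM": "CYS", "CYX": "CYS",
    "LYN": "LYS",
}


def one_letter(resname, use_aliases=False):
    """Map a residue name to a one-letter code, falling back to 'X' if unknown."""
    name = resname.strip().upper()
    if use_aliases:
        name = AMBER_ALIASES.get(name, name)
    return THREE_TO_ONE.get(name, "X")


print(one_letter("HIE"))  # X: unknown name, the behaviour described in this issue
print(one_letter("HIE", use_aliases=True))  # H
print(one_letter("HIS"))  # H
```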
The problem can easily be fixed by renaming all HID, HIE, and HIP residues to HIS. Is this a good way to fix it? If so, maybe it could be done automatically in the inference code. Otherwise, is there a way to read the PDB file that takes these amino acid variants into account?
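If this were done automatically before featurization, one dependency-free option would be to rewrite the residue-name field (columns 18-20 of ATOM, HETATM, ANISOU and TER records in the fixed-width PDB format) directly in the file text, so every other line, including headers and CONECT records, stays untouched. This is only a sketch: `normalize_pdb_text` is a hypothetical helper, not part of DiffDock's inference code, and it assumes a standard fixed-column PDB file.

```python
# Amber protonation-state variants -> standard PDB residue names
AMBER_ALIASES = {
    "HID": "HIS", "HIE": "HIS", "HIP": "HIS",
    "GLH": "GLU", "ASH": "ASP",
    "CYM": "CYS", "CYX": "CYS",
    "LYN": "LYS",
}

# PDB record types whose columns 18-20 hold the residue name
RESIDUE_RECORDS = {"ATOM", "HETATM", "ANISOU", "TER"}


def normalize_pdb_text(pdb_text):
    """Return pdb_text with Amber residue variants renamed (hypothetical helper)."""
    out_lines = []
    for line in pdb_text.splitlines(keepends=True):
        if line[:6].strip() in RESIDUE_RECORDS and len(line) >= 20:
            resname = line[17:20].strip()
            new_name = AMBER_ALIASES.get(resname)
            if new_name is not None:
                # Right-justify into the 3-character field to keep column alignment
                line = line[:17] + new_name.rjust(3) + line[20:]
        out_lines.append(line)
    return "".join(out_lines)


example = (
    "ATOM      1  N   HIE A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "ATOM      2  CA  ALA A   2      11.639   6.071  -5.147  1.00  0.00           C\n"
)
print(normalize_pdb_text(example), end="")
```

In this example the first record should come out as `HIS` while the `ALA` record and all coordinates are left exactly as they were.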
PS-1: When running DiffDock v1 on the same protein, everything runs fine. That's why I suspect that the mapping of these modified histidines to `X` comes from the newly introduced ProDy package.

PS-2: I had this issue specifically with histidine, but maybe it also happens with other amino acids?
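Regarding PS-2, one quick way to see which other residues might be affected is to list the ATOM residue names that fall outside the 20 standard amino acids, since those are the likely candidates for an `X` mapping. A stdlib-only sketch, assuming a fixed-column PDB file; `nonstandard_residue_names` is a hypothetical helper and deliberately ignores HETATM records (ligands, ions, water):

```python
from collections import Counter

# The 20 standard amino-acid residue names
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def nonstandard_residue_names(pdb_text):
    """Count distinct ATOM residues whose name is not a standard amino acid."""
    counts = Counter()
    seen = set()
    for line in pdb_text.splitlines():
        # Only ATOM records; HETATM (ligands, ions, water) is skipped on purpose
        if not line.startswith("ATOM") or len(line) < 22:
            continue
        resname = line[17:20].strip()
        if resname in STANDARD_RESIDUES:
            continue
        # Chain ID (col 22) + residue number and insertion code (cols 23-27)
        key = (line[21], line[22:27], resname)
        if key not in seen:
            seen.add(key)
            counts[resname] += 1
    return counts
```

Running it on the text of an Amber-prepared file (for example a hypothetical `protein.pdb`) would give one count per distinct variant residue, e.g. how many HIE, HIP or CYX residues are present.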