choderalab / perses

Experiments with expanded ensembles to explore chemical space
http://perses.readthedocs.io
MIT License

Improved reporting and state detection when passing tautomers as `old_residue` #1246

Open sukritsingh opened 3 months ago

sukritsingh commented 3 months ago

This issue records the outcome of a problem that was discussed and solved with @ijpulidos.

TL;DR: When mutating a non-standard amino acid or a tautomer, Perses could do a better job of logging what it has inferred `old_residue` to be from the atom types, and of reporting that a mislabeled residue is what causes the crash.
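For histidine, the case in this issue, inferring the residue identity from atom types is simple: the tautomers differ only in which ring nitrogen carries a hydrogen, so the variant can usually be read from the hydrogen atom names. The sketch below uses a hypothetical helper, `infer_histidine_variant`, that is not part of Perses, and it assumes Amber-style names (`HD1` on ND1, `HE2` on NE2).

```python
def infer_histidine_variant(atom_names):
    """Guess the histidine variant from a residue's atom names.

    Assumes Amber-style naming: HD1 is the hydrogen on ND1, HE2 the hydrogen on NE2.
    Returns None when neither ring hydrogen is present.
    """
    names = set(atom_names)
    has_hd1 = "HD1" in names
    has_he2 = "HE2" in names
    if has_hd1 and has_he2:
        return "HIP"  # both ring nitrogens protonated
    if has_hd1:
        return "HID"  # delta-protonated tautomer
    if has_he2:
        return "HIE"  # epsilon-protonated tautomer
    return None  # no ring hydrogens: the variant cannot be decided from names


# Heavy atoms plus hydrogens of a delta-protonated histidine.
residue_atoms = ["N", "CA", "C", "O", "CB", "CG", "ND1", "CD2", "CE1", "NE2",
                 "H", "HA", "HB2", "HB3", "HD1", "HD2", "HE1"]
print(infer_histidine_variant(residue_atoms))  # HID
```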

In our case, we passed a new protein to `relative_point_mutation_setup.py` with `HIS` as the `old_residue` argument. Only after manually inspecting the residue did we realize that `old_residue` should have been `HID`.

However, the crash log did not make this clear; it repeatedly stated:

```
INFO:proposal_generator:    Conducting polymer point mutation proposal...
INFO:proposal_generator:local_atom_map: {402: 402, 403: 403, 404: 404, 405: 405, 406: 406, 408: 412, 409: 413}
INFO:proposal_generator:the mapped atom names are: [('N', 'N'), ('CA', 'CA'), ('C', 'C'), ('O', 'O'), ('CB', 'CB'), ('H', 'H'), ('HA', 'HA')]
Changed resid 28 to HIS
Traceback (most recent call last):
  File "scripts/setup_mutation_htf.py", line 126, in <module>
    generate_rest_capable_hybrid_topology_factory=args.generate_rest_capable_htf,
  File "/home/singhs15/miniconda3/envs/perses/lib/python3.7/site-packages/perses/app/relative_point_mutation_setup.py", line 279, in __init__
    topology_proposal = point_mutation_engine.propose(sys, top, extra_sidechain_map=extra_sidechain_map, demap_CBs=demap_CBs)
  File "/home/singhs15/miniconda3/envs/perses/lib/python3.7/site-packages/perses/rjmc/topology_proposal.py", line 680, in propose
    atom_map, old_res_to_oemol_map, new_res_to_oemol_map, old_oemol_res, new_oemol_res  = self._construct_atom_map(residue_map, old_topology, new_topology, extra_sidechain_map=extra_sidechain_map, demap_CBs=demap_CBs)
  File "/home/singhs15/miniconda3/envs/perses/lib/python3.7/site-packages/perses/rjmc/topology_proposal.py", line 1234, in _construct_atom_map
    old_res_to_oemol_map = {atom.index: old_oemol.GetAtom(oechem.OEHasAtomName(atom.name)).GetIdx() for atom in old_res.atoms()}
  File "/home/singhs15/miniconda3/envs/perses/lib/python3.7/site-packages/perses/rjmc/topology_proposal.py", line 1234, in <dictcomp>
    old_res_to_oemol_map = {atom.index: old_oemol.GetAtom(oechem.OEHasAtomName(atom.name)).GetIdx() for atom in old_res.atoms()}
AttributeError: 'NoneType' object has no attribute 'GetIdx'
```
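The crash originates in the dictionary comprehension at `topology_proposal.py` line 1234: each atom of the old residue is looked up by name in `old_oemol`, and the `AttributeError` shows that the lookup returned `None` for at least one name, so `.GetIdx()` was called on `None`. Below is a minimal sketch of the same mapping step with an explicit check. It is not Perses code: the function `build_residue_to_reference_map` is hypothetical, and a plain `dict` from atom name to index stands in for the OEMol so the example runs without OpenEye.

```python
def build_residue_to_reference_map(residue_atoms, reference_index_by_name, residue_name):
    """Map residue atom indices to reference-molecule indices, failing with a clear message.

    residue_atoms: iterable of (atom_index, atom_name) pairs for the residue.
    reference_index_by_name: dict from atom name to index in the reference molecule
        (a hypothetical stand-in for the OEMol lookup).
    """
    residue_atoms = list(residue_atoms)
    # Collect every unmatched name first instead of failing on the first None.
    missing = [name for _, name in residue_atoms if name not in reference_index_by_name]
    if missing:
        raise ValueError(
            f"Atom(s) {missing} of residue {residue_name} were not found in the reference "
            "molecule; check that old_residue matches the residue's tautomer/protonation "
            "state (e.g. HID, HIE or HIP rather than HIS)."
        )
    return {index: reference_index_by_name[name] for index, name in residue_atoms}


# Hypothetical reference that has HE2 but lacks HD1, used with a residue that has HD1.
reference = {"N": 0, "CA": 1, "HE2": 2}
residue = [(0, "N"), (1, "CA"), (2, "HD1")]
try:
    build_residue_to_reference_map(residue, reference, "HIS")
except ValueError as err:
    print(err)
```

Gathering all missing names before raising lets the message list every offending atom at once, which points directly at the mislabeled residue.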

The crash went away entirely when `HID` was specified as `old_residue`.

The `Changed resid 28 to HIS` message could be labeled more clearly to state that Perses is inferring `old_residue` to be `HIS`. In other words, Perses should detect that you are mutating a histidine and detect and report a state (tautomer) mismatch, instead of crashing with the unhelpful `NoneType` error from the OEMol lookup.
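One way to act on this, sketched under assumptions: log the residue identity being assumed and, for histidine, compare the requested name with the ring hydrogens actually present before any atom mapping is attempted. `check_old_residue`, the logger name, and the Amber-style hydrogen names (`HD1`, `HE2`) are hypothetical choices for illustration, not part of the Perses API. The sketch is deliberately strict: it treats a generic `HIS` as a mismatch whenever the ring hydrogens identify a specific variant.

```python
import logging

logger = logging.getLogger("mutation_checks")  # hypothetical logger name

# Ring hydrogens that identify each histidine variant (Amber-style names, assumed).
HISTIDINE_RING_HYDROGENS = {
    "HID": frozenset({"HD1"}),
    "HIE": frozenset({"HE2"}),
    "HIP": frozenset({"HD1", "HE2"}),
}


def check_old_residue(resid, requested_name, atom_names):
    """Log the assumed old_residue and raise early on a histidine state mismatch."""
    logger.info("Treating resid %s as old_residue=%s", resid, requested_name)
    is_histidine = requested_name == "HIS" or requested_name in HISTIDINE_RING_HYDROGENS
    if not is_histidine:
        return requested_name
    # Which ring hydrogens does the structure actually contain?
    present = frozenset(atom_names) & {"HD1", "HE2"}
    matches = [name for name, ring_h in HISTIDINE_RING_HYDROGENS.items() if ring_h == present]
    if matches and matches[0] != requested_name:
        raise ValueError(
            f"old_residue={requested_name!r} for resid {resid}, but its ring hydrogens "
            f"{sorted(present)} correspond to {matches[0]!r}; "
            f"pass old_residue={matches[0]!r} instead."
        )
    return requested_name


logging.basicConfig(level=logging.INFO)
hid_atoms = ["N", "CA", "CB", "CG", "ND1", "CD2", "CE1", "NE2", "HD1", "HD2", "HE1"]
check_old_residue(28, "HID", hid_atoms)  # passes: the requested name matches the atoms
try:
    check_old_residue(28, "HIS", hid_atoms)
except ValueError as err:
    logger.error("%s", err)
```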

The problem only appeared when the HID residue itself was being mutated; using the same input structure to mutate a residue other than the HID works fine.

A sample input file is attached: example-pdb-hid-input.tar.gz