OpenFreeEnergy / gufe

grand unified free energy by OpenFE
https://gufe.readthedocs.io
MIT License

Allow for storing & retrieving PDBResidueInfo for SMCs #327

Open IAlibay opened 4 weeks ago

IAlibay commented 4 weeks ago

Background

We live in a residue-orientated world. This means that some types of PDB information can be useful to track for small molecules.

These include:

  1. Some type of residue name
  2. Some type of chain ID
  3. Maybe some type of residue number (although this is a bit hard to keep when we tend to shuffle residues around)

OpenFF & RDKit handle residue information

It is possible to set atom-wise metadata via the OFF tk, including residue names (see: https://docs.openforcefield.org/projects/toolkit/en/stable/users/molecule_conversion.html#hierarchy-data-chains-and-residues).

In RDKit (and when the OFF Tk converts a molecule to RDKit), this is handled via PDBResidueInfo, which is attached to each atom individually.

When this information is set, it is possible to create an OFFMol, and by consequence an OpenMM Topology, that retains this PDB information.
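
For illustration, here is a minimal sketch of how residue information can be attached to, and read back from, the atoms of an RDKit molecule. The helper name `tag_residue` and the values `"LIG"`, `1` and `"X"` are made up for this example; only the RDKit calls themselves (`AtomPDBResidueInfo`, `SetMonomerInfo`, `GetPDBResidueInfo`) are existing API.

```python
from rdkit import Chem


def tag_residue(mol, resname, resnum, chain):
    """Attach the same residue name, number and chain ID to every atom.

    Hypothetical helper for illustration only; not part of gufe or RDKit.
    """
    for atom in mol.GetAtoms():
        info = Chem.AtomPDBResidueInfo()
        info.SetResidueName(resname)
        info.SetResidueNumber(resnum)
        info.SetChainId(chain)
        # The residue info is stored as the atom's monomer info
        atom.SetMonomerInfo(info)
    return mol


# Example values are arbitrary placeholders
mol = tag_residue(Chem.MolFromSmiles("CCO"), "LIG", 1, "X")

# Read the information back from the first atom
first = mol.GetAtomWithIdx(0).GetPDBResidueInfo()
print(first.GetResidueName(), first.GetResidueNumber(), first.GetChainId())
```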

GUFE's serialization does not preserve this information

Because PDBResidueInfo is stored as an atom's monomer info rather than in its regular property dictionary, it isn't preserved when we call to_dict and from_dict, so it gets lost before we can use it within Protocols.

Proposal

We add the storage & retrieval of some PDBResidueInfo data when calling to_dict and from_dict.

Specifically, my proposal would be to include residue name, residue ID, and chain ID, as per: https://github.com/openforcefield/openff-toolkit/blob/dcd78c97a161b522665d7a4c74ea1eefc5ec2ceb/openff/toolkit/utils/rdkit_wrapper.py#L862-L865
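
A rough sketch of what this could look like, independent of how gufe's to_dict and from_dict are actually implemented. The function names `residue_info_to_records` and `apply_residue_records` and the dictionary keys are assumptions for this example; the idea is just to pull the three fields out into plain, JSON-friendly data and to put them back onto the atoms afterwards.

```python
import json

from rdkit import Chem


def residue_info_to_records(mol):
    """Collect per-atom residue info as JSON-friendly records (hypothetical helper)."""
    records = []
    for atom in mol.GetAtoms():
        info = atom.GetPDBResidueInfo()
        if info is None:
            # Atom carries no PDB residue information
            records.append(None)
        else:
            records.append({
                "residue_name": info.GetResidueName(),
                "residue_number": info.GetResidueNumber(),
                "chain_id": info.GetChainId(),
            })
    return records


def apply_residue_records(mol, records):
    """Re-attach residue info from records, in atom order (hypothetical helper)."""
    for atom, record in zip(mol.GetAtoms(), records):
        if record is None:
            continue
        info = Chem.AtomPDBResidueInfo()
        info.SetResidueName(record["residue_name"])
        info.SetResidueNumber(record["residue_number"])
        info.SetChainId(record["chain_id"])
        atom.SetMonomerInfo(info)
    return mol


# Build a small molecule and tag only its first atom (placeholder values)
source = Chem.MolFromSmiles("CCO")
tag = Chem.AtomPDBResidueInfo()
tag.SetResidueName("LIG")
tag.SetResidueNumber(7)
tag.SetChainId("B")
source.GetAtomWithIdx(0).SetMonomerInfo(tag)

# Round trip through JSON, standing in for to_dict / from_dict
serialized = json.dumps(residue_info_to_records(source))
restored = apply_residue_records(Chem.MolFromSmiles("CCO"), json.loads(serialized))
restored_records = residue_info_to_records(restored)
```

This sketch assumes the atom ordering is the same before and after serialization, which would need to hold for the real implementation as well.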