Background
We live in a residue-orientated world. This means that some types of PDB information can be useful to track for small molecules.
These include:
OpenFF & RDKit handle residue information
It is possible to set atom-wise metadata via the OFF tk, including residue names (see: https://docs.openforcefield.org/projects/toolkit/en/stable/users/molecule_conversion.html#hierarchy-data-chains-and-residues).
In the RDKit tk (and in the conversion from the OFF tk to RDKit), this is handled via PDBResidueInfo, which is attached to each atom.
When this information is set, it is possible to create an OFFMol and, by consequence, an OpenMM Topology that retains this PDB information.
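As a minimal sketch of what this looks like on the RDKit side, the snippet below attaches residue information to every atom of a small molecule and reads it back. The molecule and the values ("LIG", 1, "A") are placeholders chosen for illustration only.

```python
from rdkit import Chem

# Placeholder molecule; any RDKit Mol would do.
mol = Chem.MolFromSmiles("CCO")

# Attach the same illustrative residue information to every atom.
for atom in mol.GetAtoms():
    info = Chem.AtomPDBResidueInfo()
    info.SetResidueName("LIG")   # residue name
    info.SetResidueNumber(1)     # residue ID
    info.SetChainId("A")         # chain ID
    atom.SetMonomerInfo(info)

# Read the information back from the first atom.
first = mol.GetAtomWithIdx(0).GetPDBResidueInfo()
residue = (first.GetResidueName(), first.GetResidueNumber(), first.GetChainId())
```

Note that the information is retrieved through GetPDBResidueInfo rather than through the atom's GetProp-style accessors.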
GUFE's serialization does not preserve this information
Because PDBResidueInfo isn't a property, this information isn't preserved when we call to_dict and from_dict, so it gets lost before we can use it within Protocols.
Proposal
We add the storage & retrieval of some PDBResidueInfo data when calling to_dict and from_dict.
Specifically, my proposal would be to include the residue name, residue ID, and chain ID, as per: https://github.com/openforcefield/openff-toolkit/blob/dcd78c97a161b522665d7a4c74ea1eefc5ec2ceb/openff/toolkit/utils/rdkit_wrapper.py#L862-L865
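To make the shape of the proposal concrete, here is a rough, standard-library-only sketch of a per-atom round trip for those three fields. Everything in it is hypothetical: the `ResidueRecord` type, the `pdb_residue_info` key, and the two helper functions are illustrative names, not GUFE's actual serialization format. In practice the values would presumably be read from each atom's PDBResidueInfo.

```python
import json
from typing import List, Optional, TypedDict


class ResidueRecord(TypedDict):
    """Hypothetical per-atom record; not GUFE's actual format."""
    residue_name: str
    residue_id: int
    chain_id: str


def pack_residue_info(records: List[Optional[ResidueRecord]]) -> str:
    """Serialize per-atom records (None where an atom has no residue info)."""
    return json.dumps({"pdb_residue_info": records})


def unpack_residue_info(payload: str) -> List[Optional[ResidueRecord]]:
    """Recover the per-atom records; a missing key means nothing was stored."""
    return json.loads(payload).get("pdb_residue_info", [])


# One atom with residue information, one without.
atoms: List[Optional[ResidueRecord]] = [
    {"residue_name": "LIG", "residue_id": 1, "chain_id": "A"},
    None,
]
restored = unpack_residue_info(pack_residue_info(atoms))
```

Allowing None per atom, and treating a missing key as "no information", would let dictionaries written before such a change still be read.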