OpenFreeEnergy / gufe

grand unified free energy by OpenFE
https://gufe.readthedocs.io
MIT License
28 stars 8 forks source link

RDKit `Mol` doesn't preserve atom properties when pickled; impacts `ExplicitMoleculeComponent` and subclasses #322

Open dotsdl opened 4 months ago

dotsdl commented 4 months ago

Although we typically serialize GufeTokenizables to dict, keyed_dict, or KeyedChain forms turned to JSON, it can also be convenient to pickle these objects in certain cases. When using multiprocessing, for example, GufeTokenizables passed as arguments to functions executed on other processes are typically pickled/unpickled as the serialization approach.

This presents an issue for ExplicitMoleculeComponents: RDKit Mol objects do not by default include atom properties when pickled (https://github.com/rdkit/rdkit/issues/6573#issuecomment-1781734093). This behavior causes issues especially for preserving partial charges, since we use RDKit Mol properties for holding on to these.

It's possible to change this behavior globally for RDKit with:

from rdkit import Chem
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)

Should we set this within gufe so as to avoid this issue across the board, or will this have undesirable consequences?

IAlibay commented 4 months ago

Thanks for opening this @dotsdl ! That is indeed quite interesting, I've put it on the tracker for review.

dotsdl commented 4 months ago

I'll draft a PR for this, including a test illustrating the behavior we are trying to solution for.