MolecularAI / reaction_utils

Utilities for working with datasets of chemical reactions, reaction templates and template extraction.
https://molecularai.github.io/reaction_utils/
Apache License 2.0
63 stars 11 forks source link

Pickle error with timeout_decorator package #1

Closed marcosfelt closed 2 years ago

marcosfelt commented 2 years ago

I tried the following from the templates tutorial using python 3.8 (Mac OS Monterrey) and got a pickle error. Any ideas why?

from rxnutils.chem.reaction import ChemicalReaction

reaction = "CCN(CC)CC.CCOCC.Cl[S:3]([CH2:2][CH3:1])(=[O:4])=[O:5].[OH:6][CH2:7][CH2:8][Br:9]>>[CH3:1][CH2:2][S:3](=[O:4])(=[O:5])[O:6][CH2:7][CH2:8][Br:9]"
rxn = ChemicalReaction(reaction)
rxn.generate_reaction_template(radius=1)
Traceback (most recent call last):
  File "/Users/Kobi/Documents/Research/phd_code/other/reaction_utils/rxnutils/chem/reaction.py", line 212, in generate_reaction_template
    canonical_smarts = self._generate_rdchiral_template(
  File "/Users/Kobi/Documents/Research/phd_code/other/reaction_utils/.venv/lib/python3.8/site-packages/timeout_decorator/timeout_decorator.py", line 92, in new_function
    return timeout_wrapper(*args, **kwargs)
  File "/Users/Kobi/Documents/Research/phd_code/other/reaction_utils/.venv/lib/python3.8/site-packages/timeout_decorator/timeout_decorator.py", line 147, in __call__
    self.__process.start()
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/homebrew/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function ChemicalReaction._generate_rdchiral_template at 0x13a331c10>: it's not the same object as rxnutils.chem.reaction.ChemicalReaction._generate_rdchiral_template

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Kobi/Documents/Research/phd_code/other/reaction_utils/rxnutils/chem/reaction.py", line 235, in generate_reaction_template
    raise ReactionException(f"Template generation failed with message: {err}")
rxnutils.chem.reaction.ReactionException: Template generation failed with message: Can't pickle <function ChemicalReaction._generate_rdchiral_template at 0x13a331c10>: it's not the same object as rxnutils.chem.reaction.ChemicalReaction._generate_rdchiral_template
CKannas commented 2 years ago

Hi,

Could you provide more information about the Python environment you use? In addition, what environment do you use to run the tutorial, i.e. notebook, ipython/python terminal?

marcosfelt commented 2 years ago

I used pip install . inside a python virtual environment ( created with python3 -m venv .venv). I had to install rdkit manually using pip install rdkit_pypi. I was running from the python repl to just to test the tutorial.

CKannas commented 2 years ago

I've created a similar environment on my Linux workstation.

  1. Using Python 3.10 instead of 3.8, created a new virtualenv with python -m venv test_venv
  2. Activated virtualenv
  3. Updated pip with python -m pip install --upgrade pip
  4. Installed rdkit, I tried both python -m pip install rdkit and python -m pip install rdkit-pypi (I didn't notice any difference)
  5. Installed rxnutils with python -m pip install .
  6. Opened an ipython terminal and run the same tutorial code with no issues
  7. Installed poetry with python -m pip install poetry
  8. Installed rxnutils dependencies with poetry install (this downgraded a lot of packages and installed a few)
  9. Opened an ipython terminal and run the same tutorial code with no issues

Looking at the error message again, after calling timeout_decorator it jumps out of your virtualenv and calls multiprocessing from homebrew, not sure if this is intended or not.

I would suggest to recreate your virtualenv as follows and test again.

python -m venv --clear <venv_name>  # Python >= 3.8 works
. <venv_name>/bin/activate
python -m pip install --upgrade pip
python -m pip install rdkit or python -m pip install rdkit-pypi
python -m pip install poetry
# Navigate to the root of the repository and run
poetry install

Or you can just install rxnutils directly from GitHub as follows:

python -m venv --clear <venv_name>  # Python >= 3.8 works
. <venv_name>/bin/activate
python -m pip install --upgrade pip
python -m pip install rdkit or python -m pip install rdkit-pypi
python -m pip install git+https://github.com/MolecularAI/reaction_utils

You should then have rxnutils installed in the virtualenv named <venv_name>. You can use python, ipython and jupyter notebook from the virtualenv.

marcosfelt commented 2 years ago

Thanks for checking this in detail! I just did some digging and it looks like this is a problem with the timeout_decorator library. You can see the open issue here; I'm guessing it fails on some platforms and works fine on others. I'd recommend maybe getting rid of the dependency on timeout-decorator as it seems it's not updated regularly; there's also this fork that seems more up to date: https://github.com/bitranox/wrapt_timeout_decorator

marcosfelt commented 2 years ago

Also it's not best practice, to require people to use poetry to install the package (and trust me I love poetry!). Would it be possible to upload to pypi? Poetry has a nice command for doing this: https://python-poetry.org/docs/libraries/#publishing-to-pypi

CKannas commented 2 years ago

Thanks for checking this in detail! I just did some digging and it looks like this is a problem with the timeout_decorator library. You can see the open issue here; I'm guessing it fails on some platforms and works fine on others. I'd recommend maybe getting rid of the dependency on timeout-decorator as it seems it's not updated regularly; there's also this fork that seems more up to date: https://github.com/bitranox/wrapt_timeout_decorator

Thanks, we'll have to look into it further. It seems wrapt_timeout_decorator is a better alternative.

SGenheden commented 2 years ago

Also it's not best practice, to require people to use poetry to install the package (and trust me I love poetry!). Would it be possible to upload to pypi? Poetry has a nice command for doing this: https://python-poetry.org/docs/libraries/#publishing-to-pypi

Can you open another issue for this?

CKannas commented 2 years ago

I looked into how timeout_decorator and wrapt_timeout_decorator work.

timeout_decorator, relies only on build-in libraries, multiprocessing and signal for the two different ways of managing a wrapped function (using functools.wraps). The approach using signal seems to only work on Linux like systems. The approach using multiprocessing is a more general one, which relies on the build-in multiprocessing and pickle libraries.

~On the other hand wrapt_timeout_decorator uses 3rd party library wrapt for wrapping functions, signal and multiprocessing for managing wrapped functions and dill to track issues with unpickable objects. They don't replace pickle with dill, only pathos is doing that.~

Update: I was wrong. wrapt_timeout_decorator uses:

We use timeout_decorator with multiprocessing, which on Linux uses fork to run the wrapped function while for MacOS/Windows uses spawn by default. With forking the child process inherits state & variables from parent while with spawning the child process starts in a new Python interpreter and module is reimported and variables are recreated.

Which were the issue with pickling lies, as when spawning the import path is different than the path the function has before spawning. (https://peaku.co/questions/3030-bifurcacion-de-multiprocesamiento-()-vs-spawn-() & https://britishgeologicalsurvey.github.io/science/python-forking-vs-spawn/#:~:text=Forking%20and%20spawning%20are%20two,they%20were%20in%20the%20parent.)

We'll look for a solution.

SGenheden commented 2 years ago

We will switch to the wrapt_timeout_decorator package. An update will come soon.

SGenheden commented 2 years ago

Fixed in merged PR. Will be included in a feature release