ericsnowcurrently / multi-core-python

Enabling CPython multi-core parallelism via subinterpreters.
BSD 3-Clause "New" or "Revised" License

Can't unpickle objects defined in __main__ #46

Open crusaderky opened 5 years ago

crusaderky commented 5 years ago

(https://bugs.python.org/issue37292)

ORIGINAL POST:

As of CPython 3.8.0b1, main branch (please let me know if there's a different branch I should use):

If one pickles an object that is defined in the __main__ module, sends it to a subinterpreter as bytes, and then tries unpickling it there, it fails saying that __main__ doesn't define it.

import _xxsubinterpreters as interpreters
import pickle

class C:
    pass

c = C()

interp_id = interpreters.create()
c_bytes = pickle.dumps(c)
interpreters.run_string(
    interp_id,
    "import pickle; pickle.loads(c_bytes)",
    shared={"c_bytes": c_bytes},
)

If the script above is run directly from the python command line, it fails. If the same code is imported from another module, it works.

I'm unsure whether that's working as intended; I was expecting behaviour compatible with subprocesses started with the spawn method, where the __main__ of the parent process is visible to the subprocess too.
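
The difference between running directly and importing comes from how pickle handles classes: it stores them by reference, as a module name plus a qualified name, and the unpickler imports that module and looks the name up. When the script is run directly, that module is `__main__`, and a fresh interpreter has its own `__main__` that does not define the class. The sketch below imitates this in one process by swapping a stand-in module into `sys.modules["__main__"]` rather than using a real subinterpreter; `global_refs` is a hypothetical, simplified opcode scan written for this illustration, not a pickle API.

```python
import pickle
import pickletools
import sys
import types


def global_refs(payload: bytes) -> list:
    """Return the (module, qualname) pairs a pickle looks up when loaded.

    Simplified: handles GLOBAL (protocols 0-3) and STACK_GLOBAL (protocol 4+)
    when both names are pushed as literal strings, as they are the first time
    a pickler writes them; memoized repeats (BINGET) are not resolved.
    """
    refs, strings = [], []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            refs.append(tuple(arg.split(" ", 1)))
        elif opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"):
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL":
            refs.append((strings[-2], strings[-1]))
    return refs


# Stand-in for a script run as `python script.py`: class C lives in __main__.
script_main = types.ModuleType("__main__")
exec("class C:\n    pass\n", script_main.__dict__)

real_main = sys.modules["__main__"]
sys.modules["__main__"] = script_main  # lets pickle verify __main__.C by name
try:
    payload = pickle.dumps(script_main.C(), protocol=4)
finally:
    sys.modules["__main__"] = real_main

print(global_refs(payload))  # → [('__main__', 'C')]

# Stand-in for a new interpreter: its own __main__ does not define C.
fresh_main = types.ModuleType("__main__")
load_error = None
sys.modules["__main__"] = fresh_main
try:
    pickle.loads(payload)
except (AttributeError, pickle.UnpicklingError) as exc:
    load_error = exc
finally:
    sys.modules["__main__"] = real_main

print(type(load_error).__name__, load_error)
```

Only the names travel in the bytes, so whether loading succeeds depends entirely on what the receiving side's `__main__` contains.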

Workarounds:
1. Define everything that must be pickled in an imported module.
2. Use cloudpickle, which implements a hardcoded special case that makes it pickle the whole code of any object defined in __main__ (by value) instead of a reference to it.
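
A minimal sketch of workaround 1, assuming the receiving interpreter can import the same module from `sys.path`. The module name `shared_types_demo` and the temporary directory are hypothetical, and a real subinterpreter is again replaced by deleting the module from `sys.modules`, so the unpickler has to import it from scratch.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module name for this sketch; any importable module behaves alike.
MODULE_NAME = "shared_types_demo"

with tempfile.TemporaryDirectory() as tmp:
    # Workaround 1: put the class in its own module instead of in __main__.
    Path(tmp, f"{MODULE_NAME}.py").write_text("class C:\n    pass\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        shared = importlib.import_module(MODULE_NAME)
        payload = pickle.dumps(shared.C())  # records "shared_types_demo" + "C"

        # Act like a fresh interpreter that has never imported the module.
        del sys.modules[MODULE_NAME]
        obj = pickle.loads(payload)  # the unpickler imports it again by name
        reimported = sys.modules[MODULE_NAME]
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(MODULE_NAME, None)

print(type(obj).__module__, type(obj) is reimported.C, reimported is shared)
# → shared_types_demo True False
```

The loaded object belongs to a freshly imported copy of the module, which is what a separate interpreter would see: the class only needs to be importable by name, not already present.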

Possible future solutions:

ericsnowcurrently commented 5 years ago

This definitely sounds like a bug. :( Thanks for finding that! Please open a new issue on bugs.python.org and feel free to nosy me.

ericsnowcurrently commented 5 years ago

https://bugs.python.org/issue37292

@crusaderky, Thanks!

ericsnowcurrently commented 5 years ago

I'm going to track this here along with other subinterpreter-related bugs that need short-term attention.

LewisGaul commented 4 years ago

I've looked into this issue and reproduced the reported behaviour, but it's not clear to me why this should be expected to work.

I was expecting behaviour compatible with sub-processes spawned with the spawn method, where the __main__ of the parent process is visible to the subprocess too.

@crusaderky Could you help me out by providing a code snippet that reproduces the behaviour you're referring to here?

crusaderky commented 4 years ago

@LewisGaul

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

def f():
    print("Hello world")

if __name__ == "__main__":
    with ProcessPoolExecutor(mp_context=get_context("spawn")) as ex:
        ex.submit(f).result()

f is pickled in the main process and unpickled in the worker process.

crusaderky commented 4 years ago

A bit clearer:


import os
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

class C:
    def __getstate__(self):
        print("pickled in %d" % os.getpid())
        return {}

    def __setstate__(self, state):
        print("unpickled in %d" % os.getpid())

    def hello(self):
        print("Hello world")

if __name__ == "__main__":
    with ProcessPoolExecutor(mp_context=get_context("spawn")) as ex:
        ex.submit(C().hello).result()

Output:

pickled in 23480
unpickled in 23485
Hello world
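
In CPython, the spawn start method makes this work by re-running the parent's main script in the child under the module name `__mp_main__` (so the `if __name__ == "__main__":` block is skipped) and also registering that module as `sys.modules["__main__"]`, so names pickled as `__main__.X` resolve. The sketch below is a simplified single-process imitation of that idea; `MAIN_SOURCE`, `run_as`, and `with_main` are hypothetical names for illustration, not multiprocessing APIs.

```python
import pickle
import sys
import types

# Hypothetical main script. GUARD_RAN records whether the guarded block ran.
MAIN_SOURCE = '''
GUARD_RAN = False

class Greeter:
    def hello(self):
        return "Hello world"

if __name__ == "__main__":
    GUARD_RAN = True
'''


def run_as(name: str, source: str) -> types.ModuleType:
    """Execute `source` in a fresh module whose __name__ is `name`."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def with_main(module: types.ModuleType, action):
    """Call `action()` while `module` is installed as sys.modules['__main__']."""
    saved = sys.modules["__main__"]
    sys.modules["__main__"] = module
    try:
        return action()
    finally:
        sys.modules["__main__"] = saved


# Parent: the script runs as __main__, so Greeter is pickled as __main__.Greeter.
parent_main = run_as("__main__", MAIN_SOURCE)
payload = with_main(parent_main, lambda: pickle.dumps(parent_main.Greeter()))

# Child (spawn-style): re-run the same script under another name so the guard
# is skipped, then alias it as __main__ before unpickling.
child_main = run_as("__mp_main__", MAIN_SOURCE)
obj = with_main(child_main, lambda: pickle.loads(payload))

print(obj.hello(), type(obj).__module__, parent_main.GUARD_RAN, child_main.GUARD_RAN)
# → Hello world __mp_main__ True False
```

A subinterpreter created with `_xxsubinterpreters` gets no such re-import of the parent's script, which is consistent with the failure in the original report.
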

LewisGaul commented 4 years ago

Ah yes ok, thanks for the example. I'll take a look at how the spawn method achieves this for subprocesses and see if I can work out what needs doing for subinterpreters.

ericsnowcurrently commented 4 years ago

Thanks for looking into this, @LewisGaul (and @crusaderky). Please continue the discussion, but do it over on BPO (bugs.python.org).

FYI, this repo is intended mostly for coordinating effort and breaking down the work. We want to stick to the normal core development workflow as much as possible. Plus you're likely to get better involvement from the community that way (even if no one has chimed in on the issue there yet). :)

LewisGaul commented 4 years ago

Noted, I'll shift the discussion to there :)