eloialonso / iris

Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
https://openreview.net/forum?id=vhFu1Acb0xb
GNU General Public License v3.0

ForkingPickler: Can't pickle local object #4

hexaflexa closed this issue 2 years ago

hexaflexa commented 2 years ago

I am trying to run the example training command python src/main.py env.train.id=BreakoutNoFrameskip-v4 common.device=cuda:0 wandb.mode=online, but I get a "Can't pickle local object" error. A partial traceback is shown here:

Exception has occurred: AttributeError
  (note: full exception trace is shown but execution is paused at: _run_module_as_main)

Can't pickle local object 'Trainer.__init__.<locals>.create_env.<locals>.<lambda>'
  File "[PATH]\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "[PATH]\Lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "[PATH]\Lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "[PATH]\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "[PATH]\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "src\envs\multi_process_env.py", line 61, in __init__
    p.start()
  File "src\trainer.py", line 66, in create_env
    return MultiProcessEnv(env_fn, num_envs, should_wait_num_envs_ratio=1.0) if num_envs > 1 else SingleProcessEnv(env_fn)
  File "src\trainer.py", line 74, in __init__
    test_env = create_env(cfg.env.test, cfg.collection.test.num_envs)
  File "src\main.py", line 9, in main
    trainer = Trainer(cfg)

Any recommendation on workarounds?

vmicheli commented 2 years ago

Hey, a simple workaround is to use a SingleProcessEnv instead of a MultiProcessEnv for the evaluation. You can do that by appending collection.test.num_envs=1 to the training command.

Could you give me more details about your OS and Python version so that we can look into what is causing the issue?

hexaflexa commented 2 years ago

Yes. Last night I tried collection.test.num_envs=1, and that avoids the error.

I also figured out a workaround in class Trainer (see below) that allows collection.test.num_envs=8, but I am not experienced enough in Python to know whether functools.partial objects can be pickled safely and portably.

from functools import partial

def create_env(cfg_env, num_envs):
    # multiprocessing.Process will call ForkingPickler.dump, which results in an error:
    #
    # Can't pickle local object 'Trainer.__init__.<locals>.create_env.<locals>.<lambda>'
    #   File "[PATH]\Lib\multiprocessing\reduction.py", line 60, in dump
    #     ForkingPickler(file, protocol).dump(obj)
    #
    # ForkingPickler inherits from pickle.Pickler, and the standard pickle module
    # pickles functions by qualified name, so it cannot pickle lambdas or other
    # functions defined inside another function.
    #
    # Instead of lambda, we can use partial as a workaround

    # env_fn = lambda: instantiate(cfg_env)  # can't use lambda here
    env_fn = partial(instantiate, cfg_env)
    return MultiProcessEnv(env_fn, num_envs, should_wait_num_envs_ratio=1.0) if num_envs > 1 else SingleProcessEnv(env_fn)
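
On the question of whether partial objects pickle safely: a functools.partial can be pickled as long as the wrapped callable is importable by name and the bound arguments are themselves picklable. The sketch below is a minimal illustration of that difference, not the repository's code. The names can_pickle and make_factories are hypothetical, and the built-in dict stands in for hydra's instantiate so that the example has no third-party dependencies.

```python
import pickle
from functools import partial


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_factories(cfg):
    # Hypothetical stand-in: dict plays the role of instantiate here.
    lambda_fn = lambda: dict(cfg)    # local lambda, has no importable name
    partial_fn = partial(dict, cfg)  # partial of a callable importable by name
    return lambda_fn, partial_fn


lambda_fn, partial_fn = make_factories({"id": "BreakoutNoFrameskip-v4"})

lambda_ok = can_pickle(lambda_fn)    # the local lambda is rejected
partial_ok = can_pickle(partial_fn)  # the partial is accepted

# A pickled partial can be restored and called like the original.
restored = pickle.loads(pickle.dumps(partial_fn))
print(lambda_ok, partial_ok, restored())
```

In the real workaround, the same reasoning would apply only if cfg_env itself pickles, which is an assumption this sketch does not check.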

I'm using Python 3.10.4 on Windows 10.

I also read some comments (e.g. https://stackoverflow.com/questions/71070394/serializing-lambdas-and-functions-with-dill-is-there-a-better-faster-way) about pickling lambdas with dill, but I didn't try that.

eloialonso commented 2 years ago

Hey, the issue seems to be related to Windows. We included your suggestion in commit 03290c820e623d8b184ce53a082cccd05d3f08f4, thanks!
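
Some background on the Windows connection: the traceback goes through popen_spawn_win32.py, and Windows only offers the "spawn" start method, which pickles the Process object to hand it to the child. The "fork" start method available on Unix-like systems does not need that step. The sketch below is a rough pre-check under that assumption. The helper survives_spawn is hypothetical, it only tests the callable rather than a full Process object, and the built-in dict again stands in for instantiate.

```python
import multiprocessing as mp
from functools import partial
from multiprocessing.reduction import ForkingPickler


def survives_spawn(target):
    """Return True if target serializes with the pickler that spawn uses."""
    try:
        ForkingPickler.dumps(target)
        return True
    except Exception:
        return False


def build_env_fns(cfg):
    # Hypothetical stand-ins for the two variants of env_fn in create_env.
    return (lambda: dict(cfg)), partial(dict, cfg)


lambda_fn, partial_fn = build_env_fns({"id": "BreakoutNoFrameskip-v4"})

# On Windows this list contains only "spawn".
start_methods = mp.get_all_start_methods()

lambda_survives = survives_spawn(lambda_fn)
partial_survives = survives_spawn(partial_fn)
print(start_methods, lambda_survives, partial_survives)
```

A check like this can be run on any platform to catch callables that would fail once the code is executed under spawn.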