Closed: hexaflexa closed this issue 2 years ago
Hey, a simple workaround is to use a SingleProcessEnv instead of a MultiProcessEnv for the evaluation. You can do that by appending collection.test.num_envs=1 to the training command.
Could you give me more details about your OS and Python version so that we can look into what is causing the issue?
Yes, last night I tried test.num_envs=1 to avoid the error.
I also figured out a workaround in class Trainer (see below) that allows test.num_envs=8, but I am not experienced enough in Python to know whether partials can be pickled safely and portably.
from functools import partial

def create_env(cfg_env, num_envs):
    # multiprocessing.Process will call ForkingPickler.dump, which results in an error:
    #
    #   Can't pickle local object 'Trainer.__init__.<locals>.create_env.<locals>.<lambda>'
    #     File "[PATH]\Lib\multiprocessing\reduction.py", line 60, in dump
    #       ForkingPickler(file, protocol).dump(obj)
    #
    # ForkingPickler inherits from pickle.Pickler, and the standard pickle module
    # cannot pickle lambda functions.
    #
    # Instead of a lambda, we can use functools.partial as a workaround:
    # env_fn = lambda: instantiate(cfg_env)  # can't use a lambda here
    env_fn = partial(instantiate, cfg_env)
    return MultiProcessEnv(env_fn, num_envs, should_wait_num_envs_ratio=1.0) if num_envs > 1 else SingleProcessEnv(env_fn)
I'm using Python 3.10.4 and Windows 10.
I also read some comments (e.g. https://stackoverflow.com/questions/71070394/serializing-lambdas-and-functions-with-dill-is-there-a-better-faster-way) about being able to pickle lambdas using dill, but I didn't try that.
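For reference, dill (a third-party package, pip install dill) can indeed serialize lambdas by pickling the function's code object rather than looking it up by name. A quick hedged sketch:

```python
import dill  # third-party: pip install dill

# a lambda that plain pickle would reject
env_fn = lambda: {"id": "BreakoutNoFrameskip-v4"}

data = dill.dumps(env_fn)      # succeeds where pickle.dumps fails
print(dill.loads(data)())      # prints the dict returned by the lambda
```

Note that this alone would not fix the original error: multiprocessing hard-codes ForkingPickler, so the lambda would still go through the standard pickler. The partial-based workaround avoids the problem without extra dependencies.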
Hey, the issue seems to be related to Windows. We included your suggestion in commit 03290c820e623d8b184ce53a082cccd05d3f08f4, thanks!
I am trying to execute the example training run
python src/main.py env.train.id=BreakoutNoFrameskip-v4 common.device=cuda:0 wandb.mode=online
, but I am getting a Can't pickle local object error. A partial backtrace is shown here. Any recommendation on workarounds?