EOF error while Parallel processing

nithin127 commented 6 years ago

I'm not sure whether this is a problem with the environment, but the same error does not occur when using gym-minigrid. Any help with solving the problem would be awesome!

How to reproduce the issue:

git clone git@github.com:nithin127/pytorch-a2c-ppo.git

(Make sure to install the dependencies: cd pytorch-a2c-ppo, then pip3 install -e torch_rl)

git checkout lcs_base

Now, python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --no-mem --save-interval 1 works properly, while

python3 -m scripts.train --algo ppo --env Duckietown-small_loop-v0 --no-mem --save-interval 10 gives the EOF error due to parallel processing. Maybe this is due to duckietown graphics specifically?

maximecb commented 6 years ago

Sidenote: you should always post complete error messages when reporting a bug.

Try running with --processes 1.

nithin127 commented 6 years ago

The complete error message:

Traceback (most recent call last):
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/apple/rl/pytorch-a2c-ppo/scripts/train.py", line 135, in <module>
    args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/algos/ppo.py", line 18, in __init__
    value_loss_coef, max_grad_norm, recurrence, preprocess_obss, reshape_reward)
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/algos/base.py", line 80, in __init__
    self.obs = self.env.reset()
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/utils/penv.py", line 40, in reset
    results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/utils/penv.py", line 40, in <listcomp>
    results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

This error only occurs with --procs > 1, because of how torch_rl/torch_rl/utils/penv.py is implemented. See the ParallelEnv class: on line 40, [local.recv() for local in self.locals] is an empty list for --procs 1, so recv() is never called on a pipe in that case.
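For context on what the EOFError itself means: recv() on a multiprocessing connection raises EOFError when the other end has been closed and nothing is left to read, which is what the parent would see if a worker process died before replying. The sketch below reproduces only that behaviour in a single process. The function name is made up for illustration, and closing the remote end by hand is merely a stand-in for a worker exiting; it is not taken from penv.py.

```python
from multiprocessing import Pipe


def recv_after_peer_closed():
    """Show what recv() does when the other end of the pipe is gone."""
    local, remote = Pipe()
    # Assumption for illustration: closing the remote end here plays the
    # role of a worker process that exited before sending anything.
    remote.close()
    try:
        local.recv()
    except EOFError:
        # Nothing left to receive and the peer is closed.
        return "EOFError"
    finally:
        local.close()
    return "received"
```

If this is what happens in the traceback above, the EOFError points at a worker that went away rather than at a bug in the recv() call itself.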

maximecb commented 6 years ago

I think we had the same error with the pytorch_rl code. You can either always run with 1 process, or try to replicate this fix in the pytorch-a2c-ppo code: https://github.com/duckietown/gym-duckietown/issues/38

You have to change the multiprocessing start method to "forkserver", as is done in this commit: https://github.com/duckietown/gym-duckietown/pull/43/files
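As a rough idea of what switching to forkserver can look like with the standard library alone, here is a minimal sketch. It does not reproduce the linked commit or the real ParallelEnv code: get_worker_context, echo_worker and start_worker are hypothetical names, and the echo loop only stands in for a worker that would step an environment.

```python
import multiprocessing as mp


def get_worker_context(method="forkserver"):
    """Return a multiprocessing context bound to the given start method."""
    # get_context() scopes the choice to this context object instead of
    # changing the process-wide default.
    return mp.get_context(method)


def echo_worker(conn):
    """Hypothetical stand-in for an environment worker: echo until None."""
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(msg)
    conn.close()


def start_worker(ctx):
    """Start one worker from the context; return (parent_end, process)."""
    local, remote = ctx.Pipe()
    proc = ctx.Process(target=echo_worker, args=(remote,))
    proc.daemon = True
    proc.start()
    remote.close()  # the parent keeps only its own end of the pipe
    return local, proc
```

In a script, start_worker(get_worker_context()) should be called under an if __name__ == "__main__": guard, because forkserver imports the main module again when it prepares worker processes. Calling multiprocessing.set_start_method("forkserver") once at startup is the process-wide alternative to using a context object.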

duckietown / gym-duckietown

EOF error while Parallel processing #75