Closed: nithin127 closed this issue 6 years ago.
Sidenote: you should always post complete error messages when reporting a bug.
Try running with --processes 1.
The complete error message:
Traceback (most recent call last):
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/apple/rl/pytorch-a2c-ppo/scripts/train.py", line 135, in <module>
    args.optim_eps, args.clip_eps, args.epochs, args.batch_size, preprocess_obss)
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/algos/ppo.py", line 18, in __init__
    value_loss_coef, max_grad_norm, recurrence, preprocess_obss, reshape_reward)
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/algos/base.py", line 80, in __init__
    self.obs = self.env.reset()
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/utils/penv.py", line 40, in reset
    results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
  File "/Users/apple/rl/pytorch-a2c-ppo/torch_rl/torch_rl/utils/penv.py", line 40, in <listcomp>
    results = [self.envs[0].reset()] + [local.recv() for local in self.locals]
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Users/apple/anaconda3/envs/rl_env/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
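The last frames show where the EOFError comes from: the parent's Connection.recv() raises it when the other end of the pipe has been closed without sending anything, which is what happens if a worker process dies (for example while creating or resetting its environment) before replying. A minimal sketch, assuming a hypothetical helper recv_from_worker that is not part of torch_rl, which turns the bare EOFError into a message that names the worker:

```python
from multiprocessing import Pipe


def recv_from_worker(conn, worker_id):
    """Receive one message from a worker pipe (hypothetical helper, not torch_rl API).

    Connection.recv() raises EOFError once the sending end has been closed
    with nothing left to read, e.g. because the worker process crashed.
    """
    try:
        return conn.recv()
    except EOFError as exc:
        raise RuntimeError(
            f"worker {worker_id} closed its pipe before replying; "
            "check the child process's stderr for the underlying error"
        ) from exc


if __name__ == "__main__":
    parent_end, child_end = Pipe()
    child_end.send("reset-ok")
    print(recv_from_worker(parent_end, 1))  # a live worker's reply arrives normally

    child_end.close()  # simulate the worker dying without sending anything
    try:
        recv_from_worker(parent_end, 1)
    except RuntimeError as err:
        print(err)
```

If the worker died from a Python exception, multiprocessing prints that exception's traceback from the child, so the real cause is usually a few lines above the parent's EOFError in the terminal output.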
These errors only occur with --procs > 1, because of how torch_rl/torch_rl/utils/penv.py is implemented: in the ParallelEnv class, line 40 evaluates [local.recv() for local in self.locals], and that list is empty ([]) for --procs 1, so nothing is ever received from a worker process.
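A simplified sketch of the pattern described above, not the actual torch_rl code: the class name ParallelEnvSketch, the worker loop and the CountingEnv stand-in are all hypothetical. Environment 0 lives in the main process and every other environment lives in a worker process connected by a pipe, mirroring the line 40 shown in the traceback. With a single environment, self.locals is empty, so reset() never calls recv() and a dead worker cannot surface as an EOFError.

```python
import multiprocessing as mp


def worker(conn, env):
    """Serve commands for one environment (hypothetical sketch, not torch_rl)."""
    while True:
        cmd, _ = conn.recv()
        if cmd == "reset":
            conn.send(env.reset())
        elif cmd == "close":
            conn.close()
            break
        else:
            # An uncaught error kills the worker; the parent then sees EOFError.
            raise ValueError(f"unknown command {cmd!r}")


class ParallelEnvSketch:
    """Env 0 runs in this process; envs[1:] each run in a worker process."""

    def __init__(self, envs):
        self.envs = envs
        self.locals = []
        self.processes = []
        for env in envs[1:]:
            local, remote = mp.Pipe()
            proc = mp.Process(target=worker, args=(remote, env), daemon=True)
            proc.start()
            remote.close()  # keep only the parent's end open in this process
            self.locals.append(local)
            self.processes.append(proc)

    def reset(self):
        for local in self.locals:
            local.send(("reset", None))
        # Same shape as penv.py line 40. With a single env, self.locals == [],
        # so the list comprehension calls recv() zero times.
        return [self.envs[0].reset()] + [local.recv() for local in self.locals]

    def close(self):
        for local in self.locals:
            local.send(("close", None))
        for proc in self.processes:
            proc.join()


class CountingEnv:
    """Hypothetical stand-in env whose observation is its reset count."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1
        return self.resets
```

In this sketch the environments are constructed in the parent before the workers start, so under the "fork" start method each worker begins from a memory copy of the parent, which could include graphics state the parent's environments have already created.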
I think we had the same error with the pytorch_rl code. You can either always run with 1 process, or try to replicate this fix in the pytorch-a2c-ppo code: https://github.com/duckietown/gym-duckietown/issues/38
You have to change the multiprocessing start method from "fork" to "forkserver", as is done in this pull request: https://github.com/duckietown/gym-duckietown/pull/43/files
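A minimal sketch of the start-method change, assuming the worker-creation code can accept a multiprocessing context; the names pick_context, env_worker, start_workers and make_env are hypothetical, and the exact change in the linked gym-duckietown pull request may differ. With "forkserver", workers are forked from a fresh server process rather than from the training process, so they do not inherit state such as a graphics context the parent has already created. Because arguments are then pickled, the sketch sends a picklable factory and builds each environment inside its worker.

```python
import multiprocessing as mp


def pick_context(preferred=("forkserver", "spawn")):
    """Return a context for the first available start method in `preferred`.

    "forkserver" is not available on Windows, where "spawn" is used instead.
    """
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return mp.get_context(method)
    return mp.get_context()


def env_worker(conn, make_env):
    # The environment (and any graphics state it creates) is built here,
    # inside the child, instead of being inherited from the parent.
    env = make_env()
    while True:
        cmd = conn.recv()
        if cmd == "reset":
            conn.send(env.reset())
        elif cmd == "close":
            conn.close()
            return


def start_workers(make_env, n_workers, ctx):
    """Start `n_workers` processes with `ctx`; return (parent pipe ends, processes)."""
    parent_ends, procs = [], []
    for _ in range(n_workers):
        local, remote = ctx.Pipe()
        proc = ctx.Process(target=env_worker, args=(remote, make_env), daemon=True)
        proc.start()
        remote.close()  # the parent keeps only its own end
        parent_ends.append(local)
        procs.append(proc)
    return parent_ends, procs
```

In a training script this would be called once, e.g. start_workers(make_env, args.procs - 1, pick_context()), from inside an if __name__ == "__main__": block, with make_env defined at module level: both "forkserver" and "spawn" import the main module in the child and cannot pickle lambdas.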
I'm not sure whether this is a problem with the environment, but the same problem does not occur with gym-minigrid. Any help solving it would be awesome!
How to reproduce the issue:
git clone git@github.com:nithin127/pytorch-a2c-ppo.git
cd pytorch-a2c-ppo
git checkout lcs_base
(Make sure to install the dependencies and run pip3 install -e torch_rl.)
Now,
python3 -m scripts.train --algo ppo --env MiniGrid-DoorKey-5x5-v0 --no-mem --save-interval 1
works properly, while
python3 -m scripts.train --algo ppo --env Duckietown-small_loop-v0 --no-mem --save-interval 10
fails with the EOFError when running with more than one process. Could this be caused specifically by Duckietown's graphics?