Farama-Foundation / Miniworld

Simple and easily configurable 3D FPS-game-like environments for reinforcement learning
http://miniworld.farama.org/
Apache License 2.0

EOFError with multiprocessing #16

Closed: yding5 closed this issue 5 years ago

yding5 commented 5 years ago

Thanks for your great work. I find this environment very useful for our research. When I tried the training example, I got an EOFError whenever num-processes > 1. The error happens when training finishes and seems to be related to multiprocessing. The same error also happens on a MacBook.

I found some similar problems through searching (linked below), but none of their fixes solve this one. Could you please take a look? Thanks.

https://github.com/duckietown/gym-duckietown/issues/75
https://github.com/openai/baselines/issues/640

Linux

Environment info:
- Ubuntu 18.04.2 LTS
- Python 3.7.3
- gym 0.13.0
- pyglet 1.2.4

Error message:

```
(tensorflow_yukun) akash@a1:~/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr$ python main.py --algo ppo --num-frames 4000 --num-processes 2 --num-steps 80 --lr 0.00005 --env-name MiniWorld-Hallway-v0
Falling back to num_samples=1
Falling back to num_samples=1
Falling back to num_samples=1
Falling back to num_samples=1
Creating frame stacking wrapper
Saving model

Updates 10, num timesteps 1760, FPS 61
Last 7 training episodes: mean/median reward 0.14/0.00, min/max reward 0.00/0.96, success rate 0.14

Updates 20, num timesteps 3360, FPS 61
Last 14 training episodes: mean/median reward 0.14/0.00, min/max reward 0.00/0.98, success rate 0.14

Process Process-2:
Process Process-1:
Traceback (most recent call last):
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/akash/Documents/yukun/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
    cmd, data = remote.recv()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/akash/.conda/envs/tensorflow_yukun/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
```

(The other worker process prints an identical traceback, interleaved with the one above.)

Mac

Environment info:
- macOS 10.14.5 (18F132)
- Python 3.6.8
- gym 0.13.1
- pyglet 1.4.1

```
(miniWorld1) yukuns-mbp:pytorch-a2c-ppo-acktr yukun.ding1@ibm.com$ python main.py --algo ppo --num-frames 2000 --num-processes 2 --num-steps 80 --lr 0.00005 --env-name MiniWorld-Hallway-v0
Falling back to non-multisampled frame buffer
Falling back to num_samples=8
Falling back to non-multisampled frame buffer
Falling back to non-multisampled frame buffer
Falling back to num_samples=8
Falling back to non-multisampled frame buffer
Creating frame stacking wrapper
Saving model

Updates 0, num timesteps 160, FPS 53
Last 4 training episodes: mean/median reward 0.98/0.98, min/max reward 0.97/1.00, success rate 1.00

Updates 10, num timesteps 1760, FPS 56
Last 12 training episodes: mean/median reward 0.64/0.94, min/max reward 0.00/1.00, success rate 0.67

Process Process-2:
Process Process-1:
Traceback (most recent call last):
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/yukun.ding1@ibm.com/Documents/Code/yding5/miniWorld/gym-miniworld/pytorch-a2c-ppo-acktr/vec_env/subproc_vec_env.py", line 9, in worker
    cmd, data = remote.recv()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/anaconda3/envs/miniWorld1/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
```

(The other worker process prints an identical traceback, interleaved with the one above.)

maximecb commented 5 years ago

Hello there!

I added a call to envs.close() in main.py. Please let me know if that makes the error message disappear.
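For context, the error comes from the worker processes in subproc_vec_env.py: each worker blocks on remote.recv() waiting for the next command, so if the main process exits without telling the workers to shut down, the pipe is torn down and the blocked recv() raises EOFError. Roughly, the pattern looks like this (a simplified, self-contained sketch, not the actual subproc_vec_env.py code; the command names and the single-worker setup are just for illustration):

```python
import gym
import gym_miniworld  # registers the MiniWorld-* environments
import multiprocessing as mp

def worker(remote, env_id):
    # Simplified worker loop: blocks on recv() until the parent sends a command.
    env = gym.make(env_id)
    while True:
        cmd, data = remote.recv()  # raises EOFError if the parent end goes away
        if cmd == 'step':
            remote.send(env.step(data))
        elif cmd == 'reset':
            remote.send(env.reset())
        elif cmd == 'close':
            env.close()
            remote.close()
            break

if __name__ == '__main__':
    parent_end, child_end = mp.Pipe()
    proc = mp.Process(target=worker, args=(child_end, 'MiniWorld-Hallway-v0'))
    proc.start()

    parent_end.send(('reset', None))
    obs = parent_end.recv()

    # Without an explicit shutdown, the worker would still be blocked in recv()
    # when the main process exits, and the broken pipe surfaces as EOFError.
    parent_end.send(('close', None))
    proc.join()
```

Calling envs.close() sends that final shutdown command to every worker, which is why the error should go away.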

Also, I'm curious to know what kind of research you're doing, and if there are environments/features you would like to see added to MiniWorld :)

yding5 commented 5 years ago

Hello, the envs.close() call solves the problem nicely. Thanks!

We are currently exploring some agent-based tasks with reinforcement learning and world models. I found this environment very handy and easy to modify. One thing that might make it even better is having more objects available, or maybe some info on how to import more mesh objects. We are still at an early stage, and I will be sure to update here when we find something important or get a paper written.

BTW, I have a minor question. I consistently get "Falling back to num_samples=1", which seems to be related to rendering. Does it cause any problems, and can I just ignore it?

maximecb commented 5 years ago

> Maybe some info on how to import more mesh objects.

Are you looking for info on where to find meshes, or just how to load them into the world? I should indeed document that. Let me know if there's anything else I can help with :)

> BTW, I have a minor question. I consistently get "Falling back to num_samples=1", which seems to be related to rendering. Does it cause any problems, and can I just ignore it?

You can mostly just ignore it. It's trying to do anti-aliasing, but that only seems to work when running directly on a machine with an NVIDIA GPU, not through xvfb.
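For the curious, that message comes from the framebuffer setup: the renderer first asks for a multisampled (anti-aliased) OpenGL config and falls back to fewer samples when the driver refuses, ending with a plain framebuffer. Something along these lines (a simplified sketch, not the exact MiniWorld source; the function name and the list of sample counts are illustrative):

```python
import pyglet.gl
import pyglet.window

def make_offscreen_window(width=800, height=600):
    """Create a hidden rendering window, preferring an anti-aliased config."""
    for i, num_samples in enumerate([8, 4, 2, 1]):
        if i > 0:
            print('Falling back to num_samples=%d' % num_samples)
        try:
            config = pyglet.gl.Config(
                sample_buffers=1,   # request multisampling (anti-aliasing)
                samples=num_samples,
                double_buffer=True,
                depth_size=24,
            )
            return pyglet.window.Window(width=width, height=height,
                                        config=config, visible=False)
        except pyglet.window.NoSuchConfigException:
            # Typical under xvfb or non-NVIDIA drivers: this config is refused,
            # so try the next, smaller sample count.
            continue
    print('Falling back to non-multisampled frame buffer')
    return pyglet.window.Window(width=width, height=height, visible=False)
```

Under xvfb or a software GL driver the multisampled configs tend to be rejected, hence the repeated fallback messages; rendering still works, just without anti-aliasing.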

yding5 commented 5 years ago

For the mesh objects, I mean some info about whether we can import new meshes and how to do it. It could be something like: "put a mesh file of this format into the meshes folder, and modify this line in the code to import it". Thanks for your help!

maximecb commented 5 years ago

Here it is: https://github.com/maximecb/gym-miniworld/blob/master/docs/design.md#loading-3d-models
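The short version (a rough sketch based on that doc; the 'duckie' mesh name, the room dimensions, and the height value are just examples): drop an .obj file, plus its .mtl and textures, into gym_miniworld/meshes, then reference it by name with a MeshEnt when generating the world.

```python
from gym_miniworld.miniworld import MiniWorldEnv
from gym_miniworld.entity import MeshEnt

class MeshDemoEnv(MiniWorldEnv):
    """Single room containing a custom mesh loaded from gym_miniworld/meshes."""

    def _gen_world(self):
        room = self.add_rect_room(min_x=0, max_x=8, min_z=0, max_z=8)

        # 'duckie' refers to gym_miniworld/meshes/duckie.obj; replace it with
        # the base name of your own .obj file. The height argument scales the model.
        self.place_entity(MeshEnt(mesh_name='duckie', height=0.6), room=room)

        self.place_agent(room=room)

# Usage:
# env = MeshDemoEnv()
# obs = env.reset()
```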

yding5 commented 5 years ago

Great, thanks!