Farama-Foundation / SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers)

Multiprocessing in SuperSuit #75

Closed p-veloso closed 3 years ago

p-veloso commented 3 years ago

Based on the template for multiprocessing in SB3 I decided to check if I could use multiprocessing in SuperSuit. Here are my files: petting bubble rl.zip

from stable_baselines3.ppo import MlpPolicy
from stable_baselines3 import PPO
import supersuit as ss
from petting_bubble_env_continuous import PettingBubblesEnvironment
import numpy as np
import time
import os

print("{} cpus available".format(os.cpu_count()))
args = [3, 3, 5, 20]
n_timesteps = int(8e3)

#single process
env_single = PettingBubblesEnvironment(*args)
env_single = ss.black_death_v1(env_single)
env_single = ss.pettingzoo_env_to_vec_env_v0(env_single)
env_single = ss.concat_vec_envs_v0(env_single, 8, num_cpus=1, base_class='stable_baselines3')
model = PPO(MlpPolicy, env_single, verbose=0, gamma=0.995, ent_coef=0.01, learning_rate=2.5e-5, vf_coef=0.5,
            max_grad_norm=0.5, gae_lambda=0.95, n_epochs=4, clip_range=0.2, clip_range_vf=1)

start_time = time.time()
model.learn(total_timesteps=n_timesteps)
total_time_single = time.time()-start_time
print(f"Took {total_time_single:.2f}s for single process version - {n_timesteps / total_time_single:.2f} FPS")

#multiprocessing
env_multi = PettingBubblesEnvironment(*args)
env_multi = ss.black_death_v1(env_multi)
env_multi = ss.pettingzoo_env_to_vec_env_v0(env_multi)
env_multi = ss.concat_vec_envs_v0(env_multi, 8, num_cpus=8, base_class='stable_baselines3')
model = PPO(MlpPolicy, env_multi, verbose=0, gamma=0.995, ent_coef=0.01, learning_rate=2.5e-5, vf_coef=0.5,
            max_grad_norm=0.5, gae_lambda=0.95, n_epochs=4, clip_range=0.2, clip_range_vf=1)

start_time = time.time()
model.learn(total_timesteps=n_timesteps)
total_time_multi = time.time()-start_time
print(f"Took {total_time_multi:.2f}s for multiprocessed version - {n_timesteps / total_time_multi:.2f} FPS")

However, the version with multiprocessing fails...

pygame 2.0.1 (SDL 2.0.14, Python 3.8.8)
Hello from the pygame community. https://www.pygame.org/contribute.html
16 cpus available
Took 171.35s for single process version - 46.69 FPS
Traceback (most recent call last):
  File "C:/Users/pedro/OneDrive/Documentos/2021 Learning Matters/petting bubble rl/petting_bubble_multi_test.py", line 30, in <module>
    env_multi = ss.concat_vec_envs_v0(env_multi, 8, num_cpus=8, base_class='stable_baselines3')
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector_constructors.py", line 37, in concat_vec_envs
    vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\constructors.py", line 38, in constructor
    return ProcConcatVec(cat_env_fns, obs_space, act_space, num_fns * envs_per_env, example_env.metadata)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\multiproc_vec.py", line 83, in __init__
    proc.start()
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'vec_env_args.<locals>.env_fn'
Exception ignored in: <function ProcConcatVec.__del__ at 0x000001C30FEBF700>
Traceback (most recent call last):
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\multiproc_vec.py", line 147, in __del__
    for pipe in self.pipes:
AttributeError: 'ProcConcatVec' object has no attribute 'pipes'

Process finished with exit code 1

benblack769 commented 3 years ago

I can't seem to reproduce this. For me the following code works fine.

from stable_baselines3.ppo import MlpPolicy
from stable_baselines3 import PPO
# from stable_baselines3.common.vec_env import VecMonitor
import supersuit as ss
# from petting_bubble_env_continuous import PettingBubblesEnvironment
from pettingzoo.mpe import simple_push_v2
import gym

env = simple_push_v2.parallel_env()
env = ss.pad_observations_v0(env)
env = ss.black_death_v1(env)
env = ss.pettingzoo_env_to_vec_env_v0(env)
env = ss.concat_vec_envs_v0(env, 4, num_cpus=4, base_class='stable_baselines3')

model = PPO(MlpPolicy, env, verbose=2, gamma=0.999, n_steps=1000, ent_coef=0.01, learning_rate=0.00025, vf_coef=0.5, max_grad_norm=0.5, gae_lambda=0.95, n_epochs=4, clip_range=0.2, clip_range_vf=1, tensorboard_log="./ppo_test/")
model.learn(total_timesteps=1000000, tb_log_name="test",  reset_num_timesteps=True)
model.save("bubble_policy_test")

Looking at your stack trace, perhaps the problem is that you are using Windows? Windows only supports the spawn start method for multiprocessing, which requires everything passed to a child process to be pickled. We don't officially support Windows. Windows causes an unbearable number of problems for a maintainer, and none of us use Windows ourselves, so it is also hard to test solutions. We strongly recommend you find a Linux or macOS platform for working on these projects. Windows Subsystem for Linux works well for me.

If you want to make a PR to fix this yourself though, feel free. The offending local function that breaks pickling appears to be this one: https://github.com/PettingZoo-Team/SuperSuit/blob/master/supersuit/vector_constructors.py#L8. I suppose there might be a solution where, instead of a local function, it is a class that takes the env in its __init__ and defines __call__, like this one: https://github.com/PettingZoo-Team/SuperSuit/blob/master/supersuit/vector/constructors.py#L5
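As an illustration of the suggestion above, here is a minimal sketch, independent of SuperSuit, of why a function defined inside another function cannot be pickled while an instance of a module-level callable class can. The names make_local_env_fn, EnvFactory, and can_pickle are hypothetical, and a plain dict stands in for the environment.

```python
import pickle


def make_local_env_fn(env):
    # Mirrors the failing pattern: env_fn is defined inside another function,
    # so pickle cannot look it up by its qualified name.
    def env_fn():
        return env

    return env_fn


class EnvFactory:
    """Module-level callable that carries the env as instance state."""

    def __init__(self, env):
        self.env = env

    def __call__(self):
        return self.env


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    fake_env = {"name": "stand-in for an environment"}
    print("local function picklable:", can_pickle(make_local_env_fn(fake_env)))
    print("callable class picklable:", can_pickle(EnvFactory(fake_env)))
```

The class instance pickles only if the object it holds is itself picklable, so a real environment may still need extra care.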

p-veloso commented 3 years ago

Yes, I am using Windows... Also, I am not very familiar with multiprocessing implementations, but I gave it a try. I replaced the original vec_env_args with:

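import cloudpickle


class EnvFn:
    """Module-level callable replacing the local env_fn closure."""

    def __init__(self, env):
        self.env = env

    def __call__(self):
        # Copy the env via a cloudpickle round trip.
        return cloudpickle.loads(cloudpickle.dumps(self.env))


def vec_env_args(env, num_envs):
    env_fn = EnvFn(env)
    return [env_fn] * num_envs, env.observation_space, env.action_space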

It resulted in a different error (starting a new process before the ongoing process has finished its bootstrapping). As I mentioned before, this is not my expertise at all. Thanks for the hints.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\pedro\OneDrive\Documentos\2021 Learning Matters\petting bubble rl\petting_bubble_multi_test.py", line 17, in <module>
    env_multi = ss.concat_vec_envs_v0(env_multi, 8, num_cpus=8, base_class='stable_baselines3')
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector_constructors.py", line 48, in concat_vec_envs
    vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\constructors.py", line 38, in constructor
    return ProcConcatVec(cat_env_fns, obs_space, act_space, num_fns * envs_per_env, example_env.metadata)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\multiproc_vec.py", line 83, in __init__
    proc.start()
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
16 cpus available
Exception ignored in: <function ProcConcatVec.__del__ at 0x000001648C105550>
Traceback (most recent call last):
  File "C:\Users\pedro\Anaconda3\envs\rl_exercises\lib\site-packages\supersuit\vector\multiproc_vec.py", line 147, in __del__
    for pipe in self.pipes:
AttributeError: 'ProcConcatVec' object has no attribute 'pipes'

p-veloso commented 3 years ago

@weepingwillowben

I understand that there is no official support for Windows. However, when I run it on a p2.xlarge AWS instance, it also raises errors and exceptions.

4 cpus available
Using cuda device
-------------------------------
| time/              |        |
|    fps             | 7570   |
|    iterations      | 1      |
|    time_elapsed    | 42     |
|    total_timesteps | 320000 |
-------------------------------
Segmentation fault (core dumped)
Process Process-4:
(pytorch_latest_p37) ubuntu@ip-172-31-15-100:~/bubbles$ Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

benblack769 commented 3 years ago

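So for the first error, you will have to familiarize yourself with multiprocessing on Windows a bit. The RuntimeError in your traceback names the usual fix: with the spawn start method each child process re-imports the main module, so the code that builds the environments and starts training has to sit under an if __name__ == '__main__': guard.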
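The idiom named in the RuntimeError above can be sketched as follows. This is a generic multiprocessing example rather than SuperSuit code; worker and main are hypothetical names, and the spawn start method is requested explicitly so the behaviour matches Windows on any platform.

```python
import multiprocessing as mp


def worker(index):
    # Targets must be defined at module level so spawned children can import them.
    print(f"worker {index} running in {mp.current_process().name}")


def main():
    # Request the spawn start method explicitly.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    # Spawned children re-import this module under a different __name__,
    # so this block is skipped in them and no extra processes are started.
    main()
```

In the script from the first post, the equivalent change would be to move everything from the environment construction onward into a function that is called only under this guard.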

As for the second error, the broken pipe and EOF errors are likely caused by a segfault in the original process: once it dies, the worker processes are left holding a pipe whose other end is closed. Last time I ran into this problem it was because I was rendering to a screen. There could also be issues if your process is interacting with another process through a network connection or something.
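The sequence in the tracebacks (an EOFError on recv, then a failure on send) can be reproduced in a single process as a rough sketch. Closing one end of the pipe stands in for the main process dying; probe_dead_peer is a hypothetical name, and the exact exception raised by send is platform dependent (on Linux it is expected to be BrokenPipeError).

```python
from multiprocessing import Pipe


def probe_dead_peer():
    """Close one end of a pipe, then try to use the other end."""
    parent_end, worker_end = Pipe()
    parent_end.close()  # stands in for the main process dying

    errors = []
    try:
        # Mirrors the worker waiting for its next instruction.
        worker_end.recv()
    except EOFError as exc:
        errors.append(type(exc).__name__)
    try:
        # Mirrors the worker trying to report the error back.
        worker_end.send("error report")
    except OSError as exc:
        errors.append(type(exc).__name__)

    worker_end.close()
    return errors


if __name__ == "__main__":
    print(probe_dead_peer())
```

Both errors are symptoms in the workers; the cause to look for is whatever killed the main process.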

I would make sure that it works for an official PettingZoo environment like MPE on your system first.

p-veloso commented 3 years ago

I ran a test with simple_adversary_v2 and the problem happened again (now on a g3.4xlarge). Here is the file: mpe_test.zip

16 cpus available
Using cuda device
------------------------------
| time/              |       |
|    fps             | 5047  |
|    iterations      | 1     |
|    time_elapsed    | 2     |
|    total_timesteps | 12288 |
------------------------------
Segmentation fault (core dumped)
Process Process-4:
(pytorch_latest_p37) ubuntu@ip-172-31-8-173:~/bubbles$ Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-3:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-1:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/supersuit/vector/multiproc_vec.py", line 60, in async_loop
    pipe.send((e, tb))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

benblack769 commented 3 years ago

Thanks for providing the simple example.

Unfortunately, I still cannot reproduce this. On my system, your mpe_test.py code works just fine.

How are you running the code? What versions of Python and the libraries are you running?
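One way to gather that information is sketched below, assuming Python 3.8 or newer (for importlib.metadata). report_versions is a hypothetical helper, and the package names are simply the ones mentioned in this thread.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    names = ["supersuit", "pettingzoo", "stable-baselines3", "gym", "torch"]
    for name, version in report_versions(names).items():
        print(name, version or "not installed")
```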

jkterry1 commented 3 years ago

Also, are you only having this MPE issue on Windows systems?

benblack769 commented 3 years ago

@justinkterry You can see that it is Ubuntu from the stack trace.

jkterry1 commented 3 years ago

Right, my apologies

p-veloso commented 3 years ago

I was using an environment with PyTorch from the Deep Learning AMI on AWS. I will try to run it again tomorrow to get the specifications.

p-veloso commented 3 years ago

The AMI uses Python 3.7.10

I installed:

pip install git+https://github.com/vwxyzjn/stable-baselines3
pip install supersuit

These are the packages listed:

Package                            Version
---------------------------------- -------------------
alabaster                          0.7.12
anaconda-client                    1.7.2
anaconda-project                   0.9.1
anyio                              2.2.0
appdirs                            1.4.4
argh                               0.26.2
argon2-cffi                        20.1.0
asn1crypto                         1.4.0
astroid                            2.5
astropy                            4.2
async-generator                    1.10
atomicwrites                       1.4.0
attrs                              20.3.0
autopep8                           1.5.5
autovizwidget                      0.18.0
Babel                              2.9.0
backcall                           0.2.0
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4                     4.9.3
bitarray                           1.6.3
bkcharts                           0.2
black                              19.10b0
bleach                             3.3.0
blis                               0.7.4
bokeh                              2.2.3
boto                               2.49.0
boto3                              1.17.12
botocore                           1.20.12
Bottleneck                         1.3.2
brotlipy                           0.7.0
catalogue                          2.0.1
certifi                            2020.12.5
cffi                               1.14.5
chardet                            4.0.0
click                              7.1.2
cloudpickle                        1.6.0
clyent                             1.2.2
colorama                           0.4.4
contextlib2                        0.6.0.post1
cryptography                       3.4.6
cycler                             0.10.0
cymem                              2.0.5
Cython                             0.29.22
cytoolz                            0.11.0
dask                               2021.2.0
decorator                          4.4.2
defusedxml                         0.6.0
diff-match-patch                   20200713
dill                               0.3.3
distributed                        2021.2.0
docutils                           0.16
entrypoints                        0.3
environment-kernels                1.1.1
et-xmlfile                         1.0.1
fastai                             1.0.61
fastcache                          1.1.0
fastprogress                       1.0.0
filelock                           3.0.12
flake8                             3.8.4
Flask                              1.1.2
Flask-Cors                         3.0.10
fsspec                             0.8.3
future                             0.18.2
gevent                             21.1.1
glob2                              0.7
gmpy2                              2.0.8
google-pasta                       0.2.0
greenlet                           1.0.0
gym                                0.18.0
h5py                               2.10.0
hdijupyterutils                    0.18.0
HeapDict                           1.0.1
html5lib                           1.1
idna                               2.10
imagecodecs                        2021.1.11
imageio                            2.9.0
imagesize                          1.2.0
importlib-metadata                 2.0.0
iniconfig                          1.1.1
intervaltree                       3.1.0
ipykernel                          5.3.4
ipyparallel                        6.3.0
ipython                            7.20.0
ipython-genutils                   0.2.0
ipywidgets                         7.6.3
isort                              5.7.0
itsdangerous                       1.1.0
jdcal                              1.4.1
jedi                               0.17.2
jeepney                            0.6.0
Jinja2                             2.11.3
jmespath                           0.10.0
joblib                             1.0.1
json5                              0.9.5
jsonschema                         3.2.0
jupyter                            1.0.0
jupyter-client                     6.1.7
jupyter-console                    6.2.0
jupyter-core                       4.7.1
jupyter-packaging                  0.7.12
jupyter-server                     1.5.0
jupyterlab                         3.0.12
jupyterlab-pygments                0.1.2
jupyterlab-server                  2.3.0
jupyterlab-widgets                 1.0.0
keyring                            22.0.1
kiwisolver                         1.3.1
lazy-object-proxy                  1.5.2
libarchive-c                       2.9
llvmlite                           0.34.0
locket                             0.2.1
lxml                               4.6.3
MarkupSafe                         1.1.1
matplotlib                         3.3.4
mccabe                             0.6.1
mistune                            0.8.4
mkl-fft                            1.3.0
mkl-random                         1.1.1
mkl-service                        2.3.0
mock                               4.0.3
more-itertools                     8.7.0
mpi4py                             3.0.3
mpmath                             1.1.0
msgpack                            1.0.2
multipledispatch                   0.6.0
murmurhash                         1.0.5
mypy-extensions                    0.4.3
nb-conda                           2.2.1
nb-conda-kernels                   2.3.1
nbclassic                          0.2.6
nbclient                           0.5.2
nbconvert                          6.0.7
nbformat                           5.1.2
nest-asyncio                       1.5.1
networkx                           2.5
nltk                               3.5
nose                               1.3.7
notebook                           6.2.0
numba                              0.51.2
numexpr                            2.7.3
numpy                              1.19.2
numpydoc                           1.1.0
nvidia-ml-py3                      7.352.0
olefile                            0.46
onnx                               1.5.0
opencv-python                      3.4.13.47
openpyxl                           3.0.6
packaging                          20.9
pandas                             1.2.2
pandocfilters                      1.4.3
parso                              0.7.0
partd                              1.1.0
path                               15.1.2
pathlib2                           2.3.5
pathspec                           0.7.0
pathtools                          0.1.2
pathy                              0.4.0
patsy                              0.5.1
pep8                               1.7.1
PettingZoo                         1.8.0
pexpect                            4.8.0
pickleshare                        0.7.5
Pillow                             7.2.0
pip                                21.0.1
pkginfo                            1.7.0
plotly                             4.14.3
pluggy                             0.13.1
ply                                3.11
preshed                            3.0.5
prometheus-client                  0.9.0
prompt-toolkit                     3.0.8
protobuf                           3.15.6
protobuf3-to-dict                  0.1.5
psutil                             5.8.0
psycopg2                           2.7.5
ptyprocess                         0.7.0
py                                 1.10.0
pyarrow                            3.0.0
pycodestyle                        2.6.0
pycosat                            0.6.3
pycparser                          2.20
pycrypto                           2.6.1
pycurl                             7.43.0.6
pydantic                           1.7.3
pydocstyle                         5.1.1
pyerfa                             1.7.2
pyflakes                           2.2.0
pyfunctional                       1.4.3
pygal                              2.4.0
pyglet                             1.5.0
Pygments                           2.8.0
pykerberos                         1.2.1
pylint                             2.7.0
pyls-black                         0.4.6
pyls-spyder                        0.3.2
pynvml                             8.0.4
pyodbc                             4.0.0-unsupported
pyOpenSSL                          20.0.1
pyparsing                          2.4.7
pyrsistent                         0.17.3
PySocks                            1.7.1
pytest                             6.2.2
python-dateutil                    2.8.1
python-jsonrpc-server              0.4.0
python-language-server             0.36.2
pytz                               2021.1
PyWavelets                         1.1.1
pyxdg                              0.27
PyYAML                             5.4.1
pyzmq                              20.0.0
QDarkStyle                         2.8.1
QtAwesome                          1.0.1
qtconsole                          5.0.2
QtPy                               1.9.0
regex                              2020.11.13
requests                           2.25.1
requests-kerberos                  0.12.0
retrying                           1.3.3
rope                               0.18.0
Rtree                              0.9.4
ruamel-yaml                        0.15.87
s3fs                               0.2.0
s3transfer                         0.3.4
sagemaker                          2.31.1
scikit-image                       0.17.2
scikit-learn                       0.23.2
scipy                              1.6.1
seaborn                            0.11.1
SecretStorage                      3.3.1
Send2Trash                         1.5.0
setuptools                         49.6.0.post20210108
simplegeneric                      0.8.1
singledispatch                     0.0.0
six                                1.15.0
sklearn                            0.0
smart-open                         3.0.0
smclarify                          0.1
smdebug-rulesconfig                1.0.1
sniffio                            1.2.0
snowballstemmer                    2.1.0
sortedcollections                  2.1.0
sortedcontainers                   2.3.0
soupsieve                          2.2
spacy                              3.0.5
spacy-legacy                       3.0.1
sparkmagic                         0.15.0
Sphinx                             3.5.1
sphinxcontrib-applehelp            1.0.2
sphinxcontrib-devhelp              1.0.2
sphinxcontrib-htmlhelp             1.0.3
sphinxcontrib-jsmath               1.0.1
sphinxcontrib-qthelp               1.0.3
sphinxcontrib-serializinghtml      1.1.4
sphinxcontrib-websupport           1.2.4
spyder                             4.2.1
spyder-kernels                     1.10.2
SQLAlchemy                         1.3.23
srsly                              2.4.0
stable-baselines3                  1.1.0a1
statsmodels                        0.12.2
SuperSuit                          2.6.2
sympy                              1.7.1
tables                             3.6.1
tabulate                           0.8.9
tblib                              1.7.0
terminado                          0.9.2
testpath                           0.4.4
textdistance                       4.2.1
thinc                              8.0.2
threadpoolctl                      2.1.0
three-merge                        0.1.1
tifffile                           2021.1.14
toml                               0.10.1
toolz                              0.11.1
torch                              1.8.0+cu111
torch-model-archiver               0.2.1
torchserve                         0.2.1
torchvision                        0.9.0+cu111
tornado                            6.1
tqdm                               4.56.0
traitlets                          5.0.5
typed-ast                          1.4.2
typer                              0.3.2
typing                             3.7.4.3
typing-extensions                  3.7.4.3
ujson                              4.0.2
unicodecsv                         0.14.1
urllib3                            1.26.4
wasabi                             0.8.2
watchdog                           1.0.2
wcwidth                            0.2.5
webencodings                       0.5.1
Werkzeug                           1.0.1
wheel                              0.36.2
widgetsnbextension                 3.5.1
wrapt                              1.12.1
wurlitzer                          2.0.1
xlrd                               2.0.1
XlsxWriter                         1.3.7
xlwt                               1.3.0
yapf                               0.30.0
zict                               2.0.0
zipp                               3.4.0
zope.event                         4.5.0
zope.interface                     5.2.0

benblack769 commented 3 years ago

Sorry. I still can't reproduce this. I will likely have to launch an AWS instance to test this out.

p-veloso commented 3 years ago

@weepingwillowben, do you see any potential fix for multiprocessing on Windows or AWS Ubuntu in the near future?

benblack769 commented 3 years ago

Hi, following up on this.

I started an AWS instance with the Deep Learning AMI (Ubuntu 18.04) Version 42.1.

I ran

pip install supersuit
pip install stable_baselines3

Ran your mpe_test.py

And everything worked fine.

p-veloso commented 3 years ago

Sounds good. The only difference seems to be that I was using

pip install git+https://github.com/vwxyzjn/stable-baselines3

But now the monitor for vectorized environments is integrated into the official SB3. Also, which of the available environments in the AMI did you use?

I will try again later. Thanks again.

benblack769 commented 3 years ago

I don't think I used any particular anaconda environment. I just pip installed everything immediately after logging in.

p-veloso commented 3 years ago

When I logged in, the interface suggested using one of the many available environments. I used source activate to switch to the latest PyTorch environment, then installed the libraries I mentioned earlier. I will try to run the code later today.

p-veloso commented 3 years ago

@weepingwillowben, mpe_test.py works well outside of the anaconda environments. However, I still have problems with my custom environments, such as raumplan aws.zip. I cannot run it with your specifications because VecMonitor requires

pip install git+https://github.com/vwxyzjn/stable-baselines3

Then, after I install it and run it on AWS (or on Google Colab), the SB3 class VecEnvWrapper triggers a maximum recursion depth error.

ubuntu@ip-172-31-0-98:~/[raumplan aws.zip](https://github.com/PettingZoo-Team/SuperSuit/files/6348377/raumplan.aws.zip)$ python raumplan_supersuit_train_for_pavilion.py
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 347, in getattr_depth_check
    all_attributes = self._get_all_attributes()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 321, in _get_all_attributes
    all_attributes.update(self.class_attributes)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 304, in __getattr__
    blocked_class = self.getattr_depth_check(name, already_found=False)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 347, in getattr_depth_check
    all_attributes = self._get_all_attributes()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 321, in _get_all_attributes
    all_attributes.update(self.class_attributes)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 304, in __getattr__
    blocked_class = self.getattr_depth_check(name, already_found=False)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 347, in getattr_depth_check
    all_attributes = self._get_all_attributes()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 321, in _get_all_attributes
    all_attributes.update(self.class_attributes)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 304, in __getattr__
    blocked_class = self.getattr_depth_check(name, already_found=False)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 347, in getattr_depth_check
    all_attributes = self._get_all_attributes()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 321, in _get_all_attributes
    all_attributes.update(self.class_attributes)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 304, in __getattr__
    blocked_class = self.getattr_depth_check(name, already_found=False)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 347, in getattr_depth_check
    all_attributes = self._get_all_attributes()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 321, in _get_all_attributes
    all_attributes.update(self.class_attributes)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 304, in __getattr__
[repeats these 3 lines of errors multiple times]
RecursionError: maximum recursion depth exceeded

benblack769 commented 3 years ago

Sorry for the delay. This should be a separate issue from the first one.

If you remove this bit of code:

                  clip_range_vf = env.spec.reward_threshold,

it should work.

While this error message is a terrible one, and I'll look into replacing it with a sane AttributeError, getting an attribute of the base environment spec is not a feature we are looking to support in vector environments (especially not multiprocessing ones). Just get the reward threshold of the environment before it is wrapped in a vector environment.
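
For readers wondering how a missing attribute can surface as a RecursionError rather than an AttributeError: the traceback above shows __getattr__ reaching self.class_attributes, which re-enters __getattr__. The sketch below reproduces that general pattern in isolation. RecursiveWrapper, SafeWrapper and lookup_outcome are hypothetical names for illustration only; this is not SB3 or SuperSuit code, and it is not necessarily how the eventual fix was written.

```python
class RecursiveWrapper:
    """Forwards unknown attributes to self.inner (fragile version)."""

    def __init__(self, inner):
        self.inner = inner

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails. If "inner" itself is
        # missing, self.inner fails too and calls __getattr__ again, forever.
        return getattr(self.inner, name)


class SafeWrapper:
    """Same forwarding, but a missing "inner" becomes a plain AttributeError."""

    def __init__(self, inner):
        self.inner = inner

    def __getattr__(self, name):
        try:
            # Read the instance dict directly so this lookup cannot re-enter __getattr__.
            inner = self.__dict__["inner"]
        except KeyError:
            raise AttributeError(
                f"{type(self).__name__} is not initialised; cannot look up {name!r}"
            ) from None
        return getattr(inner, name)


def lookup_outcome(wrapper_cls, name="spec"):
    """Look up an attribute on an instance whose __init__ never ran."""
    wrapper = wrapper_cls.__new__(wrapper_cls)  # skip __init__, as pickle does when rebuilding objects
    try:
        getattr(wrapper, name)
    except RecursionError:
        return "RecursionError"
    except AttributeError:
        return "AttributeError"
    return "found"


if __name__ == "__main__":
    print(lookup_outcome(RecursiveWrapper))
    print(lookup_outcome(SafeWrapper))
```

The design point is only that any attribute __getattr__ depends on should be read in a way that cannot call __getattr__ again.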

benblack769 commented 3 years ago

Note that pettingzoo environments support getting attributes of the underlying environment through the .unwrapped attribute, which returns the unwrapped environment. So if env was a pettingzoo environment, rather than a vector environment, you could just do

env.unwrapped.spec.reward_threshold

p-veloso commented 3 years ago

@weepingwillowben I think it still crashes when I test in Colab (I will test it on AWS later). I have just uploaded an example with the code.

benblack769 commented 3 years ago

Thanks again for the example. We just found the maximum recursion depth internally ourselves. I'll make sure to release that fix today.

benblack769 commented 3 years ago

This fix was released. Let me know if you have more issues (especially if the initial issue crops up again).

p-veloso commented 3 years ago

Awesome. It works in google colab with and without multiprocessing. I will double check the previous problems on Wed.
If everything is fine, I will close this issue.

benblack769 commented 3 years ago

Just one note. In the Colab notebook, you are still pip installing from the PR fork of stable_baselines3. That PR got merged, so you should now be able to pull from the SB3 master branch.

apigott commented 3 years ago

Hi, I'm actually still having the same issue with stable-baselines3==1.1.0a3 and with the example that @weepingwillowben suggested in this comment.

Any suggestions on what I should try next?

(Running this on a Mac)

Traceback (most recent call last):
  File "/Users/aislingpigott/Documents/legendary-pancake/pettingzooTest.py", line 13, in <module>
    env = ss.concat_vec_envs_v0(env, 4, num_cpus=4, base_class='stable_baselines3')
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector_constructors.py", line 47, in concat_vec_envs
    vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/constructors.py", line 38, in constructor
    return ProcConcatVec(cat_env_fns, obs_space, act_space, num_fns * envs_per_env, example_env.metadata)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 89, in __init__
    proc.start()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'vec_env_args.<locals>.env_fn'
Exception ignored in: <function ProcConcatVec.__del__ at 0x179f8b160>
Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 153, in __del__
    for pipe in self.pipes:
AttributeError: 'ProcConcatVec' object has no attribute 'pipes'

apigott commented 3 years ago

@p-veloso same issue

p-veloso commented 3 years ago

@apigott, as far as I remember that error disappeared when I tested in colab and aws (ubuntu). With that said, these are the fixes that I used over time

benblack769 commented 3 years ago

@apigott Are you using the latest version of supersuit? I thought that this issue was fixed.

apigott commented 3 years ago

I pulled supersuit 2.6.4 today with pip install supersuit

benblack769 commented 3 years ago

Are you using a windows system? MSYS, perhaps?

apigott commented 3 years ago

No, mac

benblack769 commented 3 years ago

Ah, now the issue is clear.

Spawn vs. fork: the default multiprocessing start method changed from fork to spawn in Python 3.8 and newer, on Macs only.

https://discuss.python.org/t/multiprocessing-spawn-default-on-macos-since-python-3-8-is-slower-than-fork-method/5910/6

Looks like gym has a clever way of getting around this problem here: https://github.com/openai/gym/blob/a5a6ae6bc0a5cfc0ff1ce9be723d59593c165022/gym/vector/utils/misc.py#L6

I can make a PR to use this CloudPickleWrapper in supersuit's multiprocessing utilities.
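
The traceback earlier in the thread fails inside reduction.dump: with the spawn start method, the Process object and its target are pickled before being handed to the child, and the standard pickle module cannot serialise a function defined inside another function. A minimal sketch of that limitation follows, using only the standard library. make_local_factory and can_pickle are hypothetical names, and functools.partial is shown only as one picklable alternative; it is not what Gym's CloudPickleWrapper or SuperSuit does.

```python
import functools
import pickle


def make_local_factory(size):
    # env_fn is defined inside another function, like
    # vec_env_args.<locals>.env_fn in the traceback.
    def env_fn():
        return {"size": size}

    return env_fn


def can_pickle(obj):
    """Return True if the standard pickle module can serialise obj."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        # The exception type for local objects differs between Python versions.
        return False
    return True


local_factory = make_local_factory(3)
# A partial over an importable callable is pickled as "callable + arguments".
partial_factory = functools.partial(dict, size=3)

local_ok = can_pickle(local_factory)
partial_ok = can_pickle(partial_factory)
restored = pickle.loads(pickle.dumps(partial_factory))

if __name__ == "__main__":
    print("local closure picklable:", local_ok)
    print("functools.partial picklable:", partial_ok)
    print("restored factory builds:", restored())
```

With fork, the child inherits the parent's memory, so nothing has to be pickled to start the process; that is why the same closure works on Linux.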

benblack769 commented 3 years ago

In the meantime, you can call

import multiprocessing
multiprocessing.set_start_method("fork")

before creating the environment to fix this issue.
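
A slightly more defensive variant of the call above, as a sketch: ensure_fork_start_method is a hypothetical helper, not part of SuperSuit. It checks that fork is available on the platform and avoids the RuntimeError that a repeated plain set_start_method call raises.

```python
import multiprocessing


def ensure_fork_start_method():
    """Select the "fork" start method if the platform offers it.

    Returns True when "fork" is active afterwards, False when it is unavailable.
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        return False  # e.g. Windows only offers "spawn"
    if multiprocessing.get_start_method(allow_none=True) != "fork":
        # force=True replaces a start method that was already chosen;
        # a second plain set_start_method call raises RuntimeError instead.
        multiprocessing.set_start_method("fork", force=True)
    return True


if __name__ == "__main__":
    fork_active = ensure_fork_start_method()
    print("fork start method active:", fork_active)
    # Build the SuperSuit vector environment only after this point.
```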

benblack769 commented 3 years ago

Actually upon investigation this is the only solution without major reworking of multiprocessing support. Please use the above multiprocessing set_start_method call to fix this issue.

benblack769 commented 3 years ago

#86 is a documentation addition to inform future users of this problem and how to fix it.

apigott commented 3 years ago

@weepingwillowben Thanks. I was able to use those lines as well as export KMP_DUPLICATE_LIB_OK=TRUE to get a working test file. (I should mention that this git issue also suggests that conda install nomkl works but I didn't have luck with that either.) I'm not particularly well versed in multiprocessing, as I've only ever used the pathos lib, but here was the stack trace I got without exporting KMP_DUPLICATE_LIB_OK=TRUE.

(zoo) Aislings-MacBook-Pro:legendary-pancake aislingpigott$ python pettingzooTest.py 
Using cpu device
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Abort trap: 6
Process Process-4:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 66, in async_loop
    pipe.send((e, tb))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
(zoo) Aislings-MacBook-Pro:legendary-pancake aislingpigott$ Process Process-3:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 66, in async_loop
    pipe.send((e, tb))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-2:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 66, in async_loop
    pipe.send((e, tb))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process Process-1:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 31, in async_loop
    instr = pipe.recv()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/site-packages/supersuit/vector/multiproc_vec.py", line 66, in async_loop
    pipe.send((e, tb))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
    self._send(header + buf)
  File "/opt/anaconda3/envs/zoo/lib/python3.9/multiprocessing/connection.py", line 373, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

benblack769 commented 3 years ago

Hmm. This is an unfortunate error. I wonder when the omp runtime is initialized (import time or when building the model), and whether it can reasonably be done after creating the environments.

Are you still running the same snippet of code from earlier in this github issue?

apigott commented 3 years ago

Yeah, running the same code snippet. The error comes after the imports. "Using cpu device" is output by model = PPO(...), but from what I can tell with some debug print statements, the model never finishes initializing.

ETA: I'm not sure what you mean by it being possible to move the mp call until after the env is initialized. The initial error stems from initialization of the env right? So it's not possible to move the mp.set_start_method() call until after the model initialization

from stable_baselines3.ppo import MlpPolicy
from stable_baselines3 import PPO
# from stable_baselines3.common.vec_env import VecMonitor
import supersuit as ss
# from petting_bubble_env_continuous import PettingBubblesEnvironment
from pettingzoo.mpe import simple_push_v2
import gym

import multiprocessing
print("Imports ok")
multiprocessing.set_start_method("fork")

env = simple_push_v2.parallel_env()
env = ss.pad_observations_v0(env)
env = ss.black_death_v1(env)
env = ss.pettingzoo_env_to_vec_env_v0(env)
env = ss.concat_vec_envs_v0(env, 4, num_cpus=4, base_class='stable_baselines3')

model = PPO(MlpPolicy, env, verbose=2, gamma=0.999, n_steps=1000, ent_coef=0.01, learning_rate=0.00025, vf_coef=0.5, max_grad_norm=0.5, gae_lambda=0.95, n_epochs=4, clip_range=0.2, clip_range_vf=1, tensorboard_log="./ppo_test/")
print("Script is stopped by broken pipe error")
model.learn(total_timesteps=100, tb_log_name="test",  reset_num_timesteps=True)
model.save("bubble_policy_test")

benblack769 commented 3 years ago

Yeah, it is probably initializing once when importing pytorch, then initializing again when using it. Very strange. From reading, it appears that the library is somehow not dealt with correctly during the fork.
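
For anyone who wants to apply the KMP_DUPLICATE_LIB_OK workaround mentioned above from inside the script rather than with export, a minimal sketch follows. It assumes the variable is read when the OpenMP runtime is loaded, so it has to be set before torch or stable_baselines3 is first imported. The OMP message itself describes this setting as unsafe and unsupported, so treat it as a stopgap.

```python
import os

# Set the variable before any library that loads an OpenMP runtime is imported.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

duplicate_lib_ok = os.environ["KMP_DUPLICATE_LIB_OK"]

if __name__ == "__main__":
    print("KMP_DUPLICATE_LIB_OK =", duplicate_lib_ok)
    # Imports such as stable_baselines3 and supersuit would follow here.
```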

I guess the correct long term solution is to support spawn multiprocessing natively like Gym does. I did not anticipate that this would be a problem.

I will create an issue to track this feature.