HerReplayBuffer cannot handle VecEnv with multiple environments

GalAvineri commented 2 years ago

🐛 Bug

HerReplayBuffer cannot handle a VecEnv with more than one environment. It raises an error in the add() function when called with the values returned by the step() function of such a VecEnv. I believe this is because of the way the buffer is defined.

While fields such as the observation are defined with the shape (num envs, item shape), the action, reward and done fields are not.
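To make the suspected mismatch concrete, here is a minimal NumPy sketch. It is not the HerReplayBuffer implementation: the array names, sizes, and the try_store helper are all hypothetical and only mimic the layout described above, where observations have a per-environment axis and actions and rewards do not.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
BUFFER_SIZE, N_ENVS, OBS_DIM, ACTION_DIM = 8, 2, 3, 4

# Toy storage: observations get a per-environment axis, actions and rewards do not.
observations = np.zeros((BUFFER_SIZE, N_ENVS, OBS_DIM), dtype=np.float32)
actions = np.zeros((BUFFER_SIZE, ACTION_DIM), dtype=np.float32)
rewards = np.zeros((BUFFER_SIZE,), dtype=np.float32)

# Batched values shaped like what a two-environment VecEnv step would hand back.
obs_batch = np.ones((N_ENVS, OBS_DIM), dtype=np.float32)
action_batch = np.ones((N_ENVS, ACTION_DIM), dtype=np.float32)
reward_batch = np.ones((N_ENVS,), dtype=np.float32)


def try_store(storage, pos, value):
    """Write value into storage[pos]; return the error message on failure, else None."""
    try:
        storage[pos] = value
        return None
    except ValueError as err:
        return str(err)


obs_error = try_store(observations, 0, obs_batch)
action_error = try_store(actions, 0, action_batch)
reward_error = try_store(rewards, 0, reward_batch)

print("observation:", obs_error)
print("action:", action_error)
print("reward:", reward_error)
```

In this toy layout only the observation write succeeds. The action and reward writes fail because their slots have no axis for the number of environments, which is the kind of failure I suspect is happening inside add().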

To Reproduce

I've taken the example from the HER documentation page and changed the environment from a single environment to a DummyVecEnv containing two environments, i.e. I've changed from this environment definition:

env = BitFlippingEnv(n_bits=N_BITS, continuous=model_class in [DDPG, SAC, TD3], max_steps=N_BITS)

to this definition:

env1 = BitFlippingEnv(n_bits=N_BITS, continuous=model_class in [DDPG, SAC, TD3], max_steps=N_BITS)
env2 = BitFlippingEnv(n_bits=N_BITS, continuous=model_class in [DDPG, SAC, TD3], max_steps=N_BITS)
vec_env = DummyVecEnv([lambda _env=env: _env for env in [env1, env2]])

I've also removed everything after the learn() call, since the error happens during that call. Everything else is the same.

System Info

The result of sb3.get_system_info():

OS: Windows-10-10.0.19042-SP0 10.0.19042
Python: 3.9.12
Stable-Baselines3: 1.4.0
PyTorch: 1.10.2
GPU Enabled: False
Numpy: 1.21.5
Gym: 0.19.0

Checklist

[x] I have checked that there is no similar issue in the repo (required)
[x] I have read the documentation (required)
[x] I have provided a minimal working example to reproduce the bug (required)

qgallouedec commented 2 years ago

HER is not compatible with multiprocessing (see the RL Algorithms page in the documentation).

However, there are two working PRs on the subject, #654 and #704, if you want to take a look.

qgallouedec commented 2 years ago

Duplicate of #826

GalAvineri commented 2 years ago

I missed all the PRs and the documentation. Thank you for all the information!

DLR-RM / stable-baselines3