[Bug] ValueError from the demonstration code

🐛 Bug

Running the demonstration code copied from the repo README raises the following error: ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.

To Reproduce

Steps to reproduce the behavior.

Install the most recent stable-baselines3 from the Arch Linux repository, built with this PKGBUILD:

# Maintainer: Benoît Allard <benoit.allard@gmx.de>
pkgname=python-stable-baselines3
pkgver=1.6.0
pkgrel=1
pkgdesc="A set of reliable implementations of reinforcement learning algorithms in PyTorch"
arch=('any')
url="https://github.com/DLR-RM/stable-baselines3"
license=('MIT')
depends=("python" "python-gym" "python-numpy" "python-pytorch" "python-cloudpickle" "python-pandas" "python-matplotlib")
optdepends=('python-opencv: For render'
    'python-ale-py: For atari games'
    'python-pillow: For atari games'
    'tensorboard: Tensorboard support'
    'python-psutil: Checking memory taken by replay buffer')
_name=${pkgname#python-}
source=("$pkgname-$pkgver.tar.gz::https://github.com/DLR-RM/$_name/archive/v$pkgver.tar.gz")

build() {
    cd $_name-$pkgver
    export PYTHONSEED=1
    python setup.py build
}

package() {
    cd $_name-$pkgver
    python setup.py install --root="$pkgdir" --optimize=1 --skip-build
}
sha256sums=('f6642fb002adf7ce10087319ea8e9a331d95d26f6558067339f26c84fc588bb6')

Note: the same error also occurs with 1.5.0.

Install gym 0.26.0 using the PKGBUILD from the AUR.
Run the minimal example taken from this repo's README:

import gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

env.close()

Error messages generated:

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/home/feng/tmp/program/python/reinforcement_learning/_1.py", line 8, in <module>
    model.learn(total_timesteps=10_000)
  File "/usr/lib/python3.10/site-packages/stable_baselines3/ppo/ppo.py", line 310, in learn
    return super().learn(
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 239, in learn
    total_timesteps, callback = self._setup_learn(
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/base_class.py", line 446, in _setup_learn
    self._last_obs = self.env.reset()  # pytype: disable=annotation-type-mismatch
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 64, in reset
    self._save_obs(env_idx, obs)
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 94, in _save_obs
    self.buf_obs[key][env_idx] = obs
ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.
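
A possible explanation, offered as an assumption rather than a confirmed diagnosis: Gym 0.26 changed env.reset() to return an (obs, info) tuple and env.step() to return five values (obs, reward, terminated, truncated, info), while the code path in the traceback stores the result of reset() directly as an observation. The sketch below illustrates the difference with two hypothetical helpers, unpack_reset and unpack_step (not part of Gym or Stable-Baselines3), that normalize either return shape to the older convention used in the README example.

```python
from typing import Any, Dict, Tuple


def unpack_reset(result: Any) -> Tuple[Any, Dict[str, Any]]:
    """Normalize the return value of env.reset() to (obs, info).

    Heuristic: a 2-tuple whose second item is a dict is treated as the
    newer (obs, info) form; anything else is treated as a bare observation.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0], result[1]
    return result, {}


def unpack_step(result: Tuple[Any, ...]) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Normalize the return value of env.step() to (obs, reward, done, info)."""
    if len(result) == 5:
        # Newer form: episode end is split into terminated and truncated.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    # Older form: a single done flag.
    obs, reward, done, info = result
    return obs, reward, bool(done), info
```

Since the failure here occurs inside model.learn(), helpers like these would only apply to a user's own evaluation loop; they do not change what DummyVecEnv does internally.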

Expected behavior

The demonstration code from the README should run without raising an error.

System Info

OS: Linux-5.15.68-1-lts-x86_64-with-glibc2.36 #1 SMP Thu, 15 Sep 2022 09:53:50 +0000
Python: 3.10.7
Stable-Baselines3: 1.6.0
PyTorch: 1.12.1
GPU Enabled: True
Numpy: 1.23.3
Gym: 0.26.0
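
Given the versions listed above, a quick check of the installed Gym release may help narrow things down. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the 0.26 threshold is the Gym release that introduced the new reset()/step() return values. It assumes plain numeric major and minor version components.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '0.26.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def uses_new_gym_api(version: str) -> bool:
    """True if this Gym version returns (obs, info) from reset()."""
    return parse_major_minor(version) >= (0, 26)


if __name__ == "__main__":
    gym_version = installed_version("gym")
    if gym_version is None:
        print("gym is not installed")
    else:
        print(f"gym {gym_version}: new API = {uses_new_gym_api(gym_version)}")
```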

Checklist

[x] I have checked that there is no similar issue in the repo (required)
[x] I have read the documentation (required)
[x] I have provided a minimal working example to reproduce the bug (required)

DLR-RM/stable-baselines3, issue #1070