DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Bug] ValueError from the demonstration code #1070

Closed: fengwang closed this issue 2 years ago

fengwang commented 2 years ago

🐛 Bug

Running the demonstration code copied from the repo README raises ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.

To Reproduce

Steps to reproduce the behavior.

  1. Install the most recent stable-baselines3 from the Arch Linux repository, using this PKGBUILD:
# Maintainer: Benoît Allard <benoit.allard@gmx.de>
pkgname=python-stable-baselines3
pkgver=1.6.0
pkgrel=1
pkgdesc="A set of reliable implementations of reinforcement learning algorithms in PyTorch"
arch=('any')
url="https://github.com/DLR-RM/stable-baselines3"
license=('MIT')
depends=("python" "python-gym" "python-numpy" "python-pytorch" "python-cloudpickle" "python-pandas" "python-matplotlib")
optdepends=('python-opencv: For render'
    'python-ale-py: For atari games'
    'python-pillow: For atari games'
    'tensorboard: Tensorboard support'
    'python-psutil: Checking memory taken by replay buffer')
_name=${pkgname#python-}
source=("$pkgname-$pkgver.tar.gz::https://github.com/DLR-RM/$_name/archive/v$pkgver.tar.gz")

build() {
    cd $_name-$pkgver
    export PYTHONSEED=1
    python setup.py build
}

package() {
    cd $_name-$pkgver
    python setup.py install --root="$pkgdir" --optimize=1 --skip-build
}
sha256sums=('f6642fb002adf7ce10087319ea8e9a331d95d26f6558067339f26c84fc588bb6')

Note: the same error occurs with 1.5.0 as well.

  2. Install gym 0.26.0 with the PKGBUILD from the AUR

  3. Run a minimal example taken from the README of this repo

import gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

env.close()

Error messages generated:

Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/home/feng/tmp/program/python/reinforcement_learning/_1.py", line 8, in <module>
    model.learn(total_timesteps=10_000)
  File "/usr/lib/python3.10/site-packages/stable_baselines3/ppo/ppo.py", line 310, in learn
    return super().learn(
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 239, in learn
    total_timesteps, callback = self._setup_learn(
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/base_class.py", line 446, in _setup_learn
    self._last_obs = self.env.reset()  # pytype: disable=annotation-type-mismatch
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 64, in reset
    self._save_obs(env_idx, obs)
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 94, in _save_obs
    self.buf_obs[key][env_idx] = obs
ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.
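The traceback ends where the value returned by reset() is written into the observation buffer. This is consistent with the gym 0.26 API, where reset() returns an (obs, info) pair and step() returns five values (obs, reward, terminated, truncated, info) instead of four. A minimal sketch of the difference, using two hypothetical helpers (unpack_reset and unpack_step are illustrative names, not part of gym or SB3) that normalize both return styles to the older one:

```python
def unpack_reset(result):
    """Return (obs, info) for either reset() return style.

    Old style: reset() -> obs
    New style (gym 0.26): reset() -> (obs, info)

    Assumption: a 2-tuple whose second item is a dict is treated as the
    new style. This is a heuristic and would misfire for an environment
    whose observation is itself such a tuple.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        obs, info = result
        return obs, info
    return result, {}


def unpack_step(result):
    """Return (obs, reward, done, info) for either step() return style.

    Old style: (obs, reward, done, info)
    New style (gym 0.26): (obs, reward, terminated, truncated, info)
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # An episode is over when it either terminated or was truncated.
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, done, info


# Hypothetical return values standing in for a real environment.
old_reset = [0.0, 0.1, 0.0, -0.1]
new_reset = ([0.0, 0.1, 0.0, -0.1], {})
print(unpack_reset(old_reset))
print(unpack_reset(new_reset))
print(unpack_step((old_reset, 1.0, False, True, {})))
```

Code that stores the raw result of reset() as if it were an observation receives the whole (obs, info) tuple under the new API, which is the kind of value a fixed-shape array cannot hold.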

Expected behavior

The demonstration code from the README should run without errors.

System Info

OS: Linux-5.15.68-1-lts-x86_64-with-glibc2.36 #1 SMP Thu, 15 Sep 2022 09:53:50 +0000
Python: 3.10.7
Stable-Baselines3: 1.6.0
PyTorch: 1.12.1
GPU Enabled: True
Numpy: 1.23.3
Gym: 0.26.0

araffin commented 2 years ago

Duplicate of https://github.com/DLR-RM/stable-baselines3/issues/871. This is also documented: the current SB3 version only supports gym 0.21, and support for gym 0.22+ is being worked on in a separate branch.
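Given that constraint, a script can fail early with a clear message instead of the ValueError above. The following is only a sketch: SUPPORTED_GYM, major_minor and gym_is_supported are hypothetical names, and the supported version is taken from the reply above rather than from any SB3 API.

```python
import re

# Assumption: taken from the maintainer's reply (SB3 1.6.0 supports gym 0.21).
SUPPORTED_GYM = (0, 21)


def major_minor(version):
    """Extract (major, minor) from a version string such as '0.26.0'."""
    numbers = re.findall(r"\d+", version)
    return tuple(int(n) for n in numbers[:2])


def gym_is_supported(version):
    """Return True if the given gym version matches the supported release."""
    return major_minor(version) == SUPPORTED_GYM


# The version reported in this issue versus the supported one.
for reported in ("0.26.0", "0.21.0"):
    status = "supported" if gym_is_supported(reported) else "not supported"
    print(f"gym {reported}: {status}")
```

In a real script the version string would typically come from gym.__version__; it is passed in as a plain string here so the check can be run without gym installed.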