🐛 Bug
Encountered `ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.` when running the demonstration code copied from the repo README.
To Reproduce
Steps to reproduce the behavior.
Install the most recent stable-baselines3 package on Arch Linux, with this PKGBUILD:
# Maintainer: Benoît Allard <benoit.allard@gmx.de>
pkgname=python-stable-baselines3
pkgver=1.6.0
pkgrel=1
pkgdesc="A set of reliable implementations of reinforcement learning algorithms in PyTorch"
arch=('any')
url="https://github.com/DLR-RM/stable-baselines3"
license=('MIT')
depends=("python" "python-gym" "python-numpy" "python-pytorch" "python-cloudpickle" "python-pandas" "python-matplotlib")
optdepends=('python-opencv: For render'
            'python-ale-py: For atari games'
            'python-pillow: For atari games'
            'tensorboard: Tensorboard support'
            'python-psutil: Checking memory taken by replay buffer')
_name=${pkgname#python-}
source=("$pkgname-$pkgver.tar.gz::https://github.com/DLR-RM/$_name/archive/v$pkgver.tar.gz")
build() {
  cd $_name-$pkgver
  export PYTHONSEED=1
  python setup.py build
}

package() {
  cd $_name-$pkgver
  python setup.py install --root="$pkgdir" --optimize=1 --skip-build
}
sha256sums=('f6642fb002adf7ce10087319ea8e9a331d95d26f6558067339f26c84fc588bb6')
Run the minimal example taken from the README of this repo:
import gym
from stable_baselines3 import PPO
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()
env.close()
Error messages generated:
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Traceback (most recent call last):
  File "/home/feng/tmp/program/python/reinforcement_learning/_1.py", line 8, in <module>
    model.learn(total_timesteps=10_000)
  File "/usr/lib/python3.10/site-packages/stable_baselines3/ppo/ppo.py", line 310, in learn
    return super().learn(
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 239, in learn
    total_timesteps, callback = self._setup_learn(
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/base_class.py", line 446, in _setup_learn
    self._last_obs = self.env.reset() # pytype: disable=annotation-type-mismatch
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 64, in reset
    self._save_obs(env_idx, obs)
  File "/usr/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 94, in _save_obs
    self.buf_obs[key][env_idx] = obs
ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1.
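The failing frame is DummyVecEnv._save_obs, which copies the observation returned by env.reset() into a preallocated float buffer. My suspicion (an assumption on my part, not verified against the SB3 source) is that gym 0.26's reset() returns an (observation, info) tuple instead of a bare observation array, and numpy cannot store that tuple in a float slot. A minimal sketch reproducing the same class of error without gym or stable-baselines3:

```python
import numpy as np

# gym >= 0.26 style reset() output: an (observation, info) tuple.
reset_return = (np.zeros(4, dtype=np.float32), {})

# Stand-in for DummyVecEnv's preallocated observation buffer (1 env, obs dim 4).
buf_obs = np.zeros((1, 4), dtype=np.float32)

try:
    # Mimics `self.buf_obs[key][env_idx] = obs` from the last traceback frame.
    buf_obs[0] = reset_return
    failed = False
except ValueError:
    failed = True

print("ValueError raised:", failed)
```

Under the old gym API, reset() returned the bare observation array and the same assignment succeeds, which would explain why the README example used to work.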
Expected behavior
The demonstration code from the README should run without errors.
Note: the same error occurs with stable-baselines3 1.5.0 as well. gym 0.26.0 was installed with a PKGBUILD from the AUR.
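Since gym 0.26 changed reset() to return an (observation, info) tuple, a quick runtime check can tell which convention the installed gym follows. A hedged sketch (DummyEnv is a hypothetical stand-in; with real gym you would call gym.make("CartPole-v1").reset() instead):

```python
# Hypothetical stand-in for an environment; mimics gym >= 0.26, whose
# reset() returns an (observation, info) tuple rather than a bare observation.
class DummyEnv:
    def reset(self):
        return [0.0, 0.0, 0.0, 0.0], {}

result = DummyEnv().reset()

# New-style reset: a 2-tuple whose second element is the info dict.
new_api = isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict)
print("new-style reset() convention:", new_api)
```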