Closed Maldades closed 3 years ago
Hello, please fill in the issue template completely, notably result of env checker.
Sure!
Stable-baselines CheckEnv produces no output.
My env, without unnecessary detail. If you need more detail, please feel free to ask for it.
class PopulationManager(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, func, bounds, arg1, arg2=20):
        super(gym.Env, self).__init__()
        self.arg1 = arg1
        self.arg2 = arg2
        self.bounds = bounds
        self.action_space = gym.spaces.Box(low=0.1, high=100, shape=(4,))
        self.observation_space = gym.spaces.Box(low=self.bounds[0, 0], high=self.bounds[0, 1], shape=self.define_observation().shape)

    def reset(self):
        observation = self.define_observation()  # returns a flat np.array of shape (45,)
        return observation

    def step(self, action):
        agent_action = self.project_action(action)  # projects [-1, 1] to my needed range
        self.act_on_env(agent_action)  # changes self.vars, does not return anything
        done = self.check_termination()  # checks self.vars, returns True/False
        if done:
            reward = self.define_reward()  # returns float
        else:
            reward = 0.
        observation = self.define_observation()  # returns a flat np.array of shape (45,)
        return observation, reward, done, {}
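For context, `project_action` is not shown above. A minimal sketch of such a linear mapping from the policy's [-1, 1] output to this env's Box bounds (the function body and the 0.1/100 bounds are assumptions for illustration, not the actual implementation) could look like:

```python
def project_action(a, low=0.1, high=100.0):
    # Hypothetical helper: linearly map an action component from [-1, 1]
    # to [low, high]. low/high mirror the Box bounds above.
    return low + (a + 1.0) * 0.5 * (high - low)

print(project_action(-1.0))  # 0.1 (lower bound)
print(project_action(1.0))   # upper bound, approximately 100.0
```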
Libraries were installed with pip in a venv.
GPU: Nvidia RTX 2060
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2060 Off | 00000000:01:00.0 On | N/A |
| 0% 32C P8 14W / 190W | 355MiB / 5931MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
python --version: Python 3.7.3
PyTorch version (as seen in pip list): torch 1.5.0
Gym version: 0.18.0
pip list gives the following:
Package Version Location
----------------------------- ------------ ------------------------------------------------------------------------------------
absl-py 0.11.0
alabaster 0.7.12
apipkg 1.5
appdirs 1.4.4
astroid 2.4.2
atari-py 0.2.6
attrs 20.3.0
Babel 2.9.0
black 20.8b1
cachetools 4.2.1
certifi 2020.12.5
chardet 4.0.0
click 7.1.2
cloudpickle 1.6.0
coverage 5.4
cycler 0.10.0
decorator 4.4.2
docutils 0.16
execnet 1.8.0
flake8 3.8.4
future 0.18.2
google-auth 1.26.1
google-auth-oauthlib 0.4.2
grpcio 1.35.0
gym 0.18.0
gym-tersq 0.0.1 /home/bondades/Documents/docker_containers/baselines-zoo/rl-baselines3-zoo/gym_tersq
idna 2.10
imagesize 1.2.0
importlab 0.6.1
importlib-metadata 3.4.0
iniconfig 1.1.1
isort 5.7.0
Jinja2 2.11.3
kiwisolver 1.3.1
lazy-object-proxy 1.4.3
livereload 2.6.3
llvmlite 0.32.1
Markdown 3.3.3
MarkupSafe 1.1.1
matplotlib 3.3.4
mccabe 0.6.1
mypy-extensions 0.4.3
networkx 2.5
ninja 1.10.0.post2
numba 0.49.0
numpy 1.20.1
oauthlib 3.1.0
opencv-python 4.5.1.48
packaging 20.9
pandas 1.2.2
pathspec 0.8.1
Pillow 7.2.0
pip 21.0.1
pkg-resources 0.0.0
pluggy 0.13.1
protobuf 3.14.0
psutil 5.8.0
py 1.10.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.6.0
pyDOE 0.3.8
pyenchant 3.2.0
pyflakes 2.2.0
pyglet 1.5.0
Pygments 2.8.0
pylint 2.6.2
pyparsing 2.4.7
pytest 6.2.2
pytest-cov 2.11.1
pytest-env 0.6.2
pytest-forked 1.3.0
pytest-xdist 2.2.1
python-dateutil 2.8.1
pytype 2021.2.9
pytz 2021.1
PyYAML 5.4.1
regex 2020.11.13
requests 2.25.1
requests-oauthlib 1.3.0
rsa 4.7
sb3-contrib 0.10.0
scipy 1.6.0
seaborn 0.11.1
setuptools 53.0.0
six 1.15.0
snowballstemmer 2.1.0
Sphinx 3.5.1
sphinx-autobuild 2020.9.1
sphinx-autodoc-typehints 1.11.1
sphinx-rtd-theme 0.5.1
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 1.0.3
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.4
sphinxcontrib-spelling 7.1.0
stable-baselines3 0.10.0
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
toml 0.10.2
torch 1.5.0
tornado 6.1
typed-ast 1.4.2
typing-extensions 3.7.4.3
urllib3 1.26.3
Werkzeug 1.0.1
wheel 0.36.2
wrapt 1.12.1
zipp 3.4.0
I execute the code through an auxiliary script. After introducing a debugging print statement in .../stable_baselines3/common/on_policy_algorithm.py, the traceback was the following:
python my_script.py
Using cpu device
DEBUG /home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py
| self.ep_info_buffer: deque([1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039, 2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047], maxlen=100)
Traceback (most recent call last):
File "my_script.py", line 30, in <module>
model.learn(total_timesteps=100000)
File "/home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/ppo/ppo.py", line 264, in learn
reset_num_timesteps=reset_num_timesteps,
File "/home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 238, in learn
if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
TypeError: object of type 'int' has no len()
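The failure mode can be reproduced in isolation: `ep_info_buffer` is expected to hold episode-info dicts, so an `int` entry makes the `len(...)` check blow up. A minimal sketch, plain Python, no stable-baselines3 needed:

```python
from collections import deque

# What the buffer should contain: episode-info dicts like {"r": reward, "l": length}
good = deque(maxlen=100)
good.append({"r": 1.5, "l": 200})
assert len(good[0]) == 2  # len() of a dict works fine

# What the buffer contained here: plain ints, so len() fails
bad = deque([1948, 1949], maxlen=100)
try:
    len(bad[0])
except TypeError as e:
    print(e)  # object of type 'int' has no len()
```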
auxiliary script:
import gym
import gym_tersq
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecCheckNan
import torch as th
import numpy as np
import os
from stable_baselines3.common.env_checker import check_env
# Create save dir
save_dir = "/tmp/gym/"
os.makedirs(save_dir, exist_ok=True)
th.autograd.set_detect_anomaly(True)
np.seterr(all='raise')
env = gym.make("gym_tersq:Tersq-v0")
check_env(env, warn=True)
env = DummyVecEnv([lambda: env])
env = VecCheckNan(env, raise_exception=True)
#env = gym.make("Tennis-v0")
model = PPO('MlpPolicy', env, verbose=2)
model.learn(total_timesteps=100000)
model.save(save_dir + "/PPO")
obs = env.reset()
Please feel free to ask for additional information.
Thank you very much
Hello, I think the issue comes from your environment. Please find below a minimal working example that does not produce any error (cf. the issue template next time ;) since the example you provided cannot be run).
Stable-baselines CheckEnv produces no output.
Let me doubt that ;) (there should be at least one warning)
import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
class PopulationManager(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(PopulationManager, self).__init__()
        self.action_space = gym.spaces.Box(low=0.1, high=100, shape=(4,))
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(45,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        reward = 0.0
        done = bool(np.random.rand() > 0.8)
        return self.observation_space.sample(), reward, done, {}


env = PopulationManager()
check_env(env)

model = PPO("MlpPolicy", env, verbose=1).learn(10000)
Hmmm... in fact yes, I pasted a version of my environment that had an old action_space; it should have given a warning. Nice catch ;-)
Weird, when I execute your minimal working example, ep_info_buffer remains empty:
DEBUG /home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py | self.ep_info_buffer: deque([], maxlen=100)
Since ep_info_buffer only interacts with infos and dones, and those are the same in both codes... I'm stumped.
I'll look into it and post my findings for future trouble-makers. Thank you very much indeed.
Weird, when I execute your minimal working example, ep_info_buffer remains empty:
It will remain empty until the end of an episode (when the monitor wrapper passes the episode length and episode reward)
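Schematically, a Monitor-style wrapper attaches the episode statistics to `info` only when an episode ends, which is why the buffer stays empty until then. A simplified pure-Python sketch of that behaviour (an illustration, not the actual SB3 implementation):

```python
def attach_episode_info(info, done, episode_rewards):
    # At episode end, Monitor-style wrappers add the episode's total
    # reward "r" and length "l" under info["episode"]; until then,
    # info stays untouched and ep_info_buffer remains empty.
    if done:
        info["episode"] = {"r": sum(episode_rewards), "l": len(episode_rewards)}
    return info

print(attach_episode_info({}, False, [0.5, 1.0]))  # {}
print(attach_episode_info({}, True, [0.5, 1.0]))   # {'episode': {'r': 1.5, 'l': 2}}
```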
Ok, it was my environment.
I have not been able to pinpoint the cause, but acting directly on one .py file instead of importing my pip install -e environment did the trick. Probably changes in my script were not automatically propagated to the installed package, or something like that. It's hard not to mess something up while learning Linux, Docker, pip packaging, and RL in the same project.
Thank you loads!
Hello, I have this weird issue; as far as I understand it, it should affect nearly every stable-baselines3 user. Which it surely does not, so here I am wondering what I'm not understanding.
Whenever I try to use an on-policy algorithm I get the following error:
In my custom environment I return the usual empty dictionary {} as info, which I believe is central to this error. After studying the code, I think the error comes from the base class's _update_info_buffer(), which on-policy algorithms call passing only info (optionally you could also pass done). The method _update_info_buffer creates a deque of integers, each representing the episode. This deque is then accessed in on_policy_algorithm.py, line 238, in:
if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
which throws an error, because ep_info_buffer[0] is an integer and has no len(). I have tried tinkering with the info dictionary from my env, with no success, as _update_info_buffer only cares about info.get("episode"), which is generated automatically.
Would you please help me out?