DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

If len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0: TypeError: object of type 'int' has no len() #320

Closed Maldades closed 3 years ago

Maldades commented 3 years ago

Hello, I have this weird issue which, as far as I understand it, should affect nearly every stable-baselines3 user. It surely does not, so here I am wondering what I'm missing.

Whenever I try to use an on policy algorithm I get the following error:

Traceback (most recent call last):
  File "my_script.py", line 30, in <module>
    model.learn(total_timesteps=100000)
  File "/home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/a2c/a2c.py", line 192, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 238, in learn
    if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
TypeError: object of type 'int' has no len()

In my custom environment I return the usual empty dictionary {} as info, which I believe is central to this error.

After studying the code, I think the error comes from the base class's _update_info_buffer(), which on-policy algorithms call passing only infos (optionally you could also pass dones).

The method _update_info_buffer fills a deque with one entry per episode; in my case each entry is an integer. This deque is then checked in on_policy_algorithm.py, line 238, under: if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:, which throws an error, because ep_info_buffer[0] is an integer and has no length.

I have tried tinkering with the info dictionary from my env with no success, as _update_info_buffer only cares for info.get("episode"), which is generated automatically.
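To make the failure mode concrete, here is a simplified, runnable sketch of the mechanism as I understand it (this is not SB3's actual source; update_info_buffer below is a made-up stand-in for _update_info_buffer):

```python
from collections import deque

# Sketch of the buffer-filling logic (simplified, not the real SB3 code):
# it appends whatever each info dict holds under the "episode" key.
ep_info_buffer = deque(maxlen=100)

def update_info_buffer(infos):
    for info in infos:
        maybe_ep_info = info.get("episode")
        if maybe_ep_info is not None:
            ep_info_buffer.append(maybe_ep_info)

# Expected case: the Monitor wrapper stores a dict under "episode",
# so len(ep_info_buffer[0]) works fine.
update_info_buffer([{"episode": {"r": 1.0, "l": 200, "t": 3.2}}])
assert len(ep_info_buffer[0]) > 0

# Failure case: if "episode" maps to a plain int, the later
# len(ep_info_buffer[0]) check raises, matching the traceback above.
ep_info_buffer.clear()
update_info_buffer([{"episode": 1948}])
try:
    len(ep_info_buffer[0])
except TypeError as e:
    print(e)  # object of type 'int' has no len()
```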

Would you please help me out?

araffin commented 3 years ago

Hello, please fill in the issue template completely, notably result of env checker.

Maldades commented 3 years ago

Sure!

Stable-baselines CheckEnv produces no output.

My env, without unnecessary detail. If you need more detail, please feel free to ask for it.

class PopulationManager(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self, func, bounds, arg1, arg2=20):
        super(PopulationManager, self).__init__()
        self.arg1 = arg1
        self.arg2 = arg2
        self.bounds = bounds
        self.action_space = gym.spaces.Box(low=0.1, high=100, shape=(4,))
        self.observation_space = gym.spaces.Box(low=self.bounds[0, 0], high=self.bounds[0, 1], shape=self.define_observation().shape)

    def reset(self):
        observation = self.define_observation()  # returns a flat np.array of size [45,]
        return observation

    def step(self, action):
        agent_action = self.project_action(action)  # projects [-1,1] to my needed range
        self.act_on_env(agent_action)  # changes self.vars, does not return anything
        done = self.check_termination()  # checks self.vars, returns True/False
        if done:
            reward = self.define_reward()  # returns float
        else:
            reward = 0.
        observation = self.define_observation()  # returns a flat np.array of size [45,]

        return observation, reward, done, {}

System Info (characteristics of my environment):

Libraries were installed with pip in a venv.

GPU: Nvidia RTX 2060

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   32C    P8    14W / 190W |    355MiB /  5931MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

python --version: Python 3.7.3
PyTorch version (from pip list): torch 1.5.0
Gym version: 0.18.0
pip list gives the following:

Package                       Version      Location
----------------------------- ------------ ------------------------------------------------------------------------------------
absl-py                       0.11.0
alabaster                     0.7.12
apipkg                        1.5
appdirs                       1.4.4
astroid                       2.4.2
atari-py                      0.2.6
attrs                         20.3.0
Babel                         2.9.0
black                         20.8b1
cachetools                    4.2.1
certifi                       2020.12.5
chardet                       4.0.0
click                         7.1.2
cloudpickle                   1.6.0
coverage                      5.4
cycler                        0.10.0
decorator                     4.4.2
docutils                      0.16
execnet                       1.8.0
flake8                        3.8.4
future                        0.18.2
google-auth                   1.26.1
google-auth-oauthlib          0.4.2
grpcio                        1.35.0
gym                           0.18.0
gym-tersq                     0.0.1        /home/bondades/Documents/docker_containers/baselines-zoo/rl-baselines3-zoo/gym_tersq
idna                          2.10
imagesize                     1.2.0
importlab                     0.6.1
importlib-metadata            3.4.0
iniconfig                     1.1.1
isort                         5.7.0
Jinja2                        2.11.3
kiwisolver                    1.3.1
lazy-object-proxy             1.4.3
livereload                    2.6.3
llvmlite                      0.32.1
Markdown                      3.3.3
MarkupSafe                    1.1.1
matplotlib                    3.3.4
mccabe                        0.6.1
mypy-extensions               0.4.3
networkx                      2.5
ninja                         1.10.0.post2
numba                         0.49.0
numpy                         1.20.1
oauthlib                      3.1.0
opencv-python                 4.5.1.48
packaging                     20.9
pandas                        1.2.2
pathspec                      0.8.1
Pillow                        7.2.0
pip                           21.0.1
pkg-resources                 0.0.0
pluggy                        0.13.1
protobuf                      3.14.0
psutil                        5.8.0
py                            1.10.0
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pycodestyle                   2.6.0
pyDOE                         0.3.8
pyenchant                     3.2.0
pyflakes                      2.2.0
pyglet                        1.5.0
Pygments                      2.8.0
pylint                        2.6.2
pyparsing                     2.4.7
pytest                        6.2.2
pytest-cov                    2.11.1
pytest-env                    0.6.2
pytest-forked                 1.3.0
pytest-xdist                  2.2.1
python-dateutil               2.8.1
pytype                        2021.2.9
pytz                          2021.1
PyYAML                        5.4.1
regex                         2020.11.13
requests                      2.25.1
requests-oauthlib             1.3.0
rsa                           4.7
sb3-contrib                   0.10.0
scipy                         1.6.0
seaborn                       0.11.1
setuptools                    53.0.0
six                           1.15.0
snowballstemmer               2.1.0
Sphinx                        3.5.1
sphinx-autobuild              2020.9.1
sphinx-autodoc-typehints      1.11.1
sphinx-rtd-theme              0.5.1
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        1.0.3
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.4
sphinxcontrib-spelling        7.1.0
stable-baselines3             0.10.0
tensorboard                   2.4.1
tensorboard-plugin-wit        1.8.0
toml                          0.10.2
torch                         1.5.0
tornado                       6.1
typed-ast                     1.4.2
typing-extensions             3.7.4.3
urllib3                       1.26.3
Werkzeug                      1.0.1
wheel                         0.36.2
wrapt                         1.12.1
zipp                          3.4.0

I execute the code through an auxiliary script. After adding a debugging print statement in .../stable_baselines3/common/on_policy_algorithm.py, the output was the following:

python my_script.py
Using cpu device

DEBUG /home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py 
 | self.ep_info_buffer:  deque([1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039, 2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047], maxlen=100)
Traceback (most recent call last):
  File "my_script.py", line 30, in <module>
    model.learn(total_timesteps=100000)
  File "/home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/ppo/ppo.py", line 264, in learn
    reset_num_timesteps=reset_num_timesteps,
  File "/home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 238, in learn
    if len(self.ep_info_buffer) > 0 and len(self.ep_info_buffer[0]) > 0:
TypeError: object of type 'int' has no len()

auxiliary script:


import gym
import gym_tersq
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecCheckNan
import torch as th
import numpy as np
import os
from stable_baselines3.common.env_checker import check_env

# Create save dir
save_dir = "/tmp/gym/"
os.makedirs(save_dir, exist_ok=True)

th.autograd.set_detect_anomaly(True)
np.seterr(all='raise')

env = gym.make("gym_tersq:Tersq-v0")
check_env(env, warn=True)
env = DummyVecEnv([lambda: env])
env = VecCheckNan(env, raise_exception=True)

#env = gym.make("Tennis-v0")
model = PPO('MlpPolicy', env, verbose=2)
model.learn(total_timesteps=100000)

model.save(save_dir + "/PPO")

obs = env.reset()

Please feel free to ask for additional information.

Thank you very much

araffin commented 3 years ago

Hello, I think the issue comes from your environment. The example you provided cannot be run, so please fill in the issue template next time ;). Please find below a minimal working example that does not produce any error.

Stable-baselines CheckEnv produces no output.

Let me doubt that ;) (there should be at least one warning)


import gym
import numpy as np

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

class PopulationManager(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(PopulationManager, self).__init__()
        self.action_space = gym.spaces.Box(low=0.1, high=100, shape=(4,))
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(45,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):

        reward = 0.0
        done = bool(np.random.rand() > 0.8)

        return self.observation_space.sample(), reward, done, {}

env = PopulationManager()

check_env(env)

model = PPO("MlpPolicy", env, verbose=1).learn(10000)

Maldades commented 3 years ago

Hmmm... in fact yes, I pasted a version of my environment that had an old action_space; it should have given a warning. Nice catch ;-)

Weird, when I execute your minimal working example, ep_info_buffer remains empty:

DEBUG /home/bondades/venvs/gym/lib/python3.7/site-packages/stable_baselines3/common/on_policy_algorithm.py | self.ep_info_buffer: deque([], maxlen=100)

Since ep_info_buffer only interacts with infos and dones, and they are the same in both codes, I am baffled.

I'll look into it and post my findings for future trouble-makers. Thank you very much indeed.

araffin commented 3 years ago

Weird, when I execute your minimal working example, ep_info_buffer remains empty:

It will remain empty until the end of an episode (when the monitor wrapper passes the episode length and episode reward)
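For future readers, a plain-Python sketch of that mechanism (simplified and hypothetical; the real wrapper is stable_baselines3.common.monitor.Monitor, and ToyEnv/MonitorSketch below are made-up names for illustration):

```python
class ToyEnv:
    """Minimal stand-in environment: episodes last exactly 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= 3, {}


class MonitorSketch:
    """Simplified sketch of what a Monitor-style wrapper does."""
    def __init__(self, env):
        self.env = env
        self.episode_reward = 0.0
        self.episode_length = 0

    def reset(self):
        self.episode_reward = 0.0
        self.episode_length = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.episode_reward += reward
        self.episode_length += 1
        if done:
            # Only at episode end does "episode" appear in info --
            # this is the entry that ends up in ep_info_buffer.
            info["episode"] = {"r": self.episode_reward, "l": self.episode_length}
        return obs, reward, done, info


env = MonitorSketch(ToyEnv())
env.reset()
done, info = False, {}
while not done:
    _, _, done, info = env.step(None)
print(info["episode"])  # -> {'r': 3.0, 'l': 3}
```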

Maldades commented 3 years ago

Ok, it was my environment.

I have not been able to pinpoint the cause, but editing one .py file directly instead of importing my pip -e installed package did the trick. Probably changes in my script were not automatically propagated to the installed package, or something like that. It's hard not to mess something up when learning Linux, Docker, pip packaging and RL in the same project.
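One quick way to diagnose this kind of stale-package confusion is to check which file Python actually imports, via the module's __file__ attribute (shown here with a stdlib module as a stand-in for your own package):

```python
import importlib
import json  # stand-in for your own package, e.g. gym_tersq

# __file__ reveals which copy of the module Python loaded; if it points
# into site-packages rather than your working tree, your local edits are
# not the code being run.
print(json.__file__)

# If you edit a module's source after it was already imported, reload it
# to pick up the changes within the same interpreter session.
json = importlib.reload(json)
print(json.__file__)
```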

Thank you loads!