DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

HER replay buffer index out of bounds #261

Closed: ErikPasztor closed this issue 3 years ago

ErikPasztor commented 3 years ago

Question

I'm trying to train an agent on a custom environment using TD3+HER (relevant code below). model.learn() fails with the following error:

    model.learn(learn_total_steps,tb_log_name="tb/{}/".format(config["session_ID"]))
  File "D:\MyStuff\Python\lib\site-packages\stable_baselines3\her\her.py", line 197, in learn
    rollout = self.collect_rollouts(
  File "D:\MyStuff\Python\lib\site-packages\stable_baselines3\her\her.py", line 318, in collect_rollouts
    self.replay_buffer.add(self._last_original_obs, next_obs, buffer_action, reward_, done, infos)
  File "D:\MyStuff\Python\lib\site-packages\stable_baselines3\her\her_replay_buffer.py", line 317, in add
    self.buffer["observation"][self.pos][self.current_idx] = obs["observation"]
IndexError: index 1000 is out of bounds for axis 0 with size 1000

I tried to follow the example code for HER, but I feel like I'm missing something. I tried using buffer_size=env_max_steps+1, but that didn't help. Can I fix this? Or is this a bug?
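For readers who hit the same traceback, one plausible reading (an assumption, not something confirmed in this thread) is that the per-episode storage has exactly max_episode_length slots, so an episode that runs even one step longer writes past the end. The toy below is not SB3 code: `EpisodeStore` and `run_episode` are hypothetical names, and a plain Python list stands in for the NumPy buffer.

```python
class EpisodeStore:
    """Toy per-episode storage with a fixed number of slots (hypothetical, not SB3)."""

    def __init__(self, max_episode_length):
        self.slots = [None] * max_episode_length
        self.current_idx = 0

    def add(self, obs, done):
        # Raises IndexError once current_idx reaches max_episode_length.
        self.slots[self.current_idx] = obs
        self.current_idx += 1
        if done:
            # The episode is over, so the next transition starts at slot 0.
            self.current_idx = 0


def run_episode(store, episode_steps):
    """Feed one episode of `episode_steps` transitions into the store."""
    for t in range(episode_steps):
        store.add(obs=t, done=(t == episode_steps - 1))


store = EpisodeStore(max_episode_length=1000)
run_episode(store, 1000)  # fits exactly: indices 0..999

try:
    run_episode(store, 1001)  # the environment signals done one step too late
    overflowed = False
except IndexError:
    overflowed = True
```

If this reading is right, enlarging buffer_size would not help, because the overflow is along the episode-length axis rather than the number of stored episodes.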

Code

from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3.common.monitor import Monitor
from stable_baselines3 import HER, TD3
from humanoid2her import HumanoidBulletEnv

env_max_steps = 1000
learn_epochs = 1000
learn_total_steps = env_max_steps * learn_epochs

# Log episode statistics with Monitor, vectorize the env, then normalize observations and rewards.
env = DummyVecEnv([lambda: Monitor(HumanoidBulletEnv(animate=False, max_steps=env_max_steps))])
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=20.0, clip_reward=20.0)

# Available strategies (cf. paper): future, final, episode, random
goal_selection_strategy = 'future'  # equivalent to GoalSelectionStrategy.FUTURE
model_class = TD3
model = HER(
    'MlpPolicy',
    env,
    model_class,
    n_sampled_goal=4,
    goal_selection_strategy=goal_selection_strategy,
    max_episode_length=env_max_steps,
    online_sampling=True,
)

model.learn(learn_total_steps, tb_log_name="tb/")


araffin commented 3 years ago

> I tried to follow the example code for HER, but I feel like I'm missing something. I tried using buffer_size=env_max_steps+1, but that didn't help. Can I fix this? Or is this a bug?

Hello, it seems that you are using a custom environment, so I would recommend filling in the custom env template ;) I would also recommend using the TimeLimit wrapper from gym to avoid any issues with the timeout. Finally, make sure to use the master version of SB3 (cf. doc); we recently fixed a bug in https://github.com/DLR-RM/stable-baselines3/issues/234
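To illustrate what a time-limit wrapper contributes, here is a minimal sketch under stated assumptions. `ToyEnv`, `StepLimit`, and `episode_length` are hypothetical stand-ins written for this example, not gym's or SB3's classes, and the 4-tuple `step` return follows the gym API of that period. In real code, `gym.wrappers.TimeLimit(env, max_episode_steps=...)` would take the place of `StepLimit`.

```python
class ToyEnv:
    """Hypothetical environment that never ends an episode on its own."""

    def reset(self):
        return 0

    def step(self, action):
        obs, reward, done, info = 0, 0.0, False, {}
        return obs, reward, done, info


class StepLimit:
    """Toy stand-in for a time-limit wrapper: forces done after max_episode_steps."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            done = True  # cut the episode at exactly max_episode_steps transitions
        return obs, reward, done, info


def episode_length(env, hard_cap=10_000):
    """Count steps until done; hard_cap only guards against endless episodes."""
    env.reset()
    for steps in range(1, hard_cap + 1):
        _, _, done, _ = env.step(action=None)
        if done:
            return steps
    return hard_cap


max_episode_length = 1000
length = episode_length(StepLimit(ToyEnv(), max_episode_steps=max_episode_length))
```

With the limit enforced by the wrapper, no episode is longer than the value passed to HER as max_episode_length, provided both are set to the same number.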

ErikPasztor commented 3 years ago

Hi, using the TimeLimit wrapper did help. I ended up specifying the maximum number of steps explicitly for each wrapper, and I got it to work. Thank you for your help.