DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Question] influence of buffer size when using vecenv and save customized replay buffer #1885

Closed: JaimeParker closed this issue 3 months ago

JaimeParker commented 3 months ago

❓ Question

My first question is: should I resize the default buffer_size when using an off-policy algorithm with a vectorized env?

I noticed that the default buffer_size of SAC is 1e6 for a single env. However, with a vecenv the capacity allocated to each env is buffer_size / num_env, which means a much smaller replay buffer per env when I'm using make_vec_env.

For SAC, the replay buffer contains transitions $(s_t, a_t, s_{t+1}, r_t)$ collected by different policies; with the help of the maximum-entropy objective it can avoid getting stuck in a local minimum, i.e. it keeps exploring.

So will a smaller buffer size lead to worse behavior? And is there a limit on the maximum buffer_size per env?

My second question is related to issue #278, which explains how to add a customized replay_buffer to an algorithm. Now I want to create a customized replay buffer and then save it. Here is my code:

from stable_baselines3.common import off_policy_algorithm, buffers
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.envs import CustomizedEnv
from stable_baselines3.sac.policies import MlpPolicy

env = CustomizedEnv()
vec_env = make_vec_env(CustomizedEnv, n_envs=20)

m_off_policy_algorithm = off_policy_algorithm.OffPolicyAlgorithm(
    policy=MlpPolicy,
    env=env,
    learning_rate=0.0003
)

# to compute an expert trajectory
# define a customized replay buffer here; it should be a num_env x (buffer_size / num_env) matrix
m_replay_buffer = buffers.ReplayBuffer(1000, ...)  # remaining constructor args omitted

m_off_policy_algorithm.replay_buffer = m_replay_buffer
m_off_policy_algorithm.save_replay_buffer('replay_buffer.pkl')

Is this approach recommended, or is there another way to do it (create a customized replay buffer and then save it)?


araffin commented 3 months ago

should I resize the default buffer_size when using an off-policy algorithm and a vectorized env? [...] the buffer_size for each env is buffer_size / num_env

Yes, the per-env capacity is buffer_size // n_envs:
https://github.com/DLR-RM/stable-baselines3/blob/5623d98f9d6bcfd2ab450e850c3f7b090aef5642/stable_baselines3/common/buffers.py#L196-L197

because the buffer is shaped (buffer_size, n_envs, ...):
https://github.com/DLR-RM/stable-baselines3/blob/5623d98f9d6bcfd2ab450e850c3f7b090aef5642/stable_baselines3/common/buffers.py#L212

So in the end the number of transitions stored is the same, and you should not need to do anything.
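As a minimal illustration of those two lines (Pendulum-v1 and the sizes here are only assumptions for the sketch), the per-env capacity shrinks with n_envs while the total number of transitions stays the same:

from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

# Illustrative setup: 20 envs sharing the default 1M-transition capacity.
n_envs = 20
vec_env = make_vec_env("Pendulum-v1", n_envs=n_envs)
model = SAC("MlpPolicy", vec_env, buffer_size=1_000_000)

rb = model.replay_buffer
print(rb.buffer_size)             # 50_000, i.e. buffer_size // n_envs
print(rb.observations.shape[:2])  # (50_000, 20): one slot per env, 1M transitions in total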

About the size of the replay buffer in general: there is no universal answer. The buffer should usually be "large" enough that for many tasks it will never be full (for instance, it holds 1M transitions on MuJoCo tasks, but the maximum number of training timesteps is also 1M). Depending on your task, you might want to reduce it so that training is more on-policy (keeping mostly recent data).
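For example (the env and numbers below are purely illustrative), keeping only the most recent 100k transitions over a 1M-step run biases the buffer toward recent, closer-to-on-policy data:

from stable_baselines3 import SAC

# Hypothetical sizing: train for 1M steps but keep only the last 100k transitions.
model = SAC("MlpPolicy", "Pendulum-v1", buffer_size=100_000)
model.learn(total_timesteps=1_000_000)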

Now I want to create a customized replay buffer and then save it. Here is my code:

Not sure how custom you want it to be. As long as you derive from the base class and follow its interface, there should not be any issue. If you just want to pre-fill a replay buffer, you can simply instantiate an SB3 one (or use the empty one created when the model is initialized), manually add transitions, and then save it; see the sketch below.
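A minimal sketch of that second option (the Pendulum-v1 env and the random policy standing in for an expert are assumptions of this sketch, not part of the question):

import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

# Toy setup: replace Pendulum-v1 with your own env.
n_envs = 4
vec_env = make_vec_env("Pendulum-v1", n_envs=n_envs)

model = SAC("MlpPolicy", vec_env, buffer_size=100_000)
replay_buffer = model.replay_buffer  # the empty buffer created by the model

obs = vec_env.reset()
for _ in range(1_000):
    # A random policy stands in for whatever generates the expert transitions.
    actions = np.stack([vec_env.action_space.sample() for _ in range(n_envs)])
    next_obs, rewards, dones, infos = vec_env.step(actions)
    # NB: SB3's own collection also swaps in infos[i]["terminal_observation"] at episode
    # ends and stores actions rescaled to [-1, 1] for SAC; omitted here for brevity.
    replay_buffer.add(obs, next_obs, actions, rewards, dones, infos)
    obs = next_obs

model.save_replay_buffer("replay_buffer.pkl")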

JaimeParker commented 3 months ago

Many thanks.

The buffer should usually be "large" enough that for many tasks it will never be full.

My task involves a robot going from one random state to another random state, so it may need as much recent data as possible. I tested the buffer size here:

        # added inside OffPolicyAlgorithm._store_transition(), just before the transition is stored
        print(replay_buffer.size())
        replay_buffer.add(
            self._last_original_obs,  # type: ignore[arg-type]
            next_obs,  # type: ignore[arg-type]
            buffer_action,
            reward_,
            dones,
            infos,
        )

The output shows that the buffer fills up quickly. So for such a randomized task, should I increase the buffer size significantly?

Or is RL perhaps not well suited to such a task?