Closed JaimeParker closed 3 months ago
Should I resize the default `buffer_size` when using an off-policy algorithm and a vectorized env? The `buffer_size` for each env is `buffer_size / num_env`.
Yes: https://github.com/DLR-RM/stable-baselines3/blob/5623d98f9d6bcfd2ab450e850c3f7b090aef5642/stable_baselines3/common/buffers.py#L196-L197 because the shape of the buffer is https://github.com/DLR-RM/stable-baselines3/blob/5623d98f9d6bcfd2ab450e850c3f7b090aef5642/stable_baselines3/common/buffers.py#L212 — so in the end, the total number of transitions stored is the same; you should not need to do anything.
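To make the arithmetic concrete, here is a minimal sketch (not SB3 code itself) of how the per-env capacity and the total capacity relate; the function name is my own:

```python
# Sketch: SB3 divides buffer_size by the number of envs, but each
# buffer "slot" then holds one transition per env, so the total
# number of transitions stored stays (roughly) the same.
def effective_capacity(buffer_size: int, n_envs: int) -> int:
    per_env = max(buffer_size // n_envs, 1)  # slots per env
    return per_env * n_envs                  # total transitions

print(effective_capacity(1_000_000, 1))  # 1000000
print(effective_capacity(1_000_000, 4))  # 1000000
```

(Integer division means the total can shrink slightly when `buffer_size` is not divisible by `n_envs`, but for typical values this is negligible.)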
About the size of the replay buffer in general: there is no universally recommended value. The buffer should usually be "large" enough, which means that for many tasks it will never be full (for instance, it has a size of 1M transitions on MuJoCo tasks, but the maximum number of timesteps is 1M too). Depending on your task, you might want to shrink it to be more on-policy (keep mostly recent data).
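The "more on-policy" effect of a smaller buffer comes from the circular-write behavior: once full, the oldest transitions are overwritten first. A toy sketch (my own simplified class, not SB3's implementation):

```python
# Sketch: a circular (ring) replay buffer. When full, each new
# transition overwrites the oldest one, so a smaller capacity
# means the buffer holds only the most recent experience.
class RingBuffer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = [None] * capacity
        self.pos = 0
        self.full = False

    def add(self, transition):
        self.data[self.pos] = transition
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def size(self) -> int:
        return self.capacity if self.full else self.pos

buf = RingBuffer(capacity=3)
for t in range(5):          # add transitions 0..4
    buf.add(t)
print(buf.size())           # 3
print(sorted(buf.data))     # [2, 3, 4] -- only the newest survive
```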
But now I want to create a customized replay_buffer and then save it. Here is my code:
Not sure how custom you want it to be. As long as you derive from the base class and follow its interface, there should not be any issue. If you just want to pre-fill a replay buffer, you can simply instantiate an SB3 one (or use the empty one after initializing the model) and manually add transitions.
Many thanks.
> The buffer size should usually be "large" enough, which means that for many tasks, it will never be full.
My task involves a robot moving from one random state to another random state, so it might need as much recent data as possible. I tested the buffer size here:
```python
print(replay_buffer.size())
replay_buffer.add(
    self._last_original_obs,  # type: ignore[arg-type]
    next_obs,  # type: ignore[arg-type]
    buffer_action,
    reward_,
    dones,
    infos,
)
```
The output shows it fills up quickly. So for such a randomized task, should I increase the buffer size significantly? Or is RL perhaps not well suited to this kind of task?
❓ Question
My first question: should I resize the default `buffer_size` when using an off-policy algorithm with a vectorized env? I noticed that the default `buffer_size` of SAC is 1e6 for env_num=1. However, for a vecenv, the buffer_size for each env is `buffer_size / num_env`, which means a much smaller replay buffer per env when I'm using `make_vec_env`. For SAC, the replay buffer contains transitions $(s_t, a_t, s_{t+1}, r_t)$ collected under different policies; with the help of max entropy it can avoid getting stuck in a local minimum, i.e., it keeps exploring for advantages.

So will a smaller buffer size per env cause bad behavior? And is there a limit on the max `buffer_size` for each env?

My second question is close to issue 278, which explains how to add a customized `replay_buffer`. But now I want to create a customized replay_buffer and then save it. Here is my code: is this way recommended, or is there another way to do this (create a customized replay buffer and then save it)?