Apologies, I have made a mistake in interpreting the RolloutBuffer code.
The samples are flattened in the get() function, which allows the transitions to be sampled individually.
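To illustrate the correction above, here is a minimal NumPy sketch (not SB3's actual code) of how data stored with shape (n_steps, n_envs, ...) can be flattened so that each minibatch holds batch_size individual transitions. The sizes and the helper name flatten_steps_and_envs are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes chosen for illustration; they are not SB3 defaults.
n_steps, n_envs, obs_dim = 4, 3, 2
batch_size = 5

# Storage layout: one row per step, each row holding n_envs transitions.
observations = np.arange(n_steps * n_envs * obs_dim, dtype=np.float32).reshape(
    n_steps, n_envs, obs_dim
)


def flatten_steps_and_envs(arr: np.ndarray) -> np.ndarray:
    """Merge the (n_steps, n_envs) axes into a single transition axis."""
    shape = arr.shape
    # Swap axes so transitions from the same env stay contiguous, then flatten.
    return arr.swapaxes(0, 1).reshape(shape[0] * shape[1], *shape[2:])


flat_obs = flatten_steps_and_envs(observations)  # (n_steps * n_envs, obs_dim)

# Minibatches index individual transitions, not n_envs-sized chunks.
rng = np.random.default_rng(0)
indices = rng.permutation(n_steps * n_envs)
minibatches = [
    flat_obs[indices[start : start + batch_size]]
    for start in range(0, len(indices), batch_size)
]
```

In this sketch there are 12 stored transitions, so batch_size = 5 gives minibatches of 5, 5, and 2 transitions. No minibatch contains n_envs * batch_size transitions.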
❓ Question
I would like to clarify something about SB3's source code that differs from my own understanding. (Hence, despite the formatting, this is not a bug report.)
Observation
In the source code, RolloutBuffer (and ReplayBuffer for that matter) appear to store transitions in n_envs-sized chunks. The samples are then retrieved with these chunks untouched. This results in an effective minibatch size of n_envs * batch_size transitions.
Expectation
Unlike the documentation for the n_steps argument, the documentation for batch_size did not state this behavior. Therefore, the minibatch size was expected to remain batch_size.
Question