This PR adds a new sampler that samples every episode in the Trajectory, including a truncated one (in a limited-size buffer, the first episode is typically the only one that is truncated).
The decision to sample all episodes is due to two reasons:
First, algorithms that sample whole episodes are typically on-policy (e.g. PPO, TRPO), which means they use all of the buffered transitions and then discard them. As such, there is no reason to sample a subset of the buffer.
Second, there is no simple way to sample episodes without replacement from the buffer. As long as no algorithm expressly requires sampling a subset of whole episodes, I don't see a compelling reason to add this functionality.
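
To illustrate what "sample all episodes" means, here is a minimal Python sketch. It is not the code in this PR: the function name `episode_slices` and the `done` flag sequence are hypothetical, and it assumes the buffer stores transitions in chronological order with one terminal flag per transition. Every transition ends up in exactly one episode, including a first episode whose start was overwritten and a last episode that has not terminated yet.

```python
from typing import List, Sequence, Tuple


def episode_slices(done: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) index ranges, one per episode.

    `done[i]` is True when transition i ends an episode (hypothetical layout).
    The ranges cover every buffered transition, so partial episodes at either
    end of the buffer are returned as well.
    """
    slices: List[Tuple[int, int]] = []
    start = 0
    for i, is_terminal in enumerate(done):
        if is_terminal:
            # Close the current episode right after its terminal transition.
            slices.append((start, i + 1))
            start = i + 1
    if start < len(done):
        # Trailing transitions without a terminal flag form a partial episode.
        slices.append((start, len(done)))
    return slices


# The first range may be a truncated episode whose beginning was overwritten.
flags = [False, True, False, False, True, False]
episodes = episode_slices(flags)
```

Because the ranges partition the whole buffer, there is no subset to choose and no replacement question to answer, which matches the two reasons given above.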