The script above returns the state sequence [[21], [2]]. These states do not belong to the same trajectory. The 21 comes from the third (unfinished) trajectory that was fed into the buffer and the 2 comes from the first trajectory.
Expected Results
The state sequence [[11],[12]] should be returned every time.
The first four add() calls store two complete trajectories in the replay buffer: 1 --> 2 --> 3 and 11 --> 12 --> 13. Since the buffer has a capacity of 4, the fifth add() overwrites the first transition of the first trajectory, so the new transition (state 21) ends up stored directly before the leftover second transition of the first trajectory (state 2). Hence, the only valid and complete trajectory in the buffer is 11 --> 12 --> 13.
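The overwrite described above can be illustrated with a small standalone sketch. It is not the library's replay buffer and not the original reproduction script. The `RingBuffer` class, its `add(state, done)` signature, and the `valid_starts` helper are all hypothetical names made up for this illustration. It assumes a circular buffer that tags every stored transition with an episode id and a step index, and only accepts a window whose slots share one episode and have consecutive steps.

```python
class RingBuffer:
    """Hypothetical fixed-capacity circular buffer (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # each entry: (episode, step, state)
        self.write = 0                  # index of the next slot to overwrite
        self.episode = 0                # id of the trajectory being recorded
        self.step = 0                   # step index inside that trajectory

    def add(self, state, done):
        # Store the transition that starts in `state`; `done` marks the
        # last transition of a trajectory.
        self.slots[self.write] = (self.episode, self.step, state)
        self.write = (self.write + 1) % self.capacity
        if done:
            self.episode += 1
            self.step = 0
        else:
            self.step += 1

    def states(self, start, seq_len):
        # Read seq_len states from consecutive slots, with no validity check.
        return [[self.slots[(start + k) % self.capacity][2]]
                for k in range(seq_len)]

    def valid_starts(self, seq_len):
        # A start index is valid only if every slot in the window is filled,
        # belongs to the same episode, and has consecutive step indices.
        starts = []
        for i in range(self.capacity):
            window = [self.slots[(i + k) % self.capacity]
                      for k in range(seq_len)]
            if any(w is None for w in window):
                continue
            ep0, step0, _ = window[0]
            if all(ep == ep0 and step == step0 + k
                   for k, (ep, step, _) in enumerate(window)):
                starts.append(i)
        return starts


buf = RingBuffer(capacity=4)
buf.add(1, done=False)    # 1 --> 2
buf.add(2, done=True)     # 2 --> 3
buf.add(11, done=False)   # 11 --> 12
buf.add(12, done=True)    # 12 --> 13
buf.add(21, done=False)   # fifth add: overwrites slot 0

naive = buf.states(0, 2)        # mixes the third and first trajectory
valid = buf.valid_starts(2)     # start indices that stay in one trajectory
checked = buf.states(valid[0], 2)
```

Under these assumptions, reading from slot 0 without a check reproduces the mixed sequence from the report, while the checked start index yields only states from the second trajectory.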
Thanks for reporting! Created #96 to close this, added your case as a test case and a few additional checks. Let me know if you have any other suggestions to test for this.