facebookresearch / mbrl-lib

Library for Model Based RL
MIT License
959 stars 158 forks source link

[Bug] ReplayBuffer.sample_trajectory() returns transitions from two different trajectories #95

Closed jan1854 closed 3 years ago

jan1854 commented 3 years ago

Steps to reproduce

import random

import numpy as np
from mbrl.util import ReplayBuffer

random.seed(0)
replay = ReplayBuffer(4, (1,), (1,), max_trajectory_length=2)

# Add the trajectory 1 --> 2 --> 3
replay.add(np.array([1]), np.array([1]), np.array([2]), 1, False)
replay.add(np.array([2]), np.array([1]), np.array([3]), 1, True)
# Add the trajectory 11 --> 12 --> 13
replay.add(np.array([11]), np.array([1]), np.array([12]), 1, False)
replay.add(np.array([12]), np.array([1]), np.array([13]), 1, True)
# Add the incomplete trajectory 21 --> 22
replay.add(np.array([21]), np.array([1]), np.array([22]), 1, False)

print(replay.sample_trajectory().obs)

Observed Results

The script above returns the state sequence [[21], [2]]. These states do not belong to the same trajectory. The 21 comes from the third (unfinished) trajectory that was fed into the buffer and the 2 comes from the first trajectory.

Expected Results

The state sequence [[11],[12]] should be returned every time.

The first four add() commands add two complete trajectories to the replay buffer: 1 --> 2 --> 3 and 11 --> 12 --> 13. Since the buffer has a capacity of 4, the fifth add() overwrites the first transition of the first trajectory. Hence, the only valid and complete trajectory in the buffer is 11 --> 12 --> 13.

luisenp commented 3 years ago

Thanks for reporting! Created #96 to close this, added your case as a test case and a few additional checks. Let me know if you have any other suggestions to test for this.

jan1854 commented 3 years ago

Great, thanks!