Lifelong-Robot-Learning / LIBERO

Benchmarking Knowledge Transfer in Lifelong Robot Learning
MIT License

Issues with Playback #16

Open dibyaghosh opened 4 months ago

dibyaghosh commented 4 months ago

I'm having trouble reproducing trajectories in the dataset by playback (I'd like to generate and log some extra details). They mostly look correct, but in some cases the replayed trajectory "fails" while the original logged trajectory succeeds.

I've been following scripts/create_dataset.py to replicate exactly how the files should be replayed, but the replayed states are almost always at least 0.01 away from the logged ones (in L2 norm), and for a nontrivial number of states the error grows to >= 1.

I'm wondering:

1) whether there could be a dependency on the specific MuJoCo version that was used to collect the data,
2) whether replaying needs an off-by-one shift or something similar, or
3) whether some burn-in null actions need to be taken at the beginning of an episode (to warm up the controller).

    import numpy as np

    # f is the opened HDF5 demo file; ep is an episode key such as "demo_4"
    states = f["data/{}/states".format(ep)][()]
    actions = np.array(f["data/{}/actions".format(ep)][()])
    init_idx = 0

    # restore the logged initial simulator state
    env.reset_from_xml_string(model_xml)
    env.sim.reset()
    env.sim.set_state_from_flattened(states[init_idx])
    env.sim.forward()
    all_obs = []
    all_actions = []
    n_errors = 0
    for j, action in enumerate(actions):
        obs, reward, done, info = env.step(action)
        if j < len(actions) - 1:
            # ensure that the actions deterministically lead to the same recorded states
            state_playback = env.sim.get_state().flatten()
            # assert(np.all(np.equal(states[j + 1], state_playback)))
            err = np.linalg.norm(states[j + 1] - state_playback)
            # 0.01 is the tolerance mentioned above
            if err > 0.01:
                n_errors += 1
                print(f"[warning] playback diverged by {err:.2f} for ep {ep} at step {j}")

Output:

    [warning] playback diverged by 0.24 for ep demo_4 at step 0
    [warning] playback diverged by 1.29 for ep demo_4 at step 45
    [warning] playback diverged by 0.22 for ep demo_4 at step 46
    [warning] playback diverged by 1.02 for ep demo_4 at step 47
    [warning] playback diverged by 2.04 for ep demo_4 at step 48

Cranial-XIX commented 3 months ago

The physics can differ slightly between machines, so replaying the action sequence is not guaranteed to reproduce the logged states exactly. If you want to replay the data, you can reset directly to each logged sim state instead of replaying the action sequence.
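
A minimal sketch of that state-reset approach, assuming `env` exposes the same `sim.set_state_from_flattened` and `sim.forward` calls used in the snippet earlier in this thread. The helper name `replay_by_state_reset` and the `record` callback are hypothetical and not part of LIBERO:

```python
def replay_by_state_reset(env, states, record):
    """Visit every logged simulator state directly instead of stepping actions.

    env    -- environment exposing env.sim (assumed robosuite-style, as in the thread)
    states -- sequence of flattened simulator states from the dataset
    record -- hypothetical callback invoked as record(step_index, env) after each reset
    """
    results = []
    for t, state in enumerate(states):
        # Overwrite the simulator state with the logged one ...
        env.sim.set_state_from_flattened(state)
        # ... and propagate it so derived quantities are consistent with that state.
        env.sim.forward()
        results.append(record(t, env))
    return results
```

For example, passing `record=lambda t, env: env.sim.get_state().flatten()` would collect the states as restored by the simulator. Since no action is stepped here, rewards and `done` flags are not regenerated this way.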

There is an initial burn-in during evaluation: null actions are executed at the start of an episode. This is only to stabilize the physics (e.g., objects are sometimes not perfectly aligned, and the null actions let everything settle).
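
The burn-in can be sketched roughly as follows. The helper name `burn_in`, the default of 5 steps, and the form of the null action are assumptions for illustration; the thread does not say how many null actions LIBERO takes or what the action vector looks like:

```python
def burn_in(env, null_action, n_steps=5):
    """Step the environment with a null action to let the physics settle.

    null_action -- an action meaning "do nothing" (assumed, e.g. an all-zero vector)
    n_steps     -- number of settling steps (assumed value, not taken from the thread)
    Returns the observation after the last step, or None if n_steps is 0.
    """
    obs = None
    for _ in range(n_steps):
        # env.step is assumed to return (obs, reward, done, info),
        # as in the playback snippet earlier in this thread.
        obs, _reward, _done, _info = env.step(null_action)
    return obs
```

A call might look like `obs = burn_in(env, np.zeros(7))` right after resetting the environment, where the 7-dimensional action is again an assumption.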