Farama-Foundation / D4RL

A collection of reference environments for offline reinforcement learning
Apache License 2.0
1.29k stars 278 forks

[Question] Reproducing qpos and qvel #172

Open msc5 opened 2 years ago

msc5 commented 2 years ago

Question

Hello everyone,

I'm running some experiments with this dataset and have hit a strange issue when trying to reproduce its trajectories. If I roll out the dataset's sequence of actions in a "walker2d-expert-v2" environment, starting from the same initial state, I do not observe the dataset's qpos and qvel. The script below demonstrates this by printing, at each step, the difference between the simulated state and the recorded one:

import gym
import numpy as np
import d4rl  # importing d4rl registers the offline environments with gym

env = gym.make('walker2d-expert-v2')

dataset = env.get_dataset()
ref_actions = dataset['actions']
ref_qpos, ref_qvel = dataset['infos/qpos'], dataset['infos/qvel']

# Start the simulator from the first recorded state.
env.reset()
env.set_state(ref_qpos[0], ref_qvel[0])

for i in range(100):
    # Summed absolute error between the simulated and recorded state at step i.
    out_qpos, out_qvel = env.sim.data.qpos, env.sim.data.qvel
    diff = np.sum(np.abs(out_qpos - ref_qpos[i])) + np.sum(np.abs(out_qvel - ref_qvel[i]))
    print(f'step: {i:3} diff: {diff:5.5f}')

    # Replay the recorded action for this step.
    obs, reward, done, _ = env.step(ref_actions[i])

For me, this script prints the following:

step:   0 diff: 0.00000
step:   1 diff: 6.08603
step:   2 diff: 20.01421
step:   3 diff: 32.33157
step:   4 diff: 33.15942
step:   5 diff: 17.42948
step:   6 diff: 32.78226
step:   7 diff: 33.80180
step:   8 diff: 38.10448
step:   9 diff: 18.28416
step:  10 diff: 0.00059
step:  11 diff: 0.00059
step:  12 diff: 2.44090
step:  13 diff: 17.03609
step:  14 diff: 11.15495
step:  15 diff: 6.81992
step:  16 diff: 2.98914
step:  17 diff: 1.00799
step:  18 diff: 0.00121
step:  19 diff: 0.00121
step:  20 diff: 0.00121
step:  21 diff: 0.00121
step:  22 diff: 0.00121
step:  23 diff: 0.00121
step:  24 diff: 0.00121
step:  25 diff: 0.00121
step:  26 diff: 0.00120
step:  27 diff: 0.00121
step:  28 diff: 0.00121
step:  29 diff: 0.00121
step:  30 diff: 0.00124
step:  31 diff: 0.00122
step:  32 diff: 0.00121
step:  33 diff: 0.00120
step:  34 diff: 0.00120
step:  35 diff: 0.00120
step:  36 diff: 0.00120
step:  37 diff: 0.00120
step:  38 diff: 0.00120
step:  39 diff: 0.00120
step:  40 diff: 0.00120
step:  41 diff: 0.00120
step:  42 diff: 0.00120
step:  43 diff: 0.00120
step:  44 diff: 0.00120
step:  45 diff: 0.00120
step:  46 diff: 0.00120
step:  47 diff: 0.00120
step:  48 diff: 0.00120
step:  49 diff: 0.00120
step:  50 diff: 0.00121
step:  51 diff: 0.00121
step:  52 diff: 0.00121
step:  53 diff: 0.00121
step:  54 diff: 0.00122
step:  55 diff: 0.00121
step:  56 diff: 0.00121
step:  57 diff: 0.00121
step:  58 diff: 0.00121
step:  59 diff: 0.00121
step:  60 diff: 0.00121
step:  61 diff: 0.00121
step:  62 diff: 0.00121
step:  63 diff: 0.00121
step:  64 diff: 0.00124
step:  65 diff: 0.01471
step:  66 diff: 0.00124
step:  67 diff: 0.00123
step:  68 diff: 0.00123
step:  69 diff: 0.00124
step:  70 diff: 0.00124
step:  71 diff: 0.00124
step:  72 diff: 0.00124
step:  73 diff: 0.00125
step:  74 diff: 0.00125
step:  75 diff: 0.00125
step:  76 diff: 0.00125
step:  77 diff: 0.00126
step:  78 diff: 0.00126
step:  79 diff: 0.00127
step:  80 diff: 0.00128
step:  81 diff: 0.86400
step:  82 diff: 1.59836
step:  83 diff: 2.42107
step:  84 diff: 3.83615
step:  85 diff: 4.92854
step:  86 diff: 11.50469
step:  87 diff: 19.96653
step:  88 diff: 2.43489
step:  89 diff: 0.00166
step:  90 diff: 0.00168
step:  91 diff: 0.00167
step:  92 diff: 0.00166
step:  93 diff: 0.00164
step:  94 diff: 0.00163
step:  95 diff: 0.00162
step:  96 diff: 0.00161
step:  97 diff: 7.24985
step:  98 diff: 3.79157
step:  99 diff: 0.16677

I would expect every "diff" value to be zero if I am using the dataset correctly.
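One way to narrow this down might be a one-step replay check: re-synchronise the simulator with the recorded state before every step, apply the recorded action, and compare against the next recorded state. This is only a diagnostic sketch, not a known fix. The helper name `one_step_replay_errors` is hypothetical, and it assumes the environment exposes `set_state`, `step`, and `sim.data.qpos` / `sim.data.qvel` in the same way the script above relies on. It also assumes that step i and step i + 1 belong to the same episode; across an episode boundary in the dataset a large error is expected.

```python
import numpy as np


def one_step_replay_errors(env, ref_qpos, ref_qvel, ref_actions, n_steps=100):
    """Per-step L1 error when the simulator is re-synchronised before each step.

    Hypothetical helper: `env` is assumed to expose set_state(qpos, qvel),
    step(action) and sim.data.qpos / sim.data.qvel.
    """
    errors = []
    for i in range(n_steps):
        # Overwrite the simulator state with the recorded state at step i.
        env.set_state(ref_qpos[i], ref_qvel[i])
        env.step(ref_actions[i])
        # Copy the arrays: sim.data fields may be views that change on the next step.
        qpos = np.array(env.sim.data.qpos, dtype=np.float64)
        qvel = np.array(env.sim.data.qvel, dtype=np.float64)
        # Compare against the recorded state at step i + 1.
        err = np.abs(qpos - ref_qpos[i + 1]).sum() + np.abs(qvel - ref_qvel[i + 1]).sum()
        errors.append(float(err))
    return np.array(errors)
```

Calling `one_step_replay_errors(env, ref_qpos, ref_qvel, ref_actions)` with the variables from the script above returns one error per step. If these errors stay small while the open-loop "diff" values do not, the mismatch is probably accumulating over the rollout; if individual steps already show large errors, the recorded transition itself is not being reproduced at those steps.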

Any advice is welcome!

im-Kitsch commented 1 month ago

Hi, @msc5

I have the same problem. Have you solved it?

msc5 commented 1 month ago

Sorry, I was not able to solve it.