Hotfix: evaluate_agent() with return_trajectories=True currently returns wrong trajectories

Kaixhin / imitation-learning

Imitation learning algorithms

MIT License


Hotfix: evaluate_agent() with return_trajectories=True currently returns wrong trajectories #9

Closed by Harimus 2 years ago

Harimus commented 2 years ago

The returned trajectories are off by one in their indexing. The tuple that is supposed to be (s_t, a_t, r_t) is instead collected as (s_{t+1}, a_t, r_t), which leaves s_0 unrecoverable. This commit fixes it.
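To illustrate the kind of off-by-one described above, here is a minimal sketch. It is not the repository's actual evaluate_agent() code: the toy environment `CountingEnv`, the `policy` function, and both collector functions are hypothetical names made up for this example. The state is just a step counter, so the shifted index is easy to spot.

```python
class CountingEnv:
    """Hypothetical toy environment: the state is the current step index."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # s_0

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= self.horizon
        return self.t, reward, done  # (s_{t+1}, r_t, done)


def policy(state):
    # Hypothetical deterministic policy, so a_t identifies the state it came from.
    return state * 10


def collect_buggy(env):
    # Overwrites `state` with s_{t+1} before storing it, so s_0 is never recorded.
    state, done, trajectory = env.reset(), False, []
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        trajectory.append((state, action, reward))  # (s_{t+1}, a_t, r_t)
    return trajectory


def collect_fixed(env):
    # Stores the state the action was chosen in, then advances to the next state.
    state, done, trajectory = env.reset(), False, []
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))  # (s_t, a_t, r_t)
        state = next_state
    return trajectory


buggy = collect_buggy(CountingEnv())
fixed = collect_fixed(CountingEnv())
```

In this sketch the buggy collector's first tuple holds state 1 paired with the action chosen in state 0, while the fixed collector's first tuple holds state 0 itself.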

Harimus commented 2 years ago

This does not affect any existing results, since return_trajectories=True is not used anywhere (nor is the data it collects).