Closed Harimus closed 2 years ago
The returned trajectories are off by one in their indexing: each tuple that should be (s_t, a_t, r_t) is collected as (s_{t+1}, a_t, r_t), which leaves s_0 unrecoverable. This commit fixes it.
This does not affect any existing results, because `return_trajectories=True` is not used anywhere, and neither is the data it collects.
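For illustration, here is a minimal sketch of the kind of off-by-one described above and one way to avoid it. It is not the code from this repository: `CountingEnv`, `collect_buggy`, `collect_fixed`, and `policy` are hypothetical names, and the environment is a toy whose state is just the step counter, so the stored state index is easy to read off.

```python
class CountingEnv:
    """Hypothetical toy environment: the state is the step counter."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # s_0

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= self.horizon
        return self.t, reward, done  # s_{t+1}, r_t, done


def collect_buggy(env, policy):
    """Stores the state *after* the step, so s_0 is never recorded."""
    trajectory = []
    state, done = env.reset(), False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)  # state is overwritten here
        trajectory.append((state, action, reward))  # (s_{t+1}, a_t, r_t)
    return trajectory


def collect_fixed(env, policy):
    """Stores the state the action was chosen in, then advances."""
    trajectory = []
    state, done = env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))  # (s_t, a_t, r_t)
        state = next_state
    return trajectory


def policy(state):
    # Hypothetical deterministic policy so both runs pick the same actions.
    return state + 10


buggy = collect_buggy(CountingEnv(), policy)
fixed = collect_fixed(CountingEnv(), policy)
```

In this sketch the actions and rewards are identical in both trajectories; only the stored state differs, which is why the bug is easy to miss unless the states themselves are consumed downstream.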