Open odelalleau opened 2 years ago
It is working as expected in my understanding.
The maze2d domain is designed such that the ball is "supposed to" stick around the end region. That means the episode does not terminate when we reach the goal; rather, it waits for the max time step signal. Since an episode ending because max_time_step was reached is not a "terminal" state, this should be the desired behavior.
Here is a counterexample where the episode does end on reaching the goal.
If it's working as expected, then it's probably the documentation that needs to be fixed?
The doc explicitly says it sets "done=True" on the final timestep of a trajectory, and that is not the case. Looking at the implementation, in the codepath where `final_timestep` is True, the `done` flag is not modified when `terminate_on_end` is True, which contradicts the doc.
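To make the mismatch concrete, here is a minimal sketch of the two behaviors. It is not d4rl's code: `done_flags_as_described` and `done_flags_as_documented` are hypothetical helpers, the `terminals` and `timeouts` lists are toy data, and I'm assuming that `final_timestep` comes from a per-step timeout flag and that the final step is dropped when `terminate_on_end` is False.

```python
def done_flags_as_described(terminals, timeouts, terminate_on_end=False):
    """Hypothetical stand-in for the behavior described in this thread."""
    kept, done = [], []
    for i in range(len(terminals)):
        done_bool = bool(terminals[i])
        final_timestep = bool(timeouts[i])  # assumption: timeout marks the last step
        if (not terminate_on_end) and final_timestep:
            continue  # assumption: the last step is skipped in this case
        kept.append(i)
        done.append(done_bool)  # not modified even when final_timestep is True
    return kept, done


def done_flags_as_documented(terminals, timeouts, terminate_on_end=False):
    """Hypothetical variant matching what the doc says (done=True on the last step)."""
    kept, done = [], []
    for i in range(len(terminals)):
        done_bool = bool(terminals[i])
        final_timestep = bool(timeouts[i])
        if (not terminate_on_end) and final_timestep:
            continue
        kept.append(i)
        done.append(done_bool or final_timestep)  # last step is flagged as done
    return kept, done


# Toy trajectory: four steps, ended by a timeout rather than a true terminal.
terminals = [False, False, False, False]
timeouts = [False, False, False, True]

described = done_flags_as_described(terminals, timeouts, terminate_on_end=True)
documented = done_flags_as_documented(terminals, timeouts, terminate_on_end=True)
print(described)
print(documented)
```

With `terminate_on_end=True`, the first helper keeps the last step but leaves its flag False, while the second one flags it. Which of the two is "correct" is exactly what is being discussed here; the sketch only shows where they differ.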
Describe the bug
The docstring for `qlearning_dataset()` says:
However, if you look at the code, it does not actually set `done=True`.
Code example
System Info
Installed d4rl from pip on Linux.
Checklist
(there is a potentially related issue but it seems different to me: #145)