Farama-Foundation / D4RL

A collection of reference environments for offline reinforcement learning
Apache License 2.0

[Bug Report] `terminate_on_end` does not behave as documented #182

Open odelalleau opened 2 years ago

odelalleau commented 2 years ago

Describe the bug

The docstring for `qlearning_dataset()` says:

        terminate_on_end (bool): Set done=True on the last timestep
            in a trajectory.

However, if you look at the code, it does not actually set done=True.
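For context, here is a simplified sketch of the relevant loop, paraphrased from my reading of the source (not a verbatim copy). When terminate_on_end=False the final timestep is dropped entirely; when terminate_on_end=True it is kept, but done is still taken straight from dataset['terminals'] and never forced to True:

episode_step = 0
for i in range(N - 1):
    done_bool = bool(dataset['terminals'][i])
    if use_timeouts:
        final_timestep = dataset['timeouts'][i]
    else:
        final_timestep = (episode_step == env._max_episode_steps - 1)

    if (not terminate_on_end) and final_timestep:
        # terminate_on_end=False: skip the last transition of the episode.
        episode_step = 0
        continue
    if done_bool or final_timestep:
        episode_step = 0

    # terminate_on_end=True merely keeps the transition; the stored done flag
    # is still the raw terminal and is never set to True here.
    done_.append(done_bool)
    episode_step += 1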

Code example

import gym
import d4rl

env = gym.make("maze2d-open-dense-v0")
dataset = d4rl.qlearning_dataset(env, terminate_on_end=True)
assert dataset['terminals'].sum() == 0  # passes, but per the docstring we would expect > 0

System Info

Installed d4rl from pip on Linux.


(there is a potentially related issue but it seems different to me: #145)

idigitopia commented 2 years ago

It is working as expected, to my understanding.

The maze2d domain is designed so that the ball is "supposed to" stick around the goal region. That means the episode does not terminate when the goal is reached; instead, it runs until the max-time-step signal. Since ending an episode by hitting max_time_step is not a "terminal" state, this should be the desired behavior.
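One quick way to see this in the data itself (a sketch, assuming the raw dataset exposes a terminals array and, in newer dataset versions, a timeouts array):

import gym
import d4rl

env = gym.make("maze2d-open-dense-v0")
raw = env.get_dataset()  # raw dataset, before qlearning_dataset() processing

# maze2d episodes end by running out of time, not by entering a terminal
# state, so we expect no terminals in the raw data either.
print("terminals:", raw["terminals"].sum())
if "timeouts" in raw:  # older dataset versions may not include this key
    print("timeouts:", raw["timeouts"].sum())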

Here is a counterexample where the episode does end on reaching the goal.

[image: screenshot of the counterexample, not reproduced here]
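For a hypothetical example along the same lines: in a locomotion task such as hopper, falling over genuinely terminates the episode, so terminals do appear in the processed dataset:

import gym
import d4rl

env = gym.make("hopper-medium-v0")
dataset = d4rl.qlearning_dataset(env)

# Hopper episodes can end in real terminal states (the agent falls over),
# so terminals are expected here even with the default arguments.
assert dataset["terminals"].sum() > 0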

odelalleau commented 2 years ago

If it's working as expected, then presumably it's the documentation that needs to be fixed?

The docstring explicitly says the function sets "done=True" on the final timestep of a trajectory, and that is not the case. Looking at the implementation, in the code path where final_timestep is True, the done flag is left unchanged when terminate_on_end is True, which contradicts the docs.
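For reference, a minimal sketch of the semantics the docstring describes, as a one-line change to the loop's done handling (hypothetical, not an actual patch):

# Sketch only: force done=True on the final timestep when terminate_on_end=True,
# matching what the docstring currently claims.
done_.append(done_bool or (terminate_on_end and final_timestep))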