araffin / rl-tutorial-jnrr19

Stable-Baselines tutorial for Journées Nationales de la Recherche en Robotique 2019
MIT License
591 stars 113 forks source link

2_gym_wrappers_saving_loading.ipynb: TimeFeatureWrapper ignores the case of not using TimeLimit wrapper #5

Closed rbahumi closed 4 years ago

rbahumi commented 4 years ago

Hi, Sorry for not making a proper pull request but I think it is better this way in the case of ipython notebooks.

The issue, in the case of not using TimeLimit wrapper there is no enforcement of maximum length. This can lead to a negative timestamp. Suggested fix:

class TimeFeatureWrapper(gym.Wrapper):
...
    def step(self, action):
        self._current_step += 1
        obs, reward, done, info = self.env.step(action)

        # Overwrite the done signal when the environment is not wrapped with a TimeLimit object
        if not isinstance(env, TimeLimit) and self._current_step >= self._max_steps:
          done = True
          # Update the info dict to signal that the limit was exceeded
          info['time_limit_reached'] = True    

        return self._get_obs(obs), reward, done, info

In addition, and maybe I'm wrong here, but I think that the time calculation should be the other war around, and the first episode should have a timestamp of 0 and the last a timestamp of 1:

time_feature = 1 - (self._current_step / self._max_steps)  =>  time_feature = 0.0 + (self._current_step / self._max_steps)
araffin commented 4 years ago

TimeLimit wrapper there is no enforcement of maximum length.This can lead to a negative timestamp.

True, however, I don't think that's a problem. Also, we should not change the episode length if the user is not aware of that. The TimeFeatureWrapper is only meant to add a feature, not to change the episode length.

the first episode should have a timestamp of 0 and the last a timestamp of 1:

It does not really matter, I did that to be consistent with this paper

rbahumi commented 4 years ago

Also, we should not change the episode length if the user is not aware of that.

Agree. Thanks for the quick replay.