google-deepmind / envlogger

A tool for recording RL trajectories.
Apache License 2.0
93 stars · 13 forks

Logger is saving a shifted action for the given observation #13

Closed HM102 closed 2 months ago

HM102 commented 2 months ago

It seems like your logger is logging a shifted action for each observation, i.e. for obs_i, you are pairing it with action_(i+1) instead of action_i.

In the Episode class, self.steps is always populated with the action from the next step.

Here is how the current code works:

  1. For (obs_0, action_0), the Episode is created with prev_step = action_0, but action_0 is never saved in self.steps.
  2. For the next action_1, add_step is called, and the first element saved in self.steps is obs_0 & action_1 (step is saved in action as in here, and prev_step is saved in obs as in here).
  3. What you end up saving is the sequence (obs_i, action_(i+1)) with i starting from 0, essentially saving a shifted action for each observation, with action_0 never being saved.

One solution would be to save prev_step instead of step here: 'action': prev_step.action if step else tf.nest.map_structure(...
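The three steps above can be sketched as a minimal, self-contained Python example (class and attribute names are simplified stand-ins for the real envlogger code, not its actual API): the first action only seeds prev_step and is never written, so each stored entry pairs obs_i with action_(i+1).

```python
# Hypothetical sketch of the pairing described above, NOT the real
# envlogger implementation.

class Episode:
    def __init__(self, prev_step):
        # prev_step holds (obs_0, action_0); nothing is written yet,
        # so action_0 is never stored.
        self._prev_step = prev_step
        self.steps = []

    def add_step(self, step):
        prev_obs, _prev_action = self._prev_step
        _obs, action = step
        # The stored entry takes its observation from prev_step but its
        # action from the incoming step -> (obs_i, action_(i+1)).
        self.steps.append({'observation': prev_obs, 'action': action})
        self._prev_step = step


episode = Episode(prev_step=('obs_0', 'action_0'))
episode.add_step(('obs_1', 'action_1'))
episode.add_step(('obs_2', 'action_2'))
# episode.steps now pairs obs_0 with action_1 and obs_1 with action_2;
# action_0 appears nowhere.
```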

sabelaraga commented 2 months ago

Hey, this is working as intended. We store information using the RLDS format; see details here: https://github.com/google-research/rlds. The pair (obs_i, action_i) represents the action taken to reach obs_i. At observation 0, we are in the initial position and no action was taken to reach that state.
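Under the convention described above, a stored step pairs each observation with the action that led to it, so the first step carries a placeholder action. A hedged sketch (the function and the literal data below are illustrative, not part of the RLDS or envlogger APIs) of how a consumer would recover (state, action, next_state) transitions from such a log:

```python
# Illustrative only: steps are assumed stored as
# (obs_i, action_that_reached_obs_i), per the convention above.

def to_transitions(steps):
    """Recovers (obs, action_taken_at_obs, next_obs) triples from steps
    stored as (obs_i, action_leading_to_obs_i)."""
    transitions = []
    for cur, nxt in zip(steps, steps[1:]):
        # The action taken at cur's observation is stored on the *next* step.
        transitions.append((cur['observation'], nxt['action'], nxt['observation']))
    return transitions


steps = [
    {'observation': 'o_0', 'action': None},   # initial state: no leading action
    {'observation': 'o_1', 'action': 'a_0'},  # a_0 taken at o_0 produced o_1
    {'observation': 'o_2', 'action': 'a_1'},  # a_1 taken at o_1 produced o_2
]
# to_transitions(steps) -> [('o_0', 'a_0', 'o_1'), ('o_1', 'a_1', 'o_2')]
```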

Let us know if we misunderstood the question.

Thanks!

HM102 commented 2 months ago

Thank you for the info. If I am not mistaken:

The envlogger takes a_0 and outputs o_1, and RLDS saves/loads it as (o_0, a_0).

You said "The pair (obs_i, action_i) represents the action taken to reach obs_i." I think you meant that action_i is the action taken to reach obs_(i+1), i.e. action_i is taken at obs_i.

Please correct me if I am wrong.