Hey, this is working as intended. We store information using the RLDS format; see details here: https://github.com/google-research/rlds. The pair (obs_i, action_i) represents the action taken to reach obs_i. At observation 0, we are in the initial position and no action was taken to reach that state.
Let us know if we misunderstood the question.
Thanks!
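For concreteness, here is a minimal sketch of that pairing for a toy three-observation trajectory (the values and field names are purely illustrative, not the actual RLDS schema):

```python
import numpy as np

# Toy trajectory o_0 --a_0--> o_1 --a_1--> o_2.
# Under the pairing described above, each stored action is the one that was
# taken to *reach* the observation next to it, so the first entry has no
# real action (a zero placeholder is used here just for illustration).
episode_steps = [
    {"observation": np.array([0.0]), "action": np.zeros(1)},      # o_0, initial position
    {"observation": np.array([1.0]), "action": np.array([0.5])},  # o_1, reached via a_0
    {"observation": np.array([2.0]), "action": np.array([0.7])},  # o_2, reached via a_1
]

for i, step in enumerate(episode_steps):
    print(i, step["observation"], step["action"])
```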
Thank you for the info. If I am not mistaken, the envlogger takes `a_0` and outputs `o_1`, and RLDS saves/loads it as `(o_0, a_0)`.

> The pair (obs_i, action_i) represents the action taken to reach obs_i.

I think you meant `action_i` is the action taken to reach `obs_(i+1)`, or that `action_i` is taken on `obs_i`. Please correct me if I am wrong.
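To make the two readings explicit, here is a small sketch (the names are hypothetical, just to show which action would sit next to which observation under each interpretation):

```python
# One dm_env-style interaction:
#   o_0 = env.reset()      (initial observation)
#   o_1 = env.step(a_0)    (a_0 is applied at o_0 and produces o_1)
o_0, a_0, o_1 = "o_0", "a_0", "o_1"

# Reading 1 (the reply above): action_i is the action taken to *reach* obs_i,
# so a_0 is stored next to o_1 and the first entry has no action.
pairs_reach = [(o_0, None), (o_1, a_0)]

# Reading 2 (this comment): action_i is the action taken *at* obs_i,
# so a_0 is stored next to o_0.
pairs_taken_at = [(o_0, a_0), (o_1, None)]

print(pairs_reach)     # [('o_0', None), ('o_1', 'a_0')]
print(pairs_taken_at)  # [('o_0', 'a_0'), ('o_1', None)]
```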
It seems like your logger is logging a shifted action for each observation, i.e. for `obs_i`, you are grouping it with `action_(i+1)` instead of `action_i`. In the `Episode` class, `self.steps` is always initialized with the action from the next state. Here is how the current code works:

- `(obs_0, action_0)`: the Episode is created with this step as `prev_step`, but it is never saved in `self.steps`.
- `action_1`: `add_step` is called, and then the first element saved in `self.steps` is `obs_0` & `action_1` (`step` is saved in `action` as in here, and `prev_step` is saved in `obs` as in here).
- So what gets stored is `(obs_i, action_(i+1))` with `i` starting from 0, essentially saving a shifted action for each observation, while `action_0` is never saved.

One solution would be to save `prev_step` instead of `step` here:

```python
'action':
    prev_step.action if step else tf.nest.map_structure(...
```
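To illustrate the pairing described above, here is a simplified stand-in for the builder (not the real envlogger code; the class and method names just mirror the ones mentioned in this thread), showing how the current grouping shifts actions and what the proposed change would do:

```python
from typing import Any, NamedTuple

class StepData(NamedTuple):
    observation: Any
    action: Any

class Episode:
    """Toy stand-in for the Episode class discussed above."""

    def __init__(self, prev_step: StepData):
        self._prev_step = prev_step  # first step, never written to self.steps
        self.steps = []

    def add_step(self, step: StepData):
        # Behaviour as described in this thread: the stored entry takes its
        # observation from prev_step but its action from the incoming step,
        # producing (obs_i, action_{i+1}).
        self.steps.append({
            "observation": self._prev_step.observation,
            "action": step.action,  # proposed fix: use self._prev_step.action instead
        })
        self._prev_step = step

ep = Episode(StepData("o_0", "a_0"))
ep.add_step(StepData("o_1", "a_1"))
ep.add_step(StepData("o_2", "a_2"))
print(ep.steps)
# [{'observation': 'o_0', 'action': 'a_1'}, {'observation': 'o_1', 'action': 'a_2'}]
# a_0 is never stored; saving prev_step.action instead would yield
# (o_0, a_0) and (o_1, a_1).
```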