Closed PaulScemama closed 6 months ago
Hi @PaulScemama, thank you for using our library!
In every algorithm we always wrap our environments with a `gymnasium.vector.SyncVectorEnv` or a `gymnasium.vector.AsyncVectorEnv`. As specified in the gymnasium docs:
> To prevent terminated environments waiting until all sub-environments have terminated or truncated, the vector environments autoreset sub-environments after they terminate or truncated. As a result, the final step’s observation and info are overwritten by the reset’s observation and info. Therefore, the observation and info for the final step of a sub-environment is stored in the info parameter, using “final_observation” and “final_info” respectively.
So we're always sure that we have the reset observation when an episode has terminated (or truncated). When we need the final observation we grab it from the info dictionary.
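The autoreset behavior described above can be sketched with a toy vector env. This is a minimal pure-Python mock of the semantics, not SheepRL's or gymnasium's actual code; the counting envs and their episode lengths are made up for illustration:

```python
# Minimal mock of gymnasium's vector-env autoreset semantics
# (illustrative only; real code would use gymnasium.vector.SyncVectorEnv).

class ToyVectorEnv:
    """Two counting sub-envs; each terminates when its obs reaches its limit."""

    def __init__(self, limits=(3, 5)):
        self.limits = limits
        self.obs = [0] * len(limits)

    def reset(self):
        self.obs = [0] * len(self.limits)
        return list(self.obs), {}

    def step(self, actions):
        terminated = [False] * len(self.obs)
        info = {}
        for i in range(len(self.obs)):
            self.obs[i] += 1
            if self.obs[i] >= self.limits[i]:
                terminated[i] = True
        if any(terminated):
            # Like gymnasium's autoreset: stash the true last observation,
            # then overwrite it with the reset observation.
            info["final_observation"] = [
                self.obs[i] if done else None for i, done in enumerate(terminated)
            ]
            for i, done in enumerate(terminated):
                if done:
                    self.obs[i] = 0
        return list(self.obs), [0.0] * len(self.obs), terminated, info


env = ToyVectorEnv()
obs, _ = env.reset()
for _ in range(3):
    obs, rewards, dones, info = env.step([0, 0])
# After 3 steps, sub-env 0 has terminated: the returned obs is already the
# reset observation (0), while the true final obs (3) lives in the info dict.
print(obs)                        # [0, 3]
print(info["final_observation"])  # [3, None]
```

So the training loop never calls `reset()` itself after the first time: the observation returned by `step()` is already the first observation of the next episode, and the final observation is fetched from `info` when needed.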
Thank you @belerico! That totally makes sense
Hi!
I had a question about how you handle environment resets.
I see that you only call an environment's `reset` method once; for example, in `dreamer_v3.py`, here.

In some environments, like gymnasium ones, the canonical usage is to reset on done: any time `done is True`, an observation is queried from `reset()` and fed to the agent to determine an action for the beginning of a new episode.

My question is: how do you obtain this first observation from `reset()` any time we're beginning a new episode? I can't seem to find where this ever happens in your code. My hunch is that you instead use the last observation (the one accompanied by `done is True`) as the first observation of the next episode, effectively in place of the one obtained via `reset()` like in the code snippet I've provided above.

Thanks so much, and by the way, very nice library!
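For reference, the reset-on-done pattern the question describes looks roughly like this. It is a sketch with a toy single env standing in for a gymnasium one (the env, its episode length, and the action choice are placeholders, not code from the library):

```python
import random

class ToyEnv:
    """Stand-in for a gymnasium env: counts steps, terminates after 4."""

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # (observation, info), as in the gymnasium API

    def step(self, action):
        self.t += 1
        terminated = self.t >= 4
        # obs, reward, terminated, truncated, info
        return self.t, 1.0, terminated, False, {}


env = ToyEnv()
obs, info = env.reset()
episodes = 0
for _ in range(10):
    action = random.choice([0, 1])  # the agent would pick an action from obs
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # Canonical single-env usage: explicitly reset and use the new
        # observation as the first one of the next episode.
        obs, info = env.reset()
        episodes += 1
print(episodes)  # 2 full episodes (4 steps each) complete within 10 steps
```

With autoresetting vector envs, the `if terminated or truncated: env.reset()` branch disappears, because `step()` itself returns the reset observation, which is the behavior the accepted answer above describes.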