danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License
886 stars 195 forks source link

Should policy state be reset after every episode? #36

Closed edwhu closed 2 years ago

edwhu commented 2 years ago

It seems like the state of the agent (self._state) is not initialized to 0 on reset. Only in the very first episode, it is None, so it will be set to 0s. Since driver.reset() is never called again in api.py, self._state will be carried over from previous episodes on episode reset. Is this intentional?

https://github.com/danijar/dreamerv2/blob/07d906e9c4322c6fc2cd6ed23e247ccd6b7c8c41/dreamerv2/common/driver.py#L32-L40

danijar commented 2 years ago

Yep, the world model resets its state based on the is_first flag: https://github.com/danijar/dreamerv2/blob/07d906e9c4322c6fc2cd6ed23e247ccd6b7c8c41/dreamerv2/common/nets.py#L94