danijar / dreamerv2

Mastering Atari with Discrete World Models
https://danijar.com/dreamerv2
MIT License

Why share states across random batches for training the world model? #44

Closed · sai-prasanna closed this issue 2 years ago

sai-prasanna commented 2 years ago

From my understanding, the posterior of the last timestep of one batch is used as the start state for the next batch. Is this intended? If so, is it just to avoid always initializing the start state to zeros, so that the model instead starts from a random sample of the current latent distribution?

https://github.com/danijar/dreamerv2/blob/07d906e9c4322c6fc2cd6ed23e247ccd6b7c8c41/dreamerv2/agent.py#L60
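For concreteness, a minimal sketch of the pattern the question refers to, where the final posterior of one batch is fed in as the start state of the next. All names here (world_model.observe, the batch keys, update_fn) are hypothetical, not the actual dreamerv2 API:

```python
def train_world_model(world_model, train_batches, update_fn):
    """Sketch of carrying the last posterior state across training batches.

    world_model.observe is assumed to return (outputs, last_posterior_state);
    passing state=None lets the model initialize the state itself (e.g. zeros).
    """
    state = None
    for batch in train_batches:
        # Keep the posterior of the last timestep and reuse it as the start
        # state for the next batch instead of always starting from zeros.
        outputs, state = world_model.observe(batch['obs'], batch['action'], state)
        update_fn(outputs)  # gradient step on the world-model loss
```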

danijar commented 2 years ago

This is only used when is_first is False at the beginning of the training batch. By default it's always True, so the world model resets its hidden state (inside the RSSM class). But this implementation could also support training with truncated backprop through time on sequences that are too long to fit into memory all at once.
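For illustration, a minimal sketch of how an is_first flag can zero the carried state at episode boundaries. This assumes TensorFlow and a dict-shaped latent state; the helper name mask_reset_state is made up and the real reset logic lives inside the RSSM class:

```python
import tensorflow as tf

def mask_reset_state(prev_state, is_first):
    """Zero the carried recurrent state wherever a new episode begins.

    prev_state: dict of tensors shaped (batch, ...), e.g. {'deter': ..., 'stoch': ...}.
    is_first:   float tensor shaped (batch,), 1.0 at the first step of an episode.
    Hypothetical helper for illustration only.
    """
    mask = 1.0 - is_first  # 0.0 where a new episode starts, 1.0 otherwise
    return {
        key: value * tf.reshape(mask, [-1] + [1] * (len(value.shape) - 1))
        for key, value in prev_state.items()
    }

# Example: a batch of 2 sequences where the second one starts a new episode.
state = {'deter': tf.ones([2, 4]), 'stoch': tf.ones([2, 3])}
is_first = tf.constant([0.0, 1.0])
state = mask_reset_state(state, is_first)  # second row becomes zeros
```

With is_first set to True at the start of every batch (the default), this amounts to always starting from a zero state; leaving it False would instead continue from the previous batch's final posterior, which is what would enable truncated backprop through time over sequences longer than a single batch.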