sai-prasanna closed this issue 2 years ago
This is only used when `is_first` is `False` at the beginning of the training batch. By default it's always `True`, so the world model resets its hidden state (in the RSSM class). But this implementation could also support training with truncated backprop through time on sequences too long to fit into memory at once.
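
A minimal sketch of what such a reset could look like. The helper name and state layout here are hypothetical (not the repo's actual RSSM code); it assumes the state is a dict of `(batch, ...)` tensors and that `is_first` is a boolean vector over the batch:

```python
import tensorflow as tf

def reset_on_episode_start(prev_state, is_first):
    # Hypothetical helper (not the repo's actual code): zero the carried-over
    # RSSM state for every batch entry whose sequence starts a new episode,
    # so those entries fall back to the initial (zero) state.
    keep = 1.0 - tf.cast(is_first, tf.float32)  # (batch,), 0 where is_first
    return {k: tf.einsum('b,b...->b...', tf.cast(keep, v.dtype), v)
            for k, v in prev_state.items()}

# Example: batch of 2, only the second sequence begins a new episode.
state = {'deter': tf.ones([2, 4]), 'stoch': tf.ones([2, 3])}
state = reset_on_episode_start(state, tf.constant([False, True]))
```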
From my understanding, the posterior of the last timestep of a batch is used as the start state for the next batch. Is this intended? If so, is it just to avoid always initializing the start state to zeros, and instead have the model start from a random sample of the current latent distribution?
https://github.com/danijar/dreamerv2/blob/07d906e9c4322c6fc2cd6ed23e247ccd6b7c8c41/dreamerv2/agent.py#L60
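
If the carry-over is indeed meant to enable truncated backprop through time, the outer training loop might look like the following sketch. The `world_model.train` signature (returning the final posterior alongside outputs and metrics) and the `stop_gradient` truncation at chunk boundaries are assumptions here, not code taken from the repo:

```python
import tensorflow as tf

def train_tbptt(world_model, dataset):
    # Hypothetical TBPTT outer loop (assumed API, not the repo's actual code):
    # carry the final posterior of each chunk into the next chunk of the same
    # sequences, but cut the gradient at the boundary so memory stays bounded.
    state = None
    for batch in dataset:  # consecutive chunks of the same long sequences
        state, outputs, metrics = world_model.train(batch, state)
        # Keep the values as the next start state, but stop gradients from
        # flowing back across the chunk boundary.
        state = tf.nest.map_structure(tf.stop_gradient, state)
    return state
```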