Closed by JingbinLiu 5 years ago
@JingbinLiu I would recommend future_rnn: True. We are planning to update the paper soon to add some discussion about this. The posterior method implements q(s_t | s_{t-1}, a_{t-1}, o_t), while the transition method implements p(s_t | s_{t-1}, a_{t-1}). The posterior looks at the observation, while the transition method should be used for steps where no observations are available, i.e. the future. Note that the posterior uses the transition method internally, so they share most of their parameters.
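To make the relationship concrete, here is a minimal sketch of a posterior that calls the transition internally and so shares its parameters. Everything in it is a toy assumption for illustration: the class `ToyLatentModel`, its scalar weights, and the `rollout` helper are hypothetical and are not the RSSM code from the repository. It only tracks a deterministic scalar mean instead of a distribution.

```python
import math


class ToyLatentModel:
    """Hypothetical scalar stand-in for a latent dynamics model (not PlaNet's RSSM)."""

    def __init__(self, w_state=0.9, w_action=0.5, w_obs=0.3):
        self.w_state = w_state    # used by transition, hence also by posterior
        self.w_action = w_action  # used by transition, hence also by posterior
        self.w_obs = w_obs        # used by the posterior only

    def transition(self, prev_state, prev_action):
        """Toy mean of p(s_t | s_{t-1}, a_{t-1}); needs no observation."""
        return math.tanh(self.w_state * prev_state + self.w_action * prev_action)

    def posterior(self, prev_state, prev_action, obs):
        """Toy mean of q(s_t | s_{t-1}, a_{t-1}, o_t).

        Starts from the transition prediction and corrects it toward the
        observation, so both methods share the transition weights.
        """
        prior = self.transition(prev_state, prev_action)
        return prior + self.w_obs * (obs - prior)


def rollout(model, state, actions, observations=None):
    """Use the posterior while observations exist, the transition afterwards."""
    states = []
    for t, action in enumerate(actions):
        if observations is not None and t < len(observations):
            state = model.posterior(state, action, observations[t])
        else:
            state = model.transition(state, action)  # future step: no observation
        states.append(state)
    return states
```

Calling `rollout(model, 0.0, actions)` without observations uses only the transition, which is the situation during planning.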
@astronautas The alternative to planning in latent space is to generate images one future time step at a time and feed them back into the model to predict the next. This means a lot more computation than staying in the small latent space. Moreover, it would only allow evaluating a small batch of action sequences at the same time, whereas in latent space we can evaluate >1000 sequences at once.
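As a rough illustration of why staying in latent space is cheap, the sketch below scores many candidate action sequences using only a small latent state and never decodes an image. It is a simple random-shooting search over a toy scalar model, not necessarily the planner used in the repository. The functions `transition`, `reward`, and `plan`, and all constants in them, are hypothetical.

```python
import math
import random


def transition(state, action):
    """Hypothetical toy latent dynamics; stands in for a learned transition model."""
    return math.tanh(0.9 * state + 0.5 * action)


def reward(state):
    """Hypothetical reward predicted directly from the latent state."""
    return -abs(state - 0.5)


def plan(initial_state, horizon=12, num_candidates=1000, seed=0):
    """Score random action sequences in latent space and return the best one."""
    rng = random.Random(seed)
    best_return, best_actions = -math.inf, None
    # A real implementation would evaluate all candidates as one batched
    # tensor operation; the Python loop here is only for readability.
    for _ in range(num_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        state, total = initial_state, 0.0
        for action in actions:
            state = transition(state, action)  # no image is generated
            total += reward(state)
        if total > best_return:
            best_return, best_actions = total, actions
    return best_actions, best_return
```

Each candidate costs only `horizon` small state updates, which is why evaluating on the order of a thousand sequences per planning step stays feasible.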
@astronautas @danijar Many thanks for your comments.
planet/planet/models/rssm.py line 96:
    if self._future_rnn: hidden = belief
Could you shed more light on the two schemes? For the case hidden = belief, the transition model seems intertwined with the posterior model. Which one is better?
planet/control/planning.py line 48:
    (_, state), _ = tf.nn.dynamic_rnn(cell, (0 * obs, action, use_obs), initial_state=initial_state)
Here the posterior is used for the roll-out, but no observation is provided during planning. I wonder if we could use the transition model to do the roll-out instead. What's the difference?