google-research / planet

Learning Latent Dynamics for Planning from Pixels
https://danijar.com/planet
Apache License 2.0

Understanding the framework of planet #23

Closed JingbinLiu closed 5 years ago

JingbinLiu commented 5 years ago
  1. In planet/planet/models/rssm.py line 96: `if self._future_rnn: hidden = belief`. Could you shed more light on the two schemes? For the case hidden = belief, the transition model seems intertwined with the posterior model. Which one is better?

  2. In planet/control/planning.py line 48: `(_, state), _ = tf.nn.dynamic_rnn(cell, (0 * obs, action, use_obs), initial_state=initial_state)`. Here the posterior is used for the roll-out, but no observation is available during planning. I wonder if we could use the transition model to do the roll-out instead. What's the difference? (See the sketch below for the two options I have in mind.)
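To make the question concrete, here is a hypothetical sketch of the two roll-out options I am comparing (not the repository code; the function names and signatures are made up):

```python
def rollout_with_posterior(state, actions, zero_obs, posterior_step):
    # What planning.py appears to do: drive the posterior cell with zeroed
    # observations and a use_obs flag set to False.
    for action in actions:
        state = posterior_step(state, action, zero_obs, use_obs=False)
    return state

def rollout_with_transition(state, actions, transition_step):
    # What I am asking about: call the transition model directly,
    # without passing any (zeroed) observation at all.
    for action in actions:
        state = transition_step(state, action)
    return state
```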

astronautas commented 5 years ago
  1. It was stated in the paper that it's much faster to plan in the compressed latent space (@danijar could explain the reason more thoroughly).
danijar commented 5 years ago

@JingbinLiu I would recommend future_rnn: True -- we are planning to update the paper soon to add some discussion about this. The posterior method implements q(s_t | s_{t-1}, a_{t-1}, o_t), while the transition method implements p(s_t | s_{t-1}, a_{t-1}). The posterior looks at the observation, while the transition method should be used for steps where no observations are available, i.e. the future. Note that the posterior uses the transition method internally, so they share most of their parameters.
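Here is a minimal numpy sketch of that structure (a simplification for illustration, not the actual rssm.py code; the sizes are arbitrary). Both branches read from the same deterministic belief, which is where the parameter sharing comes from:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE, BELIEF, ACTION, EMBED = 30, 20, 4, 16
W = {
    "rnn": rng.standard_normal((STATE + ACTION, BELIEF)) * 0.1,      # stand-in for the GRU
    "prior": rng.standard_normal((BELIEF, 2 * STATE)) * 0.1,         # transition head
    "post": rng.standard_normal((BELIEF + EMBED, 2 * STATE)) * 0.1,  # posterior head
}

def step(prev_state, prev_action, obs_embedding=None):
    # Deterministic belief, shared by the transition model and the posterior.
    belief = np.tanh(np.concatenate([prev_state, prev_action], -1) @ W["rnn"])
    if obs_embedding is None:
        # Transition model p(s_t | s_{t-1}, a_{t-1}): used when no observation exists.
        stats = belief @ W["prior"]
    else:
        # Posterior q(s_t | s_{t-1}, a_{t-1}, o_t): additionally sees the observation.
        stats = np.concatenate([belief, obs_embedding], -1) @ W["post"]
    mean, std = stats[..., :STATE], np.exp(stats[..., STATE:]) * 0.1 + 1e-2
    return belief, mean + std * rng.standard_normal(mean.shape)

belief, state = step(np.zeros(STATE), np.zeros(ACTION))  # purely predicted step, no observation
```

During planning, step is called without an observation embedding, so only the transition branch is used.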

@astronautas The alternative to planning in latent space is to generate an image for each future time step, one step at a time, and feed it back into the model to predict the next. This means a lot more computation than staying in the small latent space. Moreover, it would only allow evaluating a small batch of action sequences at the same time, while in latent space we can evaluate >1000 sequences at once.
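As a rough, self-contained illustration (random stand-in models and made-up shapes, not the repository code), the planning loop only ever touches small latent arrays and is trivially batched over candidates:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE, ACTION, CANDIDATES, HORIZON = 30, 4, 1000, 12
W_trans = rng.standard_normal((STATE + ACTION, STATE)) * 0.1   # stand-in transition model
W_reward = rng.standard_normal((STATE, 1)) * 0.1               # stand-in reward head

state = np.zeros((CANDIDATES, STATE))          # current state, tiled per candidate
actions = rng.standard_normal((CANDIDATES, HORIZON, ACTION))
returns = np.zeros(CANDIDATES)

for t in range(HORIZON):
    # One batched transition step for all 1000 candidates; no image is decoded.
    state = np.tanh(np.concatenate([state, actions[:, t]], -1) @ W_trans)
    # Reward is predicted directly from the latent state.
    returns += (state @ W_reward)[:, 0]

elites = actions[np.argsort(returns)[-100:]]   # e.g. the top sequences for a CEM update
```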

JingbinLiu commented 5 years ago

@astronautas @danijar Many thanks for your comments.