google-research / planet

Learning Latent Dynamics for Planning from Pixels
https://danijar.com/planet
Apache License 2.0

What if the observation is extracted features instead of images and has a much smaller dimension than the latent? #59

Open seheevic opened 4 years ago

seheevic commented 4 years ago

Hi! I'm not sure you still do Q&A support here :blush:, but I'm stuck on a problem beyond my math skills. I hope you can help me.

The question concerns the loss function of your RSSM, which takes a variational approach. The reconstruction term of the VAE is p(o_t|s_t), since the decoder maps from the latent state to the image. In that setting the observation (an image) has a much higher dimension than the latent. But consider the opposite case, where o_t has a much smaller dimension than the latent, for example the 4 values of CartPole in OpenAI Gym's classic_control versus a latent of, say, 32-64. Then I think p(o_t|s_t) could not learn any meaningful distribution: because s_t is sampled from the variational posterior q(s_t|a_1:t, o_1:t), which has already seen the current observation o_t, I suspect s_t could simply learn to copy all of o_t into itself, since s_t has far more dimensions than it needs.
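To be concrete, the bound I have in mind is the one below. I'm paraphrasing Eq. 3 of the paper from memory, so please correct me if I misstate it:

```latex
% One-step variational bound (paraphrasing Eq. 3 of the PlaNet paper):
% reconstruction term minus the KL between the posterior and the learned prior.
\ln p(o_{1:T} \mid a_{1:T}) \;\geq\; \sum_{t=1}^{T} \Big(
  \mathbb{E}_{q(s_t \mid o_{\leq t}, a_{<t})}\!\big[\ln p(o_t \mid s_t)\big]
  \;-\; \mathbb{E}_{q(s_{t-1})}\!\big[\operatorname{KL}\!\big[
      q(s_t \mid o_{\leq t}, a_{<t}) \,\big\|\, p(s_t \mid s_{t-1}, a_{t-1})
  \big]\big] \Big)
```

The first term is the reconstruction loss I'm worried about; as far as I can tell, the KL term is the only pressure against s_t simply memorizing o_t.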

In this situation (non-image observations of small dimension), can we still use this VAE-like approach? Or is there another technique that is more reasonable in this case? I hope this worry makes sense to you. :confused:

abrandenb commented 4 years ago

Since the autoencoder is used for dimensionality reduction (in the default configs, from 64x64x3 = 12288 dimensions down to around 500), I would not apply it in the scenario you describe. If you have a low-dimensional input, you can skip the autoencoder, since it wouldn't give you any gain. I assume you can still learn the latent dynamics model and the reward model and then apply MPC, just as PlaNet would do if you scrap the VAE.
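Roughly like this, as a minimal sketch. This is PyTorch for illustration, not the repo's TensorFlow code, and every name and size in it (`LowDimDynamics`, `obs_dim=4`, the GRU cell, the regression losses) is made up for the example rather than taken from PlaNet:

```python
# Minimal sketch: no image decoder, just a deterministic GRU dynamics
# model plus a reward head on low-dimensional observations.
import torch
import torch.nn as nn

class LowDimDynamics(nn.Module):
    def __init__(self, obs_dim=4, act_dim=1, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)      # tiny encoder, no conv stack
        self.cell = nn.GRUCell(hidden + act_dim, hidden)
        self.reward_head = nn.Linear(hidden, 1)        # predicts r_t from the state
        self.obs_head = nn.Linear(hidden, obs_dim)     # predicts o_{t+1} from the state

    def forward(self, obs, actions):
        # obs: (T, B, obs_dim), actions: (T, B, act_dim)
        h = torch.zeros(obs.size(1), self.cell.hidden_size, device=obs.device)
        rewards, next_obs = [], []
        for t in range(obs.size(0)):
            inp = torch.cat([self.encoder(obs[t]), actions[t]], dim=-1)
            h = self.cell(inp, h)
            rewards.append(self.reward_head(h))
            next_obs.append(self.obs_head(h))
        return torch.stack(rewards), torch.stack(next_obs)

# Training: simple regression losses instead of the variational bound.
model = LowDimDynamics()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
T, B = 10, 32
obs = torch.randn(T, B, 4)            # stand-in for a replay-buffer batch
actions = torch.randn(T, B, 1)
true_rewards = torch.randn(T, B, 1)
pred_r, pred_o = model(obs, actions)
loss = ((pred_r - true_rewards) ** 2).mean() \
     + ((pred_o[:-1] - obs[1:]) ** 2).mean()  # one-step observation prediction
opt.zero_grad()
loss.backward()
opt.step()
```

With a 4-dimensional observation there is nothing for a VAE to compress, so plain regression losses on the next observation and the reward should be enough to train a model you can then plan against with MPC (e.g. CEM, as PlaNet does).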