eloialonso / iris

Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
https://openreview.net/forum?id=vhFu1Acb0xb
GNU General Public License v3.0
791 stars · 77 forks

Training Actor-Critics: Full Observation vs Latent-State (tokens) #15

Closed rudrapoudel closed 1 year ago

rudrapoudel commented 1 year ago

Nice work and thanks for the code!

  1. Why did you decide to train the actor-critic on the full original or imagined RGB observation images, rather than simply using the output of the Transformer, i.e. the tokens?
  2. Have you done any ablation study on the above matter?
vmicheli commented 1 year ago

Hey, thanks for the kind words.

We decided to focus on developing a world model architecture for learning in imagination. What you suggest has more to do with representation learning and would imply entangling the design of the world model with that of the policy. It would indeed be interesting to investigate architectures where the policy operates over latent states of the autoencoder and/or hidden states of the Transformer. At the moment, we have not run such experiments.