alexcbb / Genie-Generative-Interactive-Environments

This repo aims to reproduce and open-source the results of "Generative Interactive Environments" from Google DeepMind.

MIT License

[Feature] Mask-GiT #3

Open alexcbb opened 8 months ago

alexcbb commented 8 months ago

Feature details

The model's dynamics are handled by a decoder-only Mask-GiT. Given a tokenized video (from the VQ-VAE) and a latent action (from the latent action model), it predicts the tokens of the next frame.
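
To make the intended behaviour concrete, here is a minimal sketch of Mask-GiT-style iterative decoding in plain Python. It is not the implementation planned for this repo: the `MASK` sentinel, the cosine schedule, the `maskgit_decode` helper, and the `toy_predict` rule (which stands in for a trained transformer conditioned on the previous frame tokens and a latent action id) are all assumptions made for illustration.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked token position


def cosine_mask_ratio(step: int, total_steps: int) -> float:
    """Fraction of tokens still masked after `step` of `total_steps` steps."""
    return math.cos(math.pi / 2 * step / total_steps)


def maskgit_decode(predict, num_tokens: int, total_steps: int = 8):
    """Fill a fully masked frame step by step, most confident tokens first.

    `predict(tokens)` is a hypothetical stand-in for the dynamics model.
    It must return one (token_id, confidence) pair per position.
    """
    tokens = [MASK] * num_tokens
    for step in range(1, total_steps + 1):
        preds = predict(tokens)
        # How many positions should stay masked after this step.
        keep_masked = int(num_tokens * cosine_mask_ratio(step, total_steps))
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the most confident predictions among masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        num_to_fill = max(0, len(masked) - keep_masked)
        for i in masked[:num_to_fill]:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens, prev_frame, action, vocab_size=16):
    """Toy rule (assumption): next token = previous token shifted by the action id."""
    rng = random.Random(0)  # fixed seed so the confidences are reproducible
    return [((prev_frame[i] + action) % vocab_size, rng.random())
            for i in range(len(tokens))]


prev_frame = [3, 7, 1, 0, 15, 8]  # made-up token ids for the last frame
next_frame = maskgit_decode(
    lambda t: toy_predict(t, prev_frame, action=2), num_tokens=len(prev_frame)
)
print(next_frame)  # [5, 9, 3, 2, 1, 10]
```

In a real model, `predict` would be the transformer's forward pass over the tokenized video and latent actions, and the confidence would come from its output distribution. The toy rule only makes the loop easy to check by hand.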

What needs to be done

[ ] Requires #1 to be done
[ ] Use #1 to create the structure of the model
[ ] Evaluate the model on toy scenarios
[ ] Prepare unit tests
[ ] PR the new feature