
Transformers are Sample-Efficient World Models. ICLR 2023, notable top 5%.
https://openreview.net/forum?id=vhFu1Acb0xb
GNU General Public License v3.0

Transformers are Sample-Efficient World Models (IRIS)

Vincent Micheli*, Eloi Alonso*, François Fleuret
* Denotes equal contribution

IRIS agent after 100k environment steps, i.e. two hours of real-time experience.

IRIS playing Asterix, Boxing, Breakout, Demon Attack, Freeway, Gopher, Kung Fu Master, and Pong.

tl;dr

BibTeX

If you find this code or paper useful, please use the following reference:

@inproceedings{
  iris2023,
  title={Transformers are Sample-Efficient World Models},
  author={Vincent Micheli and Eloi Alonso and Fran{\c{c}}ois Fleuret},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=vhFu1Acb0xb}
}

Setup

Launch a training run

python src/main.py env.train.id=BreakoutNoFrameskip-v4 common.device=cuda:0 wandb.mode=online

By default, logs are synced to Weights & Biases; set wandb.mode=disabled to turn this off.
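When launching several runs from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical helper, and it only reuses the three overrides shown in the command above (`env.train.id`, `common.device`, `wandb.mode`).

```python
import shlex


def build_train_command(env_id, device="cuda:0", wandb_mode="online"):
    """Assemble the training command as an argument list (hypothetical helper)."""
    return [
        "python",
        "src/main.py",
        f"env.train.id={env_id}",      # environment to train on
        f"common.device={device}",     # device override, as in the command above
        f"wandb.mode={wandb_mode}",    # "disabled" turns off Weights & Biases syncing
    ]


# Same run as above, but with Weights & Biases syncing turned off.
cmd = build_train_command("BreakoutNoFrameskip-v4", wandb_mode="disabled")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start the run.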

Configuration

Run folder

Each new run is located at outputs/YYYY-MM-DD/hh-mm-ss/. This folder is structured as:

outputs/YYYY-MM-DD/hh-mm-ss/
│
└─── checkpoints
│   │   last.pt
│   │   optimizer.pt
│   │   ...
│   │
│   └─── dataset
│       │   0.pt
│       │   1.pt
│       │   ...
│
└─── config
│   │   trainer.yaml
│
└─── media
│   │
│   └─── episodes
│   │   │   ...
│   │
│   └─── reconstructions
│   │   │   ...
│
└─── scripts
│   │   eval.py
│   │   play.sh
│   │   resume.sh
│   │   ...
│
└─── src
│   │   ...
│
└─── wandb
    │   ...
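Because run folders are named by date and time, the most recent one can be found by sorting the folder names. The sketch below assumes only the layout shown above (`outputs/YYYY-MM-DD/hh-mm-ss/checkpoints/last.pt`); the helpers `latest_run_dir` and `last_checkpoint` are hypothetical and not part of the repository.

```python
from pathlib import Path


def latest_run_dir(outputs_root="outputs"):
    """Return the most recent outputs/YYYY-MM-DD/hh-mm-ss/ folder, or None if there is none."""
    root = Path(outputs_root)
    runs = [p for p in root.glob("*/*") if p.is_dir()]
    # Zero-padded date and time names sort chronologically as plain strings.
    runs.sort(key=lambda p: (p.parent.name, p.name))
    return runs[-1] if runs else None


def last_checkpoint(run_dir):
    """Path where the layout above places the latest checkpoint of a run."""
    return Path(run_dir) / "checkpoints" / "last.pt"


run = latest_run_dir()
if run is not None:
    print(last_checkpoint(run))
```

This only builds paths; it does not check that `last.pt` exists or try to load it.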

Results notebook

The folder results/data/ contains raw scores (for each game, and for each training run) for IRIS and the baselines.

Use the notebook results/results_iris.ipynb to reproduce the figures from the paper.

Pretrained models

Pretrained models are available here.

Credits