eloialonso / diamond

DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
https://diamond-wm.github.io
MIT License

Reproducing CSGO training config #29

Closed · kxhit closed this 2 days ago

kxhit commented 4 days ago

Hi thanks for this great work!

I'm using the CSGO branch to reproduce the results in the web demo.

In the paper, section J.3 says the model was "trained for 120k updates with a batch size of 64, on up to 4×A6000 GPUs. Each training run took between 1-2 days", while the readme says "The provided configuration took 12 days on a RTX 4090." Could you clarify which configuration was used for the demo checkpoint? I also assume the batch size of 64 is the global batch size across the 4 GPUs, with no gradient accumulation.
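
For reference, this is the effective-batch-size arithmetic I have in mind (just an illustrative sketch; the variable names below are hypothetical, not the repo's actual config keys):

```python
# Illustrative only: how I understand the effective batch size.
# The variable names are hypothetical, not the repo's config keys.
per_gpu_batch_size = 16   # micro-batch processed by each GPU per step
num_gpus = 4              # e.g. the 4x A6000 setup from appendix J.3
grad_acc_steps = 1        # assuming no gradient accumulation

effective_batch_size = per_gpu_batch_size * num_gpus * grad_acc_steps
assert effective_batch_size == 64  # the "batch size of 64" quoted from the paper
```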

Thank you!

AdamJelley commented 2 days ago

Hi @kxhit! The CSGO details in appendix M (previously J) are from early experiments with CSGO and do not correspond to the released demo model. The readme is correct that the released model was trained for 12 days on an RTX 4090. We provide our training code and config, so you should be able to reproduce the training by following the training instructions with the default trainer config.

kxhit commented 2 days ago

Hi @AdamJelley thanks for the prompt reply!

However, the provided config OOMs on both an A6000 and an RTX 4090. Could you double-check whether denoiser batch size 64 with grad_acc 2 can actually be trained on a 24GB RTX 4090? Thank you!
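
For context, here is the gradient-accumulation pattern I am assuming the trainer follows (a minimal sketch with hypothetical names, not the actual training code from this repo):

```python
import torch

# Minimal sketch of gradient accumulation as I understand it (not the repo's
# actual trainer code). With batch_size=64 and grad_acc=2, each forward/backward
# pass still has to fit a micro-batch of 32 samples into the 24GB of a single
# RTX 4090, which is where I hit the OOM.
def train_step(
    model: torch.nn.Module,
    optimizer: torch.optim.Optimizer,
    batch: torch.Tensor,
    grad_acc_steps: int = 2,
) -> None:
    optimizer.zero_grad()
    micro_batches = batch.chunk(grad_acc_steps)   # 64 samples -> 2 x 32
    for micro in micro_batches:
        loss = model(micro).mean()
        (loss / grad_acc_steps).backward()        # accumulate scaled gradients
    optimizer.step()
```

If it is the 32-sample micro-batch that overflows the 24GB, my guess is that a larger grad_acc (with a correspondingly smaller micro-batch) would be the workaround, assuming that keeps the effective batch size at 64.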