danijar / dreamerv3

Mastering Diverse Domains through World Models
https://danijar.com/dreamerv3
MIT License

How to optimize trade-offs in Scaling Laws #95

Closed: bseveren closed this issue 5 months ago

bseveren commented 11 months ago

It is impressive to see that scaling laws actually seem to work in RL now. This work appears to have balanced the scaling of the CNN, GRU, and MLPs based on intuition. Is that assumption correct? If not, could you give more insight into what kinds of trade-offs you tested, and whether the system could be improved significantly just by sizing the model components differently?

Do you expect work similar to Scaling Laws for Neural Language Models to come out specifically for RL, so that such trade-offs can be made in a principled way?

(I do expect it to be more challenging than in the language domain, since 1. evaluation is more expensive, as RL spans a multitude of benchmarks rather than the single paradigm of predicting the next word on a large text corpus, and 2. there is less consensus on the architecture to use, so the results of such a paper would probably be outdated soon.)
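
For context, the referenced paper fits power laws of the form L(N) = (N_c / N)^alpha to loss versus model size. A minimal sketch of such a fit, using entirely hypothetical numbers rather than measurements from DreamerV3 or the scaling-laws paper:

```python
# Sketch (hypothetical data): fit the Kaplan-style power law
# L(N) = (N_c / N)**alpha to (parameter count, loss) pairs.
import numpy as np

sizes = np.array([1e6, 4e6, 16e6, 64e6, 256e6])  # parameter counts (made up)
losses = np.array([3.1, 2.6, 2.2, 1.9, 1.7])     # final losses (made up)

# In log space the power law is linear: log L = -alpha*log N + alpha*log N_c,
# so ordinary least squares recovers both coefficients.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha = -slope                   # power-law exponent
n_c = np.exp(intercept / alpha)  # critical scale N_c

print(f"alpha = {alpha:.3f}, N_c = {n_c:.3e}")
```
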

danijar commented 5 months ago

The updated paper has slightly more optimized architectures now. There is definitely still room for improvement; for example, you can further improve performance by adjusting certain hyperparameters, such as the learning rate, together with the model size. One difference to NLP is that RL often trains individual agents for different tasks, so there could be separate scaling laws (with most coefficients shared) for each task, or one could study this in the setting of a single multi-task agent.
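
As a rough illustration of the "most coefficients shared" idea (not code from this repository), one could fit per-task power laws L_t(N) = C_t * N^(-alpha) with a single shared exponent alpha and a task-specific constant C_t in one joint least-squares problem. The task names and numbers below are hypothetical:

```python
# Hedged sketch: per-task scaling laws with a shared exponent.
# All data below is made up for illustration.
import numpy as np

sizes = np.array([1e6, 8e6, 64e6])            # model sizes tried per task
losses = {                                    # hypothetical final losses
    "walker": np.array([2.4, 1.9, 1.5]),
    "cheetah": np.array([3.0, 2.3, 1.8]),
}

tasks = list(losses)
rows, targets = [], []
for i, task in enumerate(tasks):
    for n, loss in zip(sizes, losses[task]):
        onehot = np.zeros(len(tasks))
        onehot[i] = 1.0
        # Column 0 carries -log N (shared alpha); the remaining columns
        # are per-task indicators for the intercepts log C_t.
        rows.append(np.concatenate([[-np.log(n)], onehot]))
        targets.append(np.log(loss))

coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
alpha, log_c = coef[0], coef[1:]
for task, lc in zip(tasks, log_c):
    print(f"{task}: L(N) ~ {np.exp(lc):.2f} * N^(-{alpha:.3f})")
```

The same design-matrix trick extends to other partially shared coefficients, e.g. a shared exponent for how the learning rate should shrink with model size while each task keeps its own base value.
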