danijar / dreamerv3

Mastering Diverse Domains through World Models
https://danijar.com/dreamerv3
MIT License

How to optimize trade-offs in Scaling Laws #95

Closed: bseveren closed this issue 5 months ago

bseveren commented 11 months ago

It is impressive to see that scaling laws actually seem to work in RL now. This work appears to have balanced the scaling of the CNN, GRU, and MLPs based on intuition. Is that assumption correct? If not, could you give more insight into what kinds of trade-offs you tested, and whether the system could be improved significantly just by sizing the model components differently?

Do you expect work similar to Scaling Laws for Neural Language Models to come out specifically for RL, so that such trade-offs can be made in a principled way?

(I do expect it to be more challenging than in the language domain, since 1. evaluation is more expensive, as RL spans a multitude of benchmarks rather than the single paradigm of predicting the next word on a large text corpus, and 2. there is less consensus on the architecture to use, so the results of such a paper would probably be outdated soon.)
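
For context, the referenced paper fits power laws of the form L(N) = (N_c / N)^alpha to loss versus model size. A minimal sketch of such a fit, using entirely hypothetical numbers rather than measurements from DreamerV3 or the scaling-laws paper:

```python
# Sketch (hypothetical data): fit the Kaplan-style power law
# L(N) = (N_c / N)**alpha to (parameter count, loss) pairs.
import numpy as np

sizes = np.array([1e6, 4e6, 16e6, 64e6, 256e6])  # parameter counts (made up)
losses = np.array([3.1, 2.6, 2.2, 1.9, 1.7])     # final losses (made up)

# In log space the power law is linear: log L = -alpha*log N + alpha*log N_c,
# so ordinary least squares recovers both coefficients.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha = -slope                   # power-law exponent
n_c = np.exp(intercept / alpha)  # critical scale N_c

print(f"alpha = {alpha:.3f}, N_c = {n_c:.3e}")
```
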

danijar commented 5 months ago

The updated paper has slightly more optimized architectures now. There is definitely still room for improvement; for example, you can further improve performance by adjusting certain hyperparameters, such as the learning rate, together with the model size. One difference to NLP is that RL often trains individual agents for different tasks, so there could be separate scaling laws (with most coefficients shared) for each task, or one could study this in the setting of a single multi-task agent.
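
As a rough illustration of the "most coefficients shared" idea (not code from this repository), one could fit per-task power laws L_t(N) = C_t * N^(-alpha) with a single shared exponent alpha and a task-specific constant C_t in one joint least-squares problem. The task names and numbers below are hypothetical:

```python
# Hedged sketch: per-task scaling laws with a shared exponent.
# All data below is made up for illustration.
import numpy as np

sizes = np.array([1e6, 8e6, 64e6])            # model sizes tried per task
losses = {                                    # hypothetical final losses
    "walker": np.array([2.4, 1.9, 1.5]),
    "cheetah": np.array([3.0, 2.3, 1.8]),
}

tasks = list(losses)
rows, targets = [], []
for i, task in enumerate(tasks):
    for n, loss in zip(sizes, losses[task]):
        onehot = np.zeros(len(tasks))
        onehot[i] = 1.0
        # Column 0 carries -log N (shared alpha); the remaining columns
        # are per-task indicators for the intercepts log C_t.
        rows.append(np.concatenate([[-np.log(n)], onehot]))
        targets.append(np.log(loss))

coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
alpha, log_c = coef[0], coef[1:]
for task, lc in zip(tasks, log_c):
    print(f"{task}: L(N) ~ {np.exp(lc):.2f} * N^(-{alpha:.3f})")
```

The same design-matrix trick extends to other partially shared coefficients, e.g. a shared exponent for how the learning rate should shrink with model size while each task keeps its own base value.
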