Closed: bseveren closed this issue 5 months ago
Hi, first of all, thank you for this magnificent work!

We're running tests with the dreamerv3 repo in a low-data regime (with human feedback). The scaling laws for the low-data regime are currently a bit confusing to me. You added graphs showing that bigger models do better when tens of millions of frames are available (though the 400K-frame case is unclear):

Surprisingly, you chose small models for the low-data regime benchmarks:

Did you test e.g. Atari100K with bigger models? If so, are bigger models worse (or just not better)? If you did not test with bigger models, what was the reason (e.g. related to the theory of deep double descent)?

Hi, good observation. I'm using the bigger models on Atari100k in the updated paper. But there is a limit: when the task is really simple and the model really large, increasing model size further does not help. At that point, it can make sense to use a smaller model to speed up wall-clock time and thus iterate faster.
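For anyone who wants to check this on their own tasks, here is a minimal sketch of a model-size sweep on Atari100k. The entry-point path, the `atari100k` config name, and the size preset names are assumptions; check the configs.yaml in your dreamerv3 checkout for the actual names.

```python
import subprocess
from datetime import datetime

# Hypothetical sweep over DreamerV3 model-size presets on Atari 100k.
# The script path and preset names below are assumptions; verify them
# against configs.yaml in your checkout of the dreamerv3 repo.
SIZES = ["small", "medium", "large", "xlarge"]  # assumed size presets

for size in SIZES:
    logdir = f"logdir/atari100k-{size}-{datetime.now():%Y%m%d-%H%M%S}"
    cmd = [
        "python", "dreamerv3/train.py",    # assumed entry point
        "--logdir", logdir,
        "--configs", "atari100k", size,    # assumed: later presets override earlier ones
    ]
    print("Launching:", " ".join(cmd))
    subprocess.run(cmd, check=True)        # sequential; parallelize across GPUs as needed
```

The runs are launched sequentially here for simplicity; comparing the final scores across log directories would show whether larger models plateau (or regress) at 400K frames.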