Closed jmkim0309 closed 3 years ago
Hey Kim,
That is a very good question. Unfortunately, I don't have a good answer. The truth is that what makes RL work (or not) is a very delicate balance of hyperparameters, update frequencies, and randomness. Papers in the field sometimes include a good chunk of theory, but the actual implementations are only theory-inspired, and the real proof of their performance comes from testbeds like Atari.
This is certainly a worthwhile, if daunting, topic to get into for junior researchers. Currently, we don't understand this topic very well at all.
Hi @heiner,
Thank you for your kind reply to the previous issue (https://github.com/facebookresearch/torchbeast/issues/25). As I understand it, I need to use polybeast to reproduce the SpaceInvaders results.
But could you please elaborate a little bit more on the following: "The MonoBeast version you are using has the upside of being simpler to install and run, but uses a different design that impacts RL performance in hard to understand ways".
I assume that polybeast trains much faster than monobeast, but what exactly is the reason for the score gap between the two? E.g. better exploration at the early stage of training, less policy lag during environment interaction, ...
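To make the "policy lag" part of the question concrete, here is a minimal, hypothetical sketch (not TorchBeast's actual code) of an actor-learner loop. Each trajectory is tagged with the policy version the actor used; the lag is how many learner updates happened before the learner consumed that trajectory. The `simulate` function and its parameters are illustrative assumptions, not anything from the repository.

```python
# Hypothetical illustration of policy lag in an asynchronous actor-learner
# setup (not TorchBeast code). Actors tag trajectories with the policy
# version they acted under; the learner consumes trajectories FIFO.
import collections


def simulate(num_updates, actor_throughput, batch_size):
    """Return the policy lag of every trajectory the learner consumes.

    actor_throughput: trajectories the actors produce per learner update.
    If actors outpace the learner, the queue grows and old trajectories
    are learned from long after they were generated.
    """
    learner_version = 0
    queue = collections.deque()
    lags = []
    for _ in range(num_updates):
        # Actors produce trajectories tagged with the current policy version.
        for _ in range(actor_throughput):
            queue.append(learner_version)
        # Learner consumes one batch (oldest trajectories first) and updates.
        if len(queue) >= batch_size:
            batch = [queue.popleft() for _ in range(batch_size)]
            lags.extend(learner_version - v for v in batch)
            learner_version += 1
    return lags


# When producer and consumer throughput match, lag stays at zero;
# when actors outpace the learner, lag grows without bound.
lags_matched = simulate(num_updates=50, actor_throughput=1, batch_size=1)
lags_fast_actors = simulate(num_updates=50, actor_throughput=2, batch_size=1)
```

This is only a toy model, but it shows why architectural details (how trajectories are queued and batched, and how fast inference runs relative to learning) can change the effective off-policyness of the data, which correction schemes like V-trace then have to compensate for.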