facebookresearch / torchbeast

A PyTorch Platform for Distributed RL
Apache License 2.0

How exactly do monobeast and polybeast differ in terms of performance? #28

Closed. jmkim0309 closed this issue 3 years ago

jmkim0309 commented 3 years ago

Hi @heiner,

Thank you for your kind reply to the previous issue (https://github.com/facebookresearch/torchbeast/issues/25). As I understand it, I need to use polybeast to reproduce the SpaceInvaders results.

But could you please elaborate a little bit more on the following: "The MonoBeast version you are using has the upside of being simpler to install and run, but uses a different design that impacts RL performance in hard to understand ways".

I assume that polybeast is much faster than monobeast, but what exactly is the reason for the score gap between the two? For example, is it better exploration in the early stage of training, or less policy lag during environment interaction?

heiner commented 3 years ago

Hey Kim,

That is a very good question. Unfortunately, I don't have any good answer. The truth is that what makes RL work (or not) is a very delicate balance between hyperparameters, update frequencies, and randomness. The papers in the field sometimes have a good chunk of theory, but the actual implementations are only theory-inspired, and the real proof of their performance happens via testbeds like Atari.

This is certainly a worthwhile, if daunting, topic to get into for junior researchers. Currently, we don't understand this topic very well at all.