Open · zhongwen opened this issue 2 years ago
Hey Zhongwen Xu,
Thanks for your interest in moolib and your kind words!
I don't have the human-normalized scores at hand, but I uploaded our logs, which should enable you to generate the data you are looking for. If you do, let me know which steps you took (I'm not currently sure where to get human performance data for Atari from.)
Thanks!
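For reference, the usual Atari convention (used in the IMPALA paper, among others) normalizes each game's score as `HNS_g = (score_g - random_g) / (human_g - random_g)`, with the per-game `random_g` and `human_g` baselines typically taken from the tables published alongside the Nature DQN paper (Mnih et al., 2015); the aggregate curve is then the median of `HNS_g` over all games at matched training steps.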
Hey Zhongwen Xu,
Thanks for this, that's very useful!
I agree this isn't quite what we'd like. I also don't know why it's not better, or why it's even decidedly worse than TorchBeast or the original TensorFlow IMPALA code on some levels. It's not impossible that we made some subtle (or not so subtle) mistake somewhere; it certainly wouldn't be the first time. However, we checked everything carefully and found nothing suspicious.
Let us know if anything pops up for you.
Hi devs,
Thanks a lot for the great library. I've observed that moolib improves quite significantly over TorchBeast. Great!
May I know if you have, or could generate, the aggregated median human-normalized score curve over all the tested games? I'm wondering how it compares to the original IMPALA paper. I'm also wondering what drives the improvements; do you have any intuitions about that? moolib improves the distributed communication, but I don't think that by itself is directly responsible for such dramatic reward improvements. It would be great if you could help me understand this better. Thanks in advance!
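As a rough illustration of how such a curve could be produced from per-game training logs, here is a minimal sketch. The file layout, the `step`/`episode_return` column names, and the `BASELINES` table are assumptions made for the example, not moolib's actual logging format.

```python
import glob
import os

import numpy as np
import pandas as pd

# Per-game random/human baseline scores; values intentionally left out here.
# Fill these in from the published Atari baseline tables.
BASELINES = {
    # "breakout": {"random": ..., "human": ...},
}


def median_hns_curve(log_dir, bins):
    """Median human-normalized score over games, binned by environment step."""
    per_game = []
    for path in glob.glob(os.path.join(log_dir, "*.csv")):
        game = os.path.splitext(os.path.basename(path))[0]
        df = pd.read_csv(path)  # assumed columns: 'step', 'episode_return'
        # Mean raw return within each step bin for this game.
        binned = df.groupby(pd.cut(df["step"], bins), observed=True)[
            "episode_return"
        ].mean()
        # Human-normalize: (score - random) / (human - random).
        b = BASELINES[game]
        hns = (binned - b["random"]) / (b["human"] - b["random"])
        per_game.append(hns.rename(game))
    # Median across games at each step bin.
    return pd.concat(per_game, axis=1).median(axis=1)


# Example usage, assuming 200M environment steps of training:
# curve = median_hns_curve("logs/", bins=np.linspace(0, 200e6, 101))
# curve.plot()
```

Whether the x-axis should count environment frames or learner steps depends on what the logs actually record, so that would need to be matched to the IMPALA paper's convention before comparing curves.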