facebookresearch / moolib

A library for distributed ML training with PyTorch

Atari median human-normalized score #30

Open · zhongwen opened this issue 2 years ago

zhongwen commented 2 years ago

Hi devs,

Thanks a lot for the great library. We've observed that moolib improves quite significantly over TorchBeast. Great!

May I ask whether you have, or could generate, the aggregated median human-normalized score curve across all the tested games? I'm wondering how it compares to the original IMPALA paper. I'm also curious what drives the improvements; do you have any intuitions? moolib improves the distributed communication, but I don't see how that alone would lead to dramatic reward improvements. It would be great if you could help me understand this better. Thanks in advance!

heiner commented 2 years ago

Hey Zhongwen Xu,

Thanks for your interest in moolib and your kind words!

I don't have the human-normalized scores at hand, but I uploaded our logs, which should enable you to generate the data you are looking for. If you do, let me know which steps you took. (I'm not currently sure where to get human performance data for Atari from.)

Thanks!

zhongwen commented 2 years ago

Hi Heinrich,

Thanks a lot for your logs; they are very useful.

The human-normalized score function we use is the one from dqn_zoo (here).
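
For context, the human-normalized score is just a linear rescaling of the raw game score between a random agent's baseline and a professional human tester's baseline. A minimal sketch (the authoritative per-game baseline tables live in dqn_zoo's atari_data module; the values for the two games below are the ones commonly published in the Atari literature and are only illustrative here):

```python
# Minimal sketch of the standard human-normalized score:
#     (agent_score - random_score) / (human_score - random_score)
# Baseline values are shown for two games only and are illustrative;
# the authoritative per-game tables are in dqn_zoo's atari_data module.
_RANDOM = {"breakout": 1.7, "pong": -20.7}
_HUMAN = {"breakout": 30.5, "pong": 14.6}

def human_normalized_score(game: str, raw_score: float) -> float:
    """Returns 0.0 for random play and 1.0 for human-level play."""
    lo, hi = _RANDOM[game], _HUMAN[game]
    return (raw_score - lo) / (hi - lo)
```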

With the help of dqn_zoo's notebook, we plotted the moolib learning curve as follows:

[Figure: moolib median human-normalized score learning curve]

It looks a bit lower than what we expected from IMPALA with ResNet... not sure why, though.
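
For anyone who wants to reproduce the curve, the aggregation is roughly the following. This is only a sketch: the log paths, the "step" and "episode_return" column names, and the 200M-step grid are assumptions, not the actual moolib log format, and human_normalized_score is the helper sketched above.

```python
# Rough sketch of the median aggregation, assuming one CSV per game
# with hypothetical "step" and "episode_return" columns (the actual
# moolib log format may differ). Reuses human_normalized_score from
# the sketch above.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

GAMES = ["breakout", "pong"]        # all 57 Atari games in practice
STEPS = np.linspace(0, 200e6, 101)  # shared evaluation grid (assumed)

curves = []
for game in GAMES:
    df = pd.read_csv(f"logs/{game}.csv").sort_values("step")
    # Interpolate each game's returns onto the shared step grid,
    # then rescale into human-normalized units.
    returns = np.interp(STEPS, df["step"], df["episode_return"])
    curves.append([human_normalized_score(game, r) for r in returns])

# Median across games at every point of the step grid.
median_curve = np.median(np.array(curves), axis=0)

plt.plot(STEPS, median_curve)
plt.xlabel("environment steps")
plt.ylabel("median human-normalized score")
plt.show()
```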

heiner commented 2 years ago

Hey Zhongwen Xu,

Thanks for this, that's very useful!

I agree this isn't quite what we'd like. I also do not know why it's not better, or why it's even decidedly worse on some levels than TorchBeast or the original TensorFlow IMPALA code. It's not impossible we made some subtle (or not so subtle) mistake somewhere; it would certainly not be the first time. However, we checked everything carefully and found nothing suspicious.

Let us know if anything pops up for you.