Open etaoxing opened 1 year ago
To be safe, I'd use 0..49. I don't think the 50th checkpoint exists for all the games, as it might be a Dopamine artifact. Checkpoint 0 stores the first 1M steps (= 4M frames), checkpoint 1 stores the next 1M steps (4M frames), and so on.
That said, if you are doing an apples-to-apples comparison to SGI, doing what they did might make more sense.
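As a rough illustration of the indexing described above, here is a minimal sketch that maps a checkpoint index to the range of agent steps and frames it would cover. It assumes exactly what the comment states (1M steps per checkpoint, 4 frames per step, indices 0..49). The function and constant names are hypothetical and are not part of Dopamine or the dataset tooling.

```python
# Assumptions taken from the comment above, not from any library:
STEPS_PER_CHECKPOINT = 1_000_000  # agent steps stored per checkpoint
FRAMES_PER_STEP = 4               # 1M steps = 4M frames
NUM_CHECKPOINTS = 50              # safe indices are 0..49


def checkpoint_span(index):
    """Return the half-open step and frame ranges covered by a checkpoint index."""
    if not 0 <= index < NUM_CHECKPOINTS:
        raise ValueError(f"checkpoint index must be in 0..{NUM_CHECKPOINTS - 1}, got {index}")
    start_step = index * STEPS_PER_CHECKPOINT
    end_step = start_step + STEPS_PER_CHECKPOINT
    return {
        "steps": (start_step, end_step),
        "frames": (start_step * FRAMES_PER_STEP, end_step * FRAMES_PER_STEP),
    }


if __name__ == "__main__":
    for idx in (0, 1, 49):
        print(idx, checkpoint_span(idx))
```

Under these assumptions, index 49 is the last of the 50 regular checkpoints and ends at 50M steps (200M frames). A file numbered 50 falls outside that range, which fits the suggestion that it may be an artifact.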
The other option (which I also use these days) is to use the tfds version of the dataset, which I believe also has 0..49: https://colab.sandbox.google.com/github/google-research/rlds/blob/main/rlds/examples/tfds_rlu_atari.ipynb
For some starter code, please see the supplementary material for the ICLR'23 paper on scaled Q-learning.
Out of curiosity, @agarwl, is the 50th checkpoint the last 1M steps for those games for which it exists?
I think I'd still just use buffer 49 (unless all the experiments are using buffer 50).
Are the 50 checkpoints indexed 0...49 or 1...50?
The following games are missing gs://atari-replay-datasets/dqn/${g}/1/replay_logs/FILE.50.gz
check_c50.sh: