google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0
520 stars 73 forks

Difference b/w checkpoint 49 and checkpoint 50 #33

Open etaoxing opened 1 year ago

etaoxing commented 1 year ago

Are the 50 checkpoints indexed 0...49 or 1...50?

The following games are missing gs://atari-replay-datasets/dqn/${g}/1/replay_logs/FILE.50.gz

$ ./check_c50.sh
Carnival missing ckpt50
Centipede missing ckpt50
IceHockey missing ckpt50
StarGunner missing ckpt50
VideoPinball missing ckpt50
YarsRevenge missing ckpt50

check_c50.sh:

games='AirRaid Alien Amidar Assault Asterix Asteroids Atlantis BankHeist BattleZone BeamRider Berzerk Bowling Boxing Breakout Carnival Centipede ChopperCommand CrazyClimber DemonAttack DoubleDunk ElevatorAction Enduro FishingDerby Freeway Frostbite Gopher Gravitar Hero IceHockey Jamesbond JourneyEscape Kangaroo Krull KungFuMaster MontezumaRevenge MsPacman NameThisGame Phoenix Pitfall Pong Pooyan PrivateEye Qbert Riverraid RoadRunner Robotank Seaquest Skiing Solaris SpaceInvaders StarGunner Tennis TimePilot Tutankham UpNDown Venture VideoPinball WizardOfWor YarsRevenge Zaxxon'

for g in ${games}; do
  output=$(gsutil ls "gs://atari-replay-datasets/dqn/${g}/1/replay_logs/")
  # Checkpoint 50 files end in ".50.gz"; match that suffix rather than a bare "50".
  if ! echo "${output}" | grep -q '\.50\.gz' ; then
    echo "${g} missing ckpt50"
  fi
done
etaoxing commented 1 year ago

Additional context:

DT uses 0...49, while SGI uses 1...50.
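To make the difference concrete: the two conventions select different (overlapping) sets of replay-buffer file suffixes, not a relabeling of the same files. A minimal sketch of the two index sets (the variable names here are mine, not from either codebase):

```python
# DT (Decision Transformer) loads replay-buffer files with suffixes 0..49,
# while SGI loads suffixes 1..50. Suffixes 1..49 are common to both; DT
# additionally uses suffix 0, and SGI additionally uses suffix 50 (which,
# per the listing above, is missing for some games).
DT_CHECKPOINTS = list(range(0, 50))    # suffixes 0..49
SGI_CHECKPOINTS = list(range(1, 51))   # suffixes 1..50

overlap = sorted(set(DT_CHECKPOINTS) & set(SGI_CHECKPOINTS))
dt_only = sorted(set(DT_CHECKPOINTS) - set(SGI_CHECKPOINTS))
sgi_only = sorted(set(SGI_CHECKPOINTS) - set(DT_CHECKPOINTS))
```

So reproducing one paper's setup with the other's index range silently shifts which slice of training data you load.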

agarwl commented 1 year ago

To be safe, I'd use 0..49. I think the 50th checkpoint does not exist for all of the games; it might be a Dopamine artifact. Checkpoint 0 stores the first 1M steps (= 4M frames), checkpoint 1 stores the next 1M steps (the next 4M frames), and so on.
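Based on that description (each checkpoint holds 1M agent steps, i.e. 4M frames with the standard frame skip of 4), the span covered by each buffer can be sketched as follows; the helper name is mine, for illustration only:

```python
STEPS_PER_CKPT = 1_000_000   # 1M agent steps stored per replay checkpoint
FRAMES_PER_STEP = 4          # ALE frame skip of 4 -> 4M frames per checkpoint

def ckpt_step_range(i: int) -> tuple[int, int]:
    """Half-open [start, end) range of agent steps stored in checkpoint i."""
    return i * STEPS_PER_CKPT, (i + 1) * STEPS_PER_CKPT

# Checkpoint 0 covers steps [0, 1M) = frames [0, 4M).
# Checkpoint 49 ends at step 50M = 200M frames, the standard DQN
# training budget, which is consistent with 0..49 being the full run.
```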

That said, if you are doing an apples-to-apples comparison with SGI, it may make more sense to do what they did.

agarwl commented 1 year ago

The other option (which I also use these days) is to use the tfds dataset version, which I believe also has 0..49: https://colab.sandbox.google.com/github/google-research/rlds/blob/main/rlds/examples/tfds_rlu_atari.ipynb

For some starter code, please see the supplementary material for ICLR'23 paper on scaled Q-learning.

kaustubhsridhar commented 6 months ago

Out of curiosity, @agarwl, is the 50th checkpoint the last 1M steps for those games for which it exists?

agarwl commented 6 months ago

I think I'd still just use buffer 49 (unless all the experiments you compare against are using buffer 50).