google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0

Asterix/1 dataset broken? #28

Open dssrgu opened 2 years ago

dssrgu commented 2 years ago

Hi,

I tried reproducing the offline REM results with the Asterix/1 dataset by using the command below:

python -um batch_rl.fixed_replay.train \
  --base_dir=/tmp/batch_rl \
  --replay_dir=/data_large/readonly/atari/Asterix/1 \
  --agent_name=multi_head_dqn \
  --gin_files='batch_rl/fixed_replay/configs/rem.gin' \
  --gin_bindings='FixedReplayRunner.num_iterations=1000' \
  --gin_bindings='atari_lib.create_atari_environment.game_name = "Asterix"'

However, I could not reproduce the results (I get an average return of about 50 at the 200th iteration). Meanwhile, I can reproduce the results on the other Asterix datasets (e.g. Asterix/2, ...). Could you check whether the Asterix/1 dataset has some errors?

Thank you!

agarwl commented 2 years ago

Hmm, I'm not sure what could be causing the issue either, but we have used Asterix/1 for some of our recent ICLR/NeurIPS papers and the results do seem to replicate internally. That said, I do know the CQL authors had difficulty replicating the results on Asterix, so there is some chance that the public dataset is corrupted. If this is time sensitive, feel free to ignore this specific run. I'll also compare the checksums of the public data and the internal data.
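A checksum comparison like the one mentioned above could be done along these lines. This is only a minimal sketch, not the maintainers' actual procedure: the helper names `checksum_dir` and `diff_checksums`, the choice of SHA-256, and the idea of comparing two local directory copies are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def checksum_dir(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large replay files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def diff_checksums(a, b):
    """Return sorted relative paths that are missing on one side or differ."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
```

Calling `diff_checksums(checksum_dir(copy_a), checksum_dir(copy_b))` on two copies of the Asterix/1 directory (both paths are placeholders) would list the files that differ or exist on only one side; an empty list means the copies are byte-identical.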

One thing you can check is whether the clipped rewards in the dataset match those in the TFDS version of the dataset (see this colab for an example of how to load the dataset).
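The reward check could look something like the sketch below. It assumes the rewards from both sources have already been loaded into two sequences that cover the same transitions in the same order (loading is not shown here, since it depends on the colab mentioned above). `clip_reward` and `first_mismatch` are hypothetical helper names, and clipping to [-1, 1] is assumed as the usual Atari convention.

```python
def clip_reward(r, low=-1.0, high=1.0):
    """Clip a single reward into [low, high] (assumed Atari convention)."""
    return max(low, min(high, r))


def first_mismatch(rewards_a, rewards_b):
    """Return the index of the first transition whose clipped rewards differ.

    If all shared positions agree but the lengths differ, return the length
    of the shorter sequence. Return None when the sequences fully agree.
    """
    a, b = list(rewards_a), list(rewards_b)
    for i, (ra, rb) in enumerate(zip(a, b)):
        if clip_reward(ra) != clip_reward(rb):
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

A result of `None` would suggest the reward streams agree after clipping; any integer points at the first transition worth inspecting.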

dssrgu commented 2 years ago

Thank you for the response!

I'll ignore this dataset for now. Just to let you know, below are the training curves for Asterix/1 and Asterix/2:

[Screenshot: training curves for Asterix/1 and Asterix/2]

Thank you!

agarwl commented 1 year ago

This is probably too late, but I re-uploaded the entire dataset for run 1 from the copy we have internally. Hopefully, this fixes the issue.