Montezuma's Revenge - has this been tried using this codebase?

Hello! And thank you so much for this wonderful resource :) :)

I am currently working on Montezuma's Revenge and have been using your awesome codebase to better understand the baselines that have been reported to work on it (e.g. Rainbow). I really enjoy your codebase because it is written in PyTorch rather than TensorFlow or JAX.

However, I have been unable to reproduce the result reported in the paper, where Rainbow should reach > 400 reward on Montezuma's Revenge: I have not been able to get > 0 reward at all for any seed.

I have been running:

python -u main.py --replay-frequency 1 --architecture canonical --game montezuma_revenge --reward-clip 1 --max-episode-length 1000000 --replay-frequency 16 --target-update 32000 --learn-start 100000
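One thing I noticed while writing this up: `--replay-frequency` appears twice in the command above (1, then 16). If main.py parses its flags with a standard `argparse` parser (an assumption on my part), the later value should win, so the run would effectively use a replay frequency of 16. Here is a minimal sketch of that behaviour. The parser is a hypothetical stand-in, not the repository's code, and the defaults are placeholders:

```python
import argparse

# Hypothetical stand-in for the parser in main.py (assumption): only the
# repeated flag is declared, and the default value is a placeholder.
parser = argparse.ArgumentParser()
parser.add_argument("--replay-frequency", type=int, default=4)

# Same flag passed twice, in the same order as in the command above.
args = parser.parse_args(
    ["--replay-frequency", "1", "--replay-frequency", "16"]
)

# argparse's default "store" action keeps the last occurrence of a flag.
print(args.replay_frequency)  # 16
```

So if 1 was the intended value, the second occurrence would need to be removed.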

Have you gotten Rainbow to work on Montezuma's Revenge (i.e. to get > 0 reward), and if so, which hyperparameters did you use? Thank you so much in advance for your kind help! :)

Kaixhin / Rainbow