Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Montezuma's revenge - has this been tried using this codebase? #85

Open sunchipsster1 opened 2 years ago

sunchipsster1 commented 2 years ago

Hello! And thank you so much for this wonderful resource :) :)

I am currently working on Montezuma's Revenge, and have been using your awesome codebase to better understand the baselines that have been reported to work on it (e.g. Rainbow). I really enjoy your codebase because it is written in PyTorch rather than TensorFlow or JAX.

However, I have been unable to reproduce the result reported in the paper, where Rainbow reaches > 400 reward on Montezuma's Revenge: I have not been able to get > 0 reward at all for any seed.

I have been running: python -u main.py --replay-frequency 1 --architecture canonical --game montezuma_revenge --reward-clip 1 --max-episode-length 1000000 --replay-frequency 16 --target-update int(3.2e4) --learn-start int(100e3)

Have you gotten Rainbow to work on Montezuma's Revenge (i.e. get > 0 reward), and if so, what hyperparameters did you use? Thank you so much in advance for your kind help! :)

Kaixhin commented 2 years ago

Back when I made release v1.3, as stated, I was unable to achieve any reward on Montezuma's Revenge (the only other result I couldn't match was on H.E.R.O.). However, there have been a few changes to the codebase since then, which might allow learning to happen.

I noticed that you are running with several hyperparameters that differ from the original paper. All you should need is python main.py --game montezuma_revenge, so I would recommend trying that with a few different seeds.
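
For anyone who wants to script the "few seeds" part, here is a rough sketch of a launcher. It assumes main.py accepts a --seed flag (please check python main.py --help first) and leaves every other hyperparameter at its default. The helper names build_commands and run_all are made up for this example, and the seed values are arbitrary.

```python
import shlex
import subprocess
import sys


def build_commands(seeds, game="montezuma_revenge", script="main.py"):
    """Return one argument list per seed; all other hyperparameters stay at their defaults.

    Assumption: the script accepts --game and --seed.
    """
    return [
        [sys.executable, script, "--game", game, "--seed", str(seed)]
        for seed in seeds
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, run them one after another."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep if a run exits with an error
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be launched.
    run_all(build_commands([1, 2, 3]), dry_run=True)
```

Passing the arguments as a list (rather than a single shell string) also avoids shell-quoting surprises, for example with values containing parentheses.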