Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Performance with QR prioritization on Space Invaders #25

Closed marintoro closed 6 years ago

marintoro commented 6 years ago

Hello,

I wanted to run a sanity check of your code with QR prioritization (commit cf4c315) on Space Invaders. I only ran 20 million steps, but the performance is far lower than expected. My torch version is '0.4.0' and my atari_py version is '0.1.1'. Here are the reward and the Q values for this training run (I barely reach 3000 after 25 million iterations).

[Figures: reward_rainbow_qr_prioritization; q_values_rainbow_qr_prioritization (png)]

I will now launch a sanity check on Space Invaders of your release v1.0 (i.e. commit 952fcb4) with the same versions of pytorch and atari_py. I am doing this because I have a multi-agent version of Rainbow, but it only reaches a score of around 4000 on Space Invaders after 50 million iterations with 4 agents. Most likely, though, I still have bugs in my multi-agent version of Rainbow...
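Since the comparison depends on running both commits with the same library versions, here is a minimal sketch for recording them alongside each run. It assumes Python 3.8+ (for `importlib.metadata`) and that the packages are installed under the distribution names `torch` and `atari-py`; the helper name `installed_versions` is made up for illustration and is not part of the repository.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


# Assumed distribution names; adjust them if your environment differs.
report = installed_versions(["torch", "atari-py"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output next to the reward curves makes it easier to rule out a version mismatch when two runs disagree.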

Kaixhin commented 6 years ago

I got Pong working after 1.0, but yes I recently observed this drop in Space Invaders. I don't think the epsilon greedy value is the problem (I think that was needed for getting correct scores on Pong, because with too much noise the agent would miss the ball and lose points). The other changes between 1.0 and master are a) adding gradient clipping b) using the prioritised loss for the prioritised weights. I'm going to try Space Invaders and disable gradient clipping, so if you're able to revert b) and run with default settings (no QR) then that would help find the problem.
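For readers unfamiliar with the gradient clipping ablation mentioned above, here is a minimal, framework-free sketch of clipping by global L2 norm with a switch to disable it. The function name `clip_by_global_norm` and the toy values are assumptions for illustration only; the repository itself uses PyTorch, where `torch.nn.utils.clip_grad_norm_` plays this role, and the actual threshold used there is not shown here.

```python
import math


def clip_by_global_norm(grads, max_norm=None):
    """Rescale gradients so their global L2 norm is at most max_norm.

    Passing max_norm=None disables clipping (the ablation case).
    Returns the (possibly rescaled) gradients and the original norm.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if max_norm is None or total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


grads = [3.0, 4.0]  # toy gradient with L2 norm 5
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
unclipped, _ = clip_by_global_norm(grads, max_norm=None)
```

With clipping enabled the direction of the gradient is preserved and only its magnitude shrinks, so comparing a clipped run against an unclipped one isolates the effect of large updates.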

marintoro commented 6 years ago

Hmm... I ran a sanity check with your release v1.0 this weekend (commit 952fcb4). The performance I got is really different from what you are showing. I opened a new issue here: https://github.com/Kaixhin/Rainbow/issues/26

Kaixhin commented 6 years ago

Closing this, as I don't know what effect switching to QR should have on Rainbow's performance, but I'm leaving the other issue open until it is resolved.