Closed marintoro closed 6 years ago
I got Pong working after 1.0, but yes, I recently observed this drop in Space Invaders. I don't think the epsilon-greedy value is the problem (I think that was needed to get correct scores on Pong, because with too much noise the agent would miss the ball and lose points). The other changes between 1.0 and master are a) adding gradient clipping and b) weighting the loss by the prioritised importance-sampling weights. I'm going to try Space Invaders with gradient clipping disabled, so if you're able to revert b) and run with default settings (no QR), that would help narrow down the problem.
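For reference, here is a minimal sketch of the two changes mentioned above, a) gradient clipping and b) weighting the per-sample loss by the prioritised importance-sampling weights. This is not the actual Rainbow code; the network, batch, and `max_norm` value are placeholders:

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)                        # stand-in for the Q-network
optimiser = torch.optim.Adam(net.parameters(), lr=1e-3)

states = torch.randn(8, 4)                   # dummy batch of transitions
targets = torch.randn(8, 2)
weights = torch.rand(8)                      # IS weights from the prioritised replay buffer

# Change b): keep one loss value per transition and weight it by the
# importance-sampling weight BEFORE reducing to a scalar.
per_sample_loss = (net(states) - targets).pow(2).mean(dim=1)
loss = (weights * per_sample_loss).mean()

optimiser.zero_grad()
loss.backward()
# Change a): clip the global gradient norm. Commenting out this line
# reproduces the "gradient clipping disabled" experiment.
torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=10)
optimiser.step()
```

Reverting b) would mean replacing the weighted mean with `per_sample_loss.mean()`.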
Hum... I did a sanity check with your release v1.0 this weekend (commit 952fcb4). The performance I got is really different from what you are showing. I opened a new issue here: https://github.com/Kaixhin/Rainbow/issues/26
Closing this, as I don't know what change in performance switching Rainbow to QR should cause, but leaving the other issue open until the problem there is resolved.
Hello,
I wanted to do a sanity check of your code with QR prioritisation (commit cf4c315) on Space Invaders. I only ran 20 million steps, but the performance is far lower than expected. My torch version is '0.4.0' and my atari_py version is '0.1.1'... Here are the rewards and the Q-values for this training (I barely reach 3000 after 25 million iterations).
I will now launch a sanity check on Space Invaders with those same versions of PyTorch and atari_py on your release v1.0 (i.e. commit 952fcb4). I am doing this because I have a multi-agent version of Rainbow, but it only reaches a score of around 4000 on Space Invaders after 50 million iterations with 4 agents. Most likely, though, I still have bugs in my multi-agent version of Rainbow...