Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Unit test Prioritised Experience Replay Memory #16

Closed Kaixhin closed 6 years ago

Kaixhin commented 6 years ago

PER was reported to cause issues (decreasing the performance of a DQN) when ported to another codebase. Although PER itself can decrease performance, it is still likely that there is a bug in the implementation.
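One property such a unit test could check is that transitions are sampled with probability proportional to their priority. The sketch below is only an illustration under assumptions: `sample_indices` and `empirical_frequencies` are hypothetical stand-ins written for this example, not functions from this repository, and a real test would call the memory's own sampling method instead.

```python
import random


def sample_indices(priorities, num_samples, rng):
    # Hypothetical stand-in for the memory's sampler: draws indices with
    # probability proportional to priority.
    return rng.choices(range(len(priorities)), weights=priorities, k=num_samples)


def empirical_frequencies(priorities, num_samples=20000, seed=0):
    # Count how often each index is drawn and normalise to frequencies.
    rng = random.Random(seed)
    counts = [0] * len(priorities)
    for i in sample_indices(priorities, num_samples, rng):
        counts[i] += 1
    return [c / num_samples for c in counts]


def check_proportional_sampling(priorities, tolerance=0.02):
    # Compare observed frequencies with the expected p_i / sum(p).
    total = sum(priorities)
    expected = [p / total for p in priorities]
    observed = empirical_frequencies(priorities)
    return all(abs(o - e) < tolerance for o, e in zip(observed, expected))
```

A fixed seed keeps the check deterministic; the tolerance is an arbitrary choice for this sketch and would need tuning for a different sample count.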

Ashutosh-Adhikari commented 6 years ago

I am not sure whether what I am about to say matches the correct logic behind PER.

What the current code does: in the training loop, when we call mem.append(), the new transition is stored with a default priority, transitions.max().

Shouldn't we do this instead? Calculate the priority before appending, and append with that priority. This keeps the complexity the same and attaches the actual priority to the sample right away.

To the best of my knowledge, the paper does not specify this level of detail.

Kaixhin commented 6 years ago

Adding new transitions with the max priority is in line 6 of the algorithm in the PER paper; the initial value, 1, is given in line 2. Also, calculating the priority means having access to the future states (even more states when calculating multi-step returns) and doing the whole target calculation on a single sample, so it's not that cheap.
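A minimal sketch of that scheme, assuming a plain list-based buffer rather than the sum tree used for efficient sampling: `TinyPrioritisedMemory` and its methods are hypothetical names for illustration, not this repository's code, and the priority exponent and importance-sampling weights from the PER paper are left out.

```python
import random


class TinyPrioritisedMemory:
    """Hypothetical list-based buffer, for illustration only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []        # stored transitions
        self.priorities = []  # one priority per stored transition
        self.position = 0     # next slot to overwrite once full

    def append(self, transition):
        # New transitions get the maximum priority currently in the buffer
        # (1 when the buffer is empty), so no target calculation is needed here.
        priority = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.position] = transition
            self.priorities[self.position] = priority
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        # Draw indices with probability proportional to priority.
        indices = random.choices(
            range(len(self.data)), weights=self.priorities, k=batch_size
        )
        return indices, [self.data[i] for i in indices]

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priorities are refreshed only after a transition has been sampled
        # and its TD error computed in the learning step.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + eps
```

With this design, `append` stays cheap, and the cost of computing real priorities is paid only for transitions that are actually sampled.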

marintoro commented 6 years ago

Just read this in the paper "Distributed Prioritized Experience Replay" by D. Horgan et al.: "In Prioritized DQN (Schaul et al., 2016) priorities for new transitions were initialized to the maximum priority seen so far, and only updated once they were sampled."

But it's interesting to note that they changed this because it did not scale well (that paper is all about learning with a large number of actors).
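For reference, computing a priority at insertion time could look roughly like the sketch below. It assumes a scalar Q-value and an absolute n-step TD error as the priority; `n_step_priority` and its parameters are hypothetical names for this example. It also shows why the n future rewards and a value estimate for the state n steps ahead must be available before the transition can be stored.

```python
def n_step_priority(rewards, q_value, bootstrap_value, gamma=0.99):
    """Absolute n-step TD error for one transition (illustrative sketch).

    rewards:         the n rewards observed after taking the action
    q_value:         current estimate Q(s_t, a_t)
    bootstrap_value: value estimate for the state n steps ahead
                     (0 if the episode ended before then)
    """
    # Discounted sum of the n observed rewards.
    n_step_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Bootstrap from the value of the state reached after n steps.
    n_step_return += gamma ** len(rewards) * bootstrap_value
    return abs(n_step_return - q_value)
```

For example, `n_step_priority([1.0, 1.0, 1.0], q_value=2.0, bootstrap_value=8.0, gamma=0.5)` gives a return of 1.75 + 0.125 * 8 = 2.75 and hence a priority of 0.75.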

Ashutosh-Adhikari commented 6 years ago

@Kaixhin Yep, I understand that now that you mention the n-step TD returns.

Kaixhin commented 6 years ago

Results on 3 games so far look promising, so closing unless a specific problem is identified.