Kaixhin / Atari

Persistent advantage learning dueling double DQN for the Arcade Learning Environment
MIT License

Finish prioritised experience replay #42

Open Kaixhin opened 8 years ago

Kaixhin commented 8 years ago

Rank-based prioritised experience replay appears to be working, but technically needs some changes. Instead of storing terminal states with a priority of 0, they should not be stored at all. This requires more checks, as the elements in the experience replay memory and the elements in the priority queue will differ.
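
For context, rank-based PER (Schaul et al., "Prioritized Experience Replay") gives transition i a priority of (1 / rank(i))^alpha, where rank comes from ordering transitions by absolute TD error. A minimal Python sketch of the resulting sampling distribution, with illustrative names (the repo's actual implementation approximates the full sort with a binary heap rather than sorting every step):

```python
def rank_based_probs(td_errors, alpha=0.7):
    """Return a dict mapping transition index -> sampling probability,
    proportional to (1 / rank)**alpha, rank 1 = largest |TD error|."""
    # Sort indices by |TD error|, largest first
    order = sorted(range(len(td_errors)), key=lambda i: -abs(td_errors[i]))
    # Priority of the transition at rank r (1-indexed)
    priorities = [(1.0 / r) ** alpha for r in range(1, len(order) + 1)]
    total = sum(priorities)
    return {idx: p / total for idx, p in zip(order, priorities)}
```

Here alpha = 0.7 is the rank-based setting reported in the paper; transitions with larger TD errors get proportionally higher sampling probability, but no transition's probability falls to zero.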

Secondly, proportional prioritised experience replay still needs to be implemented. See here and here for an implementation of the sum binary tree.
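
The sum binary tree referenced above stores each transition's priority in a leaf and the sum of each pair of children in their parent, so sampling proportional to priority is O(log n). A minimal sketch, assuming a flat 1-based array layout (the class and method names here are illustrative, not the repo's API):

```python
import random

class SumTree:
    """Binary tree over priorities: leaves hold priorities, internal
    nodes hold the sum of their children, the root holds the total."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Node i has children 2*i and 2*i + 1; leaves live at
        # indices [capacity, 2 * capacity), root at index 1
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of leaf `index` and repair sums up to the root."""
        i = index + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]

    def sample(self, s=None):
        """Return a leaf index drawn with probability priority / total.
        Pass an explicit mass `s` in [0, total] for deterministic lookup."""
        if s is None:
            s = random.uniform(0.0, self.total())
        i = 1
        while i < self.capacity:        # descend until we hit a leaf
            left = 2 * i
            if s <= self.tree[left]:
                i = left                # mass falls in the left subtree
            else:
                s -= self.tree[left]    # skip left subtree's mass, go right
                i = left + 1
        return i - self.capacity
```

Updating a priority after a learning step is then `tree.update(idx, abs(td_error) ** alpha)`, and drawing a minibatch is repeated calls to `tree.sample()`.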

For reference, below are results from a working implementation of rank-based PER on Frostbite:

[scores plot]

Damcy commented 8 years ago

Maybe we can store each experience as a tuple (s_t, a, r, s_t_1, t); with this pattern, terminal states are never stored in the replay memory on their own. Usually t is 0, and a terminal transition generates the tuple (s, a, r, TERMINAL_STATE, 1).
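
A minimal sketch of this storage pattern, assuming a simple ring-buffer replay memory (`ReplayMemory` and its fields are illustrative names, not the repo's API): the terminal successor is carried inside the tuple via the t flag, so no terminal state ever becomes a standalone entry needing a priority of its own.

```python
from collections import deque

class ReplayMemory:
    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transition when full
        self.memory = deque(maxlen=capacity)

    def store(self, s_t, a, r, s_t_1, terminal):
        # t == 1 flags s_t_1 as terminal; t == 0 is the usual case
        self.memory.append((s_t, a, r, s_t_1, 1 if terminal else 0))

    def __len__(self):
        return len(self.memory)
```

With this layout, every stored tuple corresponds one-to-one with an entry in the priority queue, which sidesteps the bookkeeping mismatch described at the top of the issue.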

Kaixhin commented 8 years ago

Note: It might be worth subclassing the Heap from torchlib for the priority queue.