Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Interrupted history transitions in ReplayMemory? #42

Closed albertwujj closed 5 years ago

albertwujj commented 5 years ago

As segment tree uses a cyclic buffer, the beginning of some transitions (the history part) will be overwritten by new transitions. Thus the 'history' transitions will not be the actual previous transitions. I do not see the code to handle this. Is it a bug?
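To make the concern concrete, here is a minimal, hypothetical cyclic buffer (not the repo's actual `ReplayMemory`) showing how the frames behind an old transition get overwritten once the write pointer wraps:

```python
# Hypothetical minimal cyclic buffer illustrating the concern: once the
# write pointer wraps around, the slots that held the "history" frames of
# an older transition are overwritten by newer, unrelated transitions.
capacity = 5
buffer = [None] * capacity
index = 0

for t in range(8):  # write 8 transitions into a buffer of size 5
    buffer[index] = ('frame', t)
    index = (index + 1) % capacity

# The transition at slot 3 still holds frame 3, but its 1-step "history"
# at slot 2 now holds frame 7 from a later point in time, not frame 2.
print(buffer)  # [('frame', 5), ('frame', 6), ('frame', 7), ('frame', 3), ('frame', 4)]
```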

albertwujj commented 5 years ago

Could be why https://github.com/Kaixhin/Rainbow/issues/16

Kaixhin commented 5 years ago

It should be handled by these conditions which make sure that samples near the index are discarded and resampled. Go ahead and work through the logic (I'm not 100% certain that this is bug-free, but I believe it's fine).
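A hedged sketch of the rejection logic described above (function name and parameter values are illustrative, not the repo's exact code): a sampled index is discarded and redrawn if its n-step future would cross the write pointer, or if its history frames sit in slots the pointer has just overwritten.

```python
import random

# Sketch of the resample-and-discard check, assuming a cyclic buffer of
# size `capacity` with current write position `write_index`, `history`
# stacked frames, and n-step returns of length `n`.
def sample_valid_index(capacity, write_index, history=4, n=3):
    while True:
        idx = random.randrange(capacity)
        forward = (write_index - idx) % capacity   # slots from idx forward to the pointer
        backward = (idx - write_index) % capacity  # slots from the pointer forward to idx
        # Reject if the n-step lookahead would reach past the newest data,
        # or if the history frames behind idx have been overwritten.
        if forward > n and backward >= history:
            return idx
```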

albertwujj commented 5 years ago

Cool! On a first read it looks correct (I will certainly let you know if I discover it's not).

I'm modifying your ReplayBuffer to allow parallel envs (still storing all transitions in a 1D array, but stepping back num_envs indices rather than 1 to reach the previous timestep). The code you linked is actually a place I missed updating, which caused a bug (getting None when looking up n-step transition states). Thank you so much for the help!
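A sketch of the interleaved layout described above (names and sizes are illustrative): transitions from `num_envs` environments share one 1D buffer, so the previous timestep of the same environment sits `num_envs` slots back rather than 1.

```python
# Assumed layout: at each step, all num_envs environments append one
# transition each, so env e's transition at time t lands at slot
# t * num_envs + e (modulo capacity).
num_envs = 4
capacity = 12  # a multiple of num_envs keeps the interleaving clean

def prev_index(idx, num_envs=num_envs, capacity=capacity):
    # Step back one timestep for the env that produced slot `idx`.
    return (idx - num_envs) % capacity

# Env 2's transition at time t=2 lives at slot 2*num_envs + 2 = 10;
# its predecessor (time t=1) is num_envs slots earlier, at slot 6.
print(prev_index(10))  # 6
```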

albertwujj commented 5 years ago

Oops, I see there is a whole paper on 'distributed' replay: https://arxiv.org/pdf/1803.00933.pdf