Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License
1.56k stars 282 forks

Taking the max over step frame buffer #59

Closed: guydav closed this 4 years ago

guydav commented 4 years ago

Hey Kai,

I'm digging into the implementation details and have another question about one of them. In Env.step(), you store the frames seen after the 3rd and 4th repetitions of the action and then take the pixel-wise max of the two as the observation. Why do you do that? Is there a particular paper this comes from?
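For readers following along, here is a minimal sketch of the step logic described in the question. It assumes an action repeat of 4 and uses a toy emulator. `FlickerEmulator`, `act`, `screen`, `max_pool` and `step` are hypothetical stand-ins, not the repository's Env class or the ALE API, and episode termination is left out for brevity.

```python
from typing import List, Tuple

Frame = List[List[int]]


class FlickerEmulator:
    """Hypothetical 1x4 'screen': sprite A (value 200) is drawn only on even
    frames and sprite B (value 100) only on odd frames, mimicking Atari flicker."""

    def __init__(self) -> None:
        self.frame_count = 0

    def act(self, action: int) -> float:
        # Advance one emulator frame; this toy game pays reward 1 per frame.
        self.frame_count += 1
        return 1.0

    def screen(self) -> Frame:
        if self.frame_count % 2 == 0:
            return [[200, 0, 0, 0]]  # only sprite A visible
        return [[0, 0, 0, 100]]      # only sprite B visible


def max_pool(a: Frame, b: Frame) -> Frame:
    # Pixel-wise max over two frames of identical shape.
    return [[max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def step(emu: FlickerEmulator, action: int, repeat: int = 4) -> Tuple[Frame, float]:
    """Repeat `action` `repeat` times, keep the frames seen after the last two
    repetitions, and return their pixel-wise max plus the summed reward.
    (Assumption: no game-over check; a real environment would stop early.)"""
    buffer: List[Frame] = []
    total_reward = 0.0
    for t in range(repeat):
        total_reward += emu.act(action)
        if t >= repeat - 2:  # frames after the 3rd and 4th repetitions
            buffer.append(emu.screen())
    return max_pool(buffer[0], buffer[1]), total_reward


emu = FlickerEmulator()
obs, reward = step(emu, action=0)
print(obs, reward)  # [[200, 0, 0, 100]] 4.0
```

Only the last two of the four frames are pooled, so the observation costs two screen grabs per step while still covering one even and one odd frame.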

Thank you!

Kaixhin commented 4 years ago

From the Methods section of the Nature DQN paper:

First, to encode a single frame we take the maximum value for each pixel colour value over the frame being encoded and the previous frame. This was necessary to remove flickering that is present in games where some objects appear only in even frames while other objects appear only in odd frames, an artefact caused by the limited number of sprites Atari 2600 can display at once.
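To make the flicker argument concrete, here is a small illustration on a hypothetical 1x3 grayscale screen where one sprite is drawn only on even frames and another only on odd frames. The pixel values and the helper names `encode` and `visible_sprites` are made up for the example. Either frame alone hides one sprite, while the per-pixel max over a frame and its predecessor shows both.

```python
from typing import List

Frame = List[List[int]]

# Two consecutive frames from a hypothetical flickering game:
# the enemy (value 180) is drawn on even frames, the player (value 90) on odd ones.
even_frame: Frame = [[180, 0, 0]]
odd_frame: Frame = [[0, 0, 90]]


def encode(current: Frame, previous: Frame) -> Frame:
    """Encode a frame as the per-pixel max over it and the previous frame."""
    return [[max(c, p) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]


def visible_sprites(frame: Frame) -> int:
    # Count non-background pixels (each sprite here occupies one pixel).
    return sum(1 for row in frame for px in row if px > 0)


encoded = encode(odd_frame, even_frame)
print(visible_sprites(odd_frame), visible_sprites(encoded))  # 1 2
print(encoded)  # [[180, 0, 90]]
```

For colour frames the same max would be taken separately for each colour channel, as the quote describes. Averaging the two frames would also keep both sprites, but at half intensity against a zero background; the max keeps each sprite at the value it was drawn with.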

guydav commented 4 years ago

Thank you! I don't know how I missed this, I definitely searched in that paper.