Recurrent controller instead of stacked image observation?

denisyarats / drq

DrQ: Data regularized Q

https://sites.google.com/view/data-regularized-q

MIT License

405 stars 52 forks

Recurrent controller instead of stacked image observation? #1

Closed: miriaford closed this issue 4 years ago

miriaford commented 4 years ago

This is amazing work, thanks a lot for sharing!!

The paper states that stacking the last 3 image frames can convert the POMDP into an MDP. While I understand this is common practice, I wonder if you have tried using a GRU/LSTM controller instead. Does it typically perform better or worse than frame stacking in your experience?
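For readers following along, here is a minimal sketch of the frame stacking being asked about. It is not taken from the DrQ code base: the `FrameStack` class and the toy 1-D "frames" (a ball position) are hypothetical, chosen only to show why a short stack can recover information, such as velocity, that a single frame hides.

```python
from collections import deque


class FrameStack:
    """Hypothetical helper: keeps the last k frames as one stacked observation."""

    def __init__(self, k=3):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame):
        # Fill the buffer with copies of the first frame so the stacked
        # observation always has exactly k entries.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.observation()

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        return tuple(self.frames)


# Toy 1-D "frames": the position of a ball. One frame says nothing about
# velocity; two consecutive frames in the stack reveal it.
stack = FrameStack(k=3)
obs = stack.reset(0.0)
obs = stack.step(1.0)
obs = stack.step(3.0)
velocity = obs[-1] - obs[-2]
```

With real image observations the frames would typically be concatenated along the channel dimension rather than kept in a tuple, but the buffering logic is the same.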

denisyarats commented 4 years ago

That is a good point and definitely worth exploring! We did notice that in some tasks stacking only 3 frames might not be sufficient. However, for MuJoCo-based tasks (and for Atari, although that is a different domain), 3 frames seem to be sufficient in most cases. For more complicated tasks, where there are very long-range dependencies between the policy and the observed actions, an RNN controller is crucial, though.
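To illustrate the long-dependency case in the reply above, the following toy sketch compares a 3-frame window with a recurrent state. It is an assumption-laden illustration, not anything from DrQ: the scalar `gru_cell`, the hand-picked (not learned) weights in `params`, and the cue-then-silence observation sequence are all hypothetical. The gate equations follow one common GRU convention, in which the update gate `z` weights the candidate state.

```python
import math
from collections import deque


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_cell(x, h, p):
    """One step of a scalar GRU cell; p holds hand-picked (not learned) weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    # Small z keeps the old state; large z overwrites it with the candidate.
    return (1.0 - z) * h + z * h_cand


# Hypothetical weights: the update gate opens when x = 1 and stays almost
# closed when x = 0. uh is 0 here, so the reset gate has no effect in this toy.
params = {
    "wz": 20.0, "uz": 0.0, "bz": -10.0,
    "wr": 0.0, "ur": 0.0, "br": 0.0,
    "wh": 5.0, "uh": 0.0, "bh": 0.0,
}

# A cue at the first step, followed by 20 uninformative observations.
observations = [1.0] + [0.0] * 20

window = deque(maxlen=3)  # what a 3-frame stack would still contain
h = 0.0                   # recurrent hidden state
for x in observations:
    window.append(x)
    h = gru_cell(x, h, params)

cue_in_window = any(x != 0.0 for x in window)
cue_in_hidden_state = h > 0.5
```

After the loop the 3-frame window holds only zeros, while the hidden state still carries the cue. In practice the gate weights would be learned end to end, which is part of what makes recurrent controllers harder to train than frame stacking.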

miriaford commented 4 years ago

Thanks for your comments!

Now that you mention Atari, I didn't find any results in the paper, but I'm very curious how your method compares against others on Atari games. Do you have any tables or just general findings to share?