Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning

MIT License

Future improvements #23

Open jaromiru opened 6 years ago

jaromiru commented 6 years ago

First, hands down, amazing work. Since this serves as a baseline, I see a possible improvement, if someone wants to implement it:

The n-step return, as it is, is biased, because it is computed from old off-policy samples in the replay memory without any correction. Retrace [Safe and Efficient Off-Policy Reinforcement Learning] would resolve the issue. Implementing Retrace in distributional RL is not straightforward, but the Reactor paper [The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning] appears to deal with it (as far as I can tell, without quantile regression, however).
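
To make the difference concrete, here is a minimal sketch for the plain expected-value case only (not the distributional one, which is the hard part). It compares the uncorrected n-step return with a Retrace-style target, where each TD error is weighted by truncated importance ratios c_s = lambda * min(1, pi(a_s|x_s) / mu(a_s|x_s)). The function names, argument layout, and example numbers are all hypothetical and are not taken from this repository.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Uncorrected n-step return: sum_k gamma^k * r_k + gamma^n * bootstrap_value."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def retrace_target(q_taken, rewards, expected_next_q, ratios, gamma, lam=1.0):
    """Retrace-style target for Q(x_0, a_0) in the expected-value setting.

    q_taken[t]         : Q(x_t, a_t) for the action actually stored in the buffer
    expected_next_q[t] : E_pi[Q(x_{t+1}, .)] under the current target policy pi
    ratios[t]          : pi(a_t | x_t) / mu(a_t | x_t), mu being the behaviour policy
                         (ratios[0] is unused: the trace product starts at s = 1)
    """
    target = q_taken[0]
    trace = 1.0
    for t in range(len(rewards)):
        if t > 0:
            # Truncated importance weight: cuts the trace when pi disagrees with mu.
            trace *= lam * min(1.0, ratios[t])
        td_error = rewards[t] + gamma * expected_next_q[t] - q_taken[t]
        target += (gamma ** t) * trace * td_error
    return target


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    q_taken = [0.3, 0.7, 0.1]
    expected_next_q = [0.7, 0.1, 4.0]
    gamma = 0.5

    # Stored actions agree with pi: all ratios are 1 and the trace is never cut.
    print(retrace_target(q_taken, rewards, expected_next_q, [1.0, 1.0, 1.0], gamma))
    # pi would never have taken the second stored action: the trace is cut there.
    print(retrace_target(q_taken, rewards, expected_next_q, [1.0, 0.0, 1.0], gamma))
```

When the stored actions match the current policy, the sum telescopes back to the ordinary n-step return; when they do not, the trace shrinks and the target falls back towards the one-step target, which is the correction the uncorrected n-step return lacks.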

Kaixhin commented 6 years ago

Thanks a lot, I can say the same for your blog 🙂 In my spare time I'm still looking for bugs, as I still have problems with learning on some games (Pong and Breakout being particularly worrying).

In any case I won't be able to work on extending Rainbow for a while, but if anyone is interested, I'm keeping master as a pure Rainbow implementation, with extensions that others want to add available as options (I added quantile regression as an exercise for myself, but I haven't tested it within Rainbow at all, so it's possible that it's harmful here).

Edit: False alarm on Pong at least

marintoro commented 6 years ago

Hello,

I am reopening this really old issue just to ask a question: did you test the QR extension on any game other than Pong? (And, even better, did you try implementing Implicit Quantile Networks? ^^)

Kaixhin commented 6 years ago

Nope - I just did that as an implementation exercise for myself, so I've not actually tested it at all (the comment about Pong was about normal Rainbow at the time). I'm not planning to do any further development, but I am trying to test normal Rainbow on a few more games and upload the pretrained models for people to use.

heartInsert commented 3 years ago

I also noticed that Rainbow performs very badly on Pong, which is extremely strange.