Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Quick questions on the Quantile loss function #21

Closed: hohoCode closed this issue 6 years ago

hohoCode commented 6 years ago

https://github.com/Kaixhin/Rainbow/blob/cf4c3153777c4fddd15dfa09dcf04878fce5640a/agent.py#L62-L67

Great code, thanks.

Regarding action selection: at least in the ShangtongZhang/DeepRL repository, the Quantile loss (and maybe the Categorical loss too) seems to select the action with the target network, whereas your code uses online_net. So there seems to be a difference from the typical way the Quantile loss is implemented?
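For concreteness, here is a rough sketch of the difference being asked about (the names `online_net`, `target_net`, and `next_states` are illustrative, not taken from either repository). Both variants pick the greedy action from the mean of the per-action quantile estimates; they differ only in which network does the picking:

```python
# Quantile outputs assumed to have shape (batch, num_actions, num_quantiles).
# Variant A: select the greedy action with the target network.
a_star_target = target_net(next_states).mean(dim=2).argmax(dim=1)

# Variant B: select the greedy action with the online network (this repo).
a_star_online = online_net(next_states).mean(dim=2).argmax(dim=1)
```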

Also, I am wondering which is better in general, Quantile or Categorical, according to your experiments? Thanks.

Kaixhin commented 6 years ago

This repo uses the double Q-learning update rule, which selects the action with the online network but evaluates its value with the target network, in order to prevent overestimation. It can be applied to any such value-based update, including the quantile and categorical losses.
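A minimal PyTorch sketch of that target computation in a quantile-regression setting (function and variable names are hypothetical, not the repo's actual API; networks are assumed to map states to per-action quantile estimates of shape `(batch, num_actions, num_quantiles)`):

```python
import torch

def double_q_quantile_target(online_net, target_net, next_states,
                             rewards, dones, gamma=0.99):
    with torch.no_grad():
        # Action selection uses the ONLINE network: pick the action whose
        # mean quantile value (its Q-value) is largest.
        next_q_online = online_net(next_states).mean(dim=2)        # (B, A)
        next_actions = next_q_online.argmax(dim=1)                 # (B,)

        # Value estimation uses the TARGET network, evaluated at the actions
        # chosen above -- the double Q-learning decoupling that reduces
        # overestimation.
        next_quantiles_target = target_net(next_states)            # (B, A, N)
        idx = next_actions.view(-1, 1, 1).expand(
            -1, 1, next_quantiles_target.size(2))                  # (B, 1, N)
        next_quantiles = next_quantiles_target.gather(1, idx).squeeze(1)  # (B, N)

        # Bellman target for each quantile.
        return (rewards.unsqueeze(1)
                + gamma * (1 - dones.unsqueeze(1)) * next_quantiles)
```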

I don't have the time or resources to compare the two; I just added it because it seemed like a small enough change to Rainbow, and in general I'm not planning to add extra things on top of Rainbow. If you run experiments, let me know what you find!