Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Quick questions on the Quantile loss function #21

Closed: hohoCode closed this issue 6 years ago

hohoCode commented 6 years ago

https://github.com/Kaixhin/Rainbow/blob/cf4c3153777c4fddd15dfa09dcf04878fce5640a/agent.py#L62-L67

Great code, thanks.

Regarding action selection: at least in the ShangtongZhang/DeepRL repository, the Quantile loss (and maybe the Categorical loss too) seems to select the action with the target network, whereas your code uses online_net. So there seems to be a difference from the typical way the Quantile loss is implemented?
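For concreteness, here is a rough sketch of the difference being asked about (the names `online_net`, `target_net`, and `next_states` are illustrative, not taken from either repository). Both variants pick the greedy action from the mean of the per-action quantile estimates; they differ only in which network does the picking:

```python
# Quantile outputs assumed to have shape (batch, num_actions, num_quantiles).
# Variant A: select the greedy action with the target network.
a_star_target = target_net(next_states).mean(dim=2).argmax(dim=1)

# Variant B: select the greedy action with the online network (this repo).
a_star_online = online_net(next_states).mean(dim=2).argmax(dim=1)
```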

Also, I am wondering which is better in general, Quantile or Categorical, according to your experiments? Thanks.

Kaixhin commented 6 years ago

This repo uses the double Q-learning update rule, which selects the action with the online network but evaluates its value with the target network, in order to prevent overestimation. It can be applied to any such value-based update, including the quantile and categorical losses.
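A minimal PyTorch sketch of that target computation in a quantile-regression setting (function and variable names are hypothetical, not the repo's actual API; networks are assumed to map states to per-action quantile estimates of shape `(batch, num_actions, num_quantiles)`):

```python
import torch

def double_q_quantile_target(online_net, target_net, next_states,
                             rewards, dones, gamma=0.99):
    with torch.no_grad():
        # Action selection uses the ONLINE network: pick the action whose
        # mean quantile value (its Q-value) is largest.
        next_q_online = online_net(next_states).mean(dim=2)        # (B, A)
        next_actions = next_q_online.argmax(dim=1)                 # (B,)

        # Value estimation uses the TARGET network, evaluated at the actions
        # chosen above -- the double Q-learning decoupling that reduces
        # overestimation.
        next_quantiles_target = target_net(next_states)            # (B, A, N)
        idx = next_actions.view(-1, 1, 1).expand(
            -1, 1, next_quantiles_target.size(2))                  # (B, 1, N)
        next_quantiles = next_quantiles_target.gather(1, idx).squeeze(1)  # (B, N)

        # Bellman target for each quantile.
        return (rewards.unsqueeze(1)
                + gamma * (1 - dones.unsqueeze(1)) * next_quantiles)
```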

I don't have the time or resources to compare the two; I just added it because it seemed like a small enough change to Rainbow, and in general I'm not planning to add extra things on top of Rainbow. If you run experiments, let me know what you find!