Open qizhg opened 2 years ago
Dear qizhg,
Thanks for your comments.
We use the softmax approximation because it is easy to implement. You can also compute the re-weighting ratio exactly; please refer to ICQ_mu for that.
We find that using the TD error gives more stable performance. You can also use the Q-value, which performs well too. Both approaches rest on the same insight.
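As a rough illustration of the minibatch softmax weighting discussed here, the following is a minimal sketch and not the repository's actual code. The function name `softmax_weights`, the `temperature` parameter, and the example scores are all assumptions made for illustration; `scores` stands for whichever per-sample quantity is chosen (Q-values or TD errors).

```python
import math

def softmax_weights(scores, temperature=1.0):
    """Normalize exp(score / temperature) over one minibatch.

    `scores` is a list of per-sample values (hypothetical: Q-values or
    TD errors). The minibatch sum stands in for the exact normalizer.
    """
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical minibatch of four per-sample scores.
scores = [1.0, 2.0, 0.5, 3.0]
weights = softmax_weights(scores, temperature=0.5)
```

The weights sum to one over the minibatch, and samples with higher scores receive larger weights. A smaller `temperature` concentrates the weight on the highest-scoring samples.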
Sorry for the late reply, and thanks again for your question!
Best
Yiqin
Dear author, thanks for making the code available.
I have two questions regarding ICQ_softmax, where the weight is approximated by the softmax over the minibatch:
Thanks!