YiqinYang / ICQ

Code accompanying the paper "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning" (NeurIPS 2021 Spotlight, https://arxiv.org/abs/2106.03400)

Questions on ICQ_softmax #3

Open qizhg opened 2 years ago

qizhg commented 2 years ago

Dear author, thanks for making the code available.

I have two questions regarding ICQ_softmax, where the weight is approximated by a softmax over the minibatch (a rough sketch of my reading follows the list):

  1. Why is len(weights) (e.g., this line) needed to scale the softmax distribution?
  2. Why is the softmax taken with respect to the TD error in this line, rather than with respect to the Q-value as suggested in the paper?
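For context, here is a minimal sketch of how I currently read the re-weighting, assuming PyTorch and names of my own choosing (q_taken for the per-transition Q-values of the taken actions, beta for the temperature); this is not the exact code from the repository:

```python
import torch
import torch.nn.functional as F

def icq_softmax_weights(q_taken: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Softmax over the minibatch dimension: the weights sum to 1 across the batch.
    weights = F.softmax(q_taken / beta, dim=0)
    # Rescale by the batch size so the weights average to 1 instead of summing to 1;
    # this is how I read the len(weights) factor in the code.
    return len(weights) * weights
```

Under this reading, the len(weights) factor only changes the normalization (mean 1 rather than sum 1), which is what prompted question 1.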

Thanks!

YiqinYang commented 2 years ago

Dear qizhg,

Thanks for your comments.

  1. The softmax is used as an approximation because it is easy to implement. You can also compute the re-weighting ratio exactly; please refer to ICQ_mu.

  2. We find that using the TD error gives more stable performance. You can also use the Q-value, which performs well too. Both approaches share the same underlying insight (see the sketch below).
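To make the difference concrete, here is a minimal sketch with made-up numbers; the tensor values and names (q_taken, td_target) are purely illustrative and are not taken from the repository:

```python
import torch
import torch.nn.functional as F

# Hypothetical example values: Q(s, a) for the sampled actions and the TD target
# r + gamma * Q_target(s', a'), both invented here for illustration only.
q_taken = torch.tensor([0.5, 1.2, 0.9])
td_target = torch.tensor([0.7, 1.0, 1.4])
beta = 1.0

# Paper-style weights: softmax over the Q-values themselves.
w_q = len(q_taken) * F.softmax(q_taken / beta, dim=0)

# Code-style weights: softmax over the TD error, which we found more stable in practice.
w_td = len(q_taken) * F.softmax((td_target - q_taken) / beta, dim=0)
```

The only difference between the two variants is the quantity fed into the batch softmax; the rescaling by the batch size is the same in both cases.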

Sorry for the late reply, and thanks again for your question!

Best

Yiqin