mamadpierre opened this issue 3 years ago
Hi,
Thanks for your interest in our work!
KL is expected when computing L_I. We use the entropy term to encourage exploration in the role space. Since entropy regularization is a commonly used trick in the RL community, we did not stress it as a main contribution in the paper.
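To make the relation behind the `# CE = KL + H` comment concrete, here is a minimal self-contained sketch (plain Python, discrete distributions for illustration only, not the repo's code) of the identity CE(p, q) = KL(p || q) + H(p), which is why a cross-entropy objective differs from a pure KL objective by exactly an entropy term:

```python
import math

def entropy(p):
    # H(p) = -sum p_i log p_i
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    # KL(p || q) = sum p_i log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    # CE(p, q) = -sum p_i log q_i
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

# The identity CE = KL + H holds term by term.
assert abs(cross_entropy(p, q) - (kl(p, q) + entropy(p))) < 1e-12
```

So optimizing CE instead of KL changes the objective by the entropy of the role distribution, which is the extra regularization term discussed above.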
We share a GRU cell between q_\xi and the local Q function to accelerate training (Line 107 in src/modules/agents/latent_ce_dis_rnnagent.py; the input to q_\xi contains h_in, which is the output of the GRU cell).
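The weight-sharing idea can be sketched as follows. This is a minimal numpy illustration, not the repo's implementation; the head names (`W_q`, `W_role`) and dimensions are hypothetical. One recurrent trunk produces a hidden state that feeds both the local Q head and the role-distribution head:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (no biases) for illustration."""
    def __init__(self, in_dim, hid_dim):
        s = 0.1
        self.Wz = rng.normal(0, s, (in_dim + hid_dim, hid_dim))  # update gate
        self.Wr = rng.normal(0, s, (in_dim + hid_dim, hid_dim))  # reset gate
        self.Wh = rng.normal(0, s, (in_dim + hid_dim, hid_dim))  # candidate

    def __call__(self, x, h):
        xh = np.concatenate([x, h], axis=-1)
        z = sigmoid(xh @ self.Wz)
        r = sigmoid(xh @ self.Wr)
        xrh = np.concatenate([x, r * h], axis=-1)
        h_tilde = np.tanh(xrh @ self.Wh)
        return (1 - z) * h + z * h_tilde

# One shared recurrent trunk ...
gru = GRUCell(in_dim=8, hid_dim=16)
# ... feeding two separate heads (hypothetical shapes):
W_q = rng.normal(0, 0.1, (16, 5))         # local Q head: 5 actions
W_role = rng.normal(0, 0.1, (16, 2 * 3))  # role head: mean and log-std of a 3-dim Gaussian

h = np.zeros((1, 16))
x = rng.normal(0, 1, (1, 8))
h = gru(x, h)            # shared hidden state h_in
q_values = h @ W_q       # local Q function branch
role_params = h @ W_role # variational-estimator branch consumes the same h
```

The key point is that both branches backpropagate into the same GRU weights, which is what accelerates training relative to two independent recurrent encoders.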
Yes, in our implementation we did not use `inference_net` when calculating `mi` and `dissimilarity`. As shown in Appendix A (the derivation of Eq. 14 and Eq. 20), q_\xi in Eq. 6 can be replaced by any distribution over roles. In our code, we use the distribution generated by the role encoder. Both choices are mathematically correct, and in this repo we selected the one with more robust learning performance. There is a discrepancy between our code and the paper; we are sorry for that.
The `inference_net` is used to produce a Gaussian distribution, `gaussian_infer`, which is then used to calculate the KL term (Line 113 in src/modules/agents/latent_ce_dis_rnn_agent.py).
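For reference, a KL term between two diagonal Gaussians has a simple closed form. The sketch below is a generic numpy version (not the repo's exact code; the argument names are hypothetical) of how such a term between the encoder's distribution and `gaussian_infer` could be computed:

```python
import numpy as np

def gaussian_kl(mu1, logvar1, mu2, logvar2):
    """Closed-form KL( N(mu1, var1) || N(mu2, var2) ) for diagonal
    Gaussians, summed over dimensions."""
    var1, var2 = np.exp(logvar1), np.exp(logvar2)
    return 0.5 * np.sum(
        logvar2 - logvar1 + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0
    )

# Identical distributions have zero divergence; shifting the mean makes it positive.
mu, logvar = np.zeros(3), np.zeros(3)
assert abs(gaussian_kl(mu, logvar, mu, logvar)) < 1e-12
assert gaussian_kl(mu + 1.0, logvar, mu, logvar) > 0.0
```

In PyTorch the same quantity can be obtained from `torch.distributions.kl_divergence` between two `Normal` objects, which avoids hand-deriving the formula.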
As for the `_build_inputs` function, the order is slightly different, and we think this will not influence learning. :)
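One way to see why the concatenation order in `_build_inputs` does not matter: any reordering of the concatenated features is absorbed by a row permutation of the first linear layer's weight matrix, so the two orderings parameterize exactly the same function class. A small numpy sketch (input names and sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical per-agent input pieces: observation, last action, agent id.
obs = rng.normal(size=(1, 4))
last_act = rng.normal(size=(1, 2))
agent_id = rng.normal(size=(1, 3))

a = np.concatenate([obs, last_act, agent_id], axis=-1)  # one appending order
b = np.concatenate([last_act, obs, agent_id], axis=-1)  # another appending order

W = rng.normal(size=(9, 5))  # first linear layer for ordering `a`
# Permuting W's rows to match ordering `b` yields an identical output,
# so a learned layer can represent either ordering equally well.
perm = [4, 5, 0, 1, 2, 3, 6, 7, 8]
W_b = W[perm]
assert np.allclose(a @ W, b @ W_b)
```

Since gradient descent can learn `W_b` just as easily as `W`, the appending order only relabels weight rows and does not change what the network can learn.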
Hi! When I use "episode_runner" to reproduce the code, the following error is always reported: "AssertionError: #labels should equal with #data points". How should I solve it? Thanks!
Could you please elaborate on these points:
1) In both the v1 and v3 arXiv versions of the paper, the KL divergence is given as the loss in Eq. (3), either directly (v3) or indirectly (v1: cross-entropy minus entropy, CE - H), but in the code the CE itself is used. (There is a comment in the code, `# CE = KL + H`, clarifying that CE itself has been used.)
2) The paper says a GRU cell is used for the variational estimator q_\xi, but I cannot find that GRU cell. As far as I can see, there is a sequential (two-layer feedforward) network for this task. I do see a GRU cell, but it focuses on producing the agents' Q-values, similar to other MARL methods.
3) As mentioned in another closed issue, `inference_net` is not involved in the calculation of `mi` and `dissimilarity`, because `latent_dis` and `latent_move` are cloned from `latent` (not `latent_infer`), and `mi` has `gaussian_embed` instead of `gaussian_infer` in its formula. I am wondering, what is the role of `inference_net` then? Are these differences significant, especially for the ablation studies done in the paper?
Another small question about the code:

1) Why did you change the `_build_inputs` function in the `separate_controller` module compared with the one written in `basic_controller`? I see the appending order changed; `inputs.append(batch["obs"][:, t])` is in the middle in your function.

I appreciate your time and effort.