TonghanWang / ROMA

Codes accompanying the paper "ROMA: Multi-Agent Reinforcement Learning with Emergent Roles" (ICML 2020 https://arxiv.org/abs/2003.08039)
Apache License 2.0

Paper vs Code #17

Open mamadpierre opened 3 years ago

mamadpierre commented 3 years ago

Could you please elaborate on the following points?

1) In both the v1 and v3 arXiv versions of the paper, the KL divergence is given as the loss in equation (3), either directly (v3) or indirectly (v1: cross-entropy minus entropy, CE - H), but in the code the CE itself is used. (There is a comment in the code, # CE = KL + H, which makes it clear that the CE itself has been used.)

2) The paper says a GRU cell is used for the variational estimator q_\xi, but I cannot find that GRU cell. As far as I can see, a sequential (two-layer feedforward) network handles this task. I do see a GRU cell, but it is the one producing the agents' Q-values, as in other MARL methods.

3) As mentioned in another closed issue, inference_net is not involved in the calculation of mi and dissimilarity, because latent_dis and latent_move are cloned from latent (not latent_infer), and mi uses gaussian_embed instead of gaussian_infer in its formula. What is the role of inference_net, then?

Are these differences significant, especially for the ablation studies done in the paper?


Another small question about the code:

1) Why did you change the _build_inputs function in the separate_controller module compared with the one written in basic_controller? I see the appending order changed: inputs.append(batch["obs"][:, t]) is in the middle in your version.

I appreciate your time and effort.

TonghanWang commented 3 years ago

Hi,

Thanks for your interest in our work!

  1. The KL term is what is expected when computing L_I; in the code we optimize the cross entropy, CE = KL + H, where the extra entropy term encourages exploration in the role space (a short sketch of this decomposition appears further below). As entropy regularization is a commonly used trick in the RL community, we didn't stress it as a main contribution in our paper.

  2. We share a GRU cell between q_\xi and the local Q function (line 107 in src/modules/agents/latent_ce_dis_rnnagent.py; the input to q_\xi contains h_in, which is the output of the GRU cell) to accelerate training. A minimal sketch of this sharing is given right after this list.

  3. Yes, in our implementation we did not use inference_net when calculating mi and dissimilarity. As shown in Appendix A (the derivations of Eq. 14 and Eq. 20), q_\xi in Eq. 6 can be replaced by any distribution over roles. In our code, we use the distribution generated by the role encoder. Both choices are mathematically correct, and in the repo we selected the one with more robust learning performance. There is indeed a discrepancy between our code and the paper, and we are sorry for that.
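For reference, here is a minimal, hypothetical sketch (not the repository code) of how a single GRUCell's hidden state can serve both the local Q head and the variational estimator q_\xi; all class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class SharedGRUAgent(nn.Module):
    """Illustrative only: one GRUCell whose hidden state feeds both
    the local Q head and the variational estimator q_xi."""

    def __init__(self, input_dim, hidden_dim, n_actions, latent_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)   # shared recurrent cell
        self.q_head = nn.Linear(hidden_dim, n_actions)  # local Q-values
        # q_xi: feed-forward net mapping the GRU hidden state to the
        # mean and log-variance of a Gaussian over the role space
        self.q_xi = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2 * latent_dim),
        )

    def forward(self, inputs, h_in):
        x = torch.relu(self.fc1(inputs))
        h = self.rnn(x, h_in)          # the same hidden state is reused below
        q_values = self.q_head(h)
        role_params = self.q_xi(h)     # parameters of q_xi
        return q_values, role_params, h

# usage: one forward step for a batch of 4 agents
agent = SharedGRUAgent(input_dim=19, hidden_dim=64, n_actions=5, latent_dim=3)
h = torch.zeros(4, 64)
q, role_params, h = agent(torch.randn(4, 19), h)
```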

The inference_net is used to produce a Gaussian distribution, gaussian_infer, which is then used to calculate the KL term (line 113 in src/modules/agents/latent_ce_dis_rnn_agent.py).
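To make the # CE = KL + H comment concrete, here is a small sketch, assuming both gaussian_embed (from the role encoder) and gaussian_infer (from inference_net) are diagonal Gaussians; the dimensions and stand-in values are made up for illustration. The cross entropy CE(p, q) decomposes exactly as KL(p || q) + H(p).

```python
import torch
import torch.distributions as D

latent_dim = 3

# p: stand-in for gaussian_embed (role distribution from the role encoder)
p = D.Normal(loc=torch.zeros(latent_dim), scale=torch.ones(latent_dim))
# q: stand-in for gaussian_infer (distribution produced by inference_net)
q = D.Normal(loc=torch.full((latent_dim,), 0.5), scale=torch.full((latent_dim,), 1.5))

kl = D.kl_divergence(p, q).sum()   # KL(p || q)
h = p.entropy().sum()              # H(p)
ce = kl + h                        # CE(p, q) = KL(p || q) + H(p)

# Monte Carlo sanity check: CE(p, q) = -E_p[log q(x)]
x = p.sample((100000,))
ce_mc = -q.log_prob(x).sum(-1).mean()
print(f"KL = {kl.item():.4f}, H = {h.item():.4f}, "
      f"CE = {ce.item():.4f}, MC estimate = {ce_mc.item():.4f}")
```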

As for the _build_inputs function, the order is slightly different, and we think this will not influence learning. :)
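To illustrate why the ordering in _build_inputs is harmless: the pieces are simply concatenated along the feature dimension, so any fixed order works as long as it is used consistently when the network's input size is computed. A simplified, hypothetical version (not the repository's exact code):

```python
import torch

def build_inputs(obs, last_actions_onehot, agent_ids_onehot):
    # Concatenate the per-agent input pieces along the last dimension.
    # Reordering the pieces only permutes the input features; as long as
    # the same order is always used, learning is unaffected.
    return torch.cat([obs, last_actions_onehot, agent_ids_onehot], dim=-1)

# toy shapes: 4 agents, observation dim 10, 5 actions
obs = torch.randn(4, 10)
last_actions = torch.zeros(4, 5)
agent_ids = torch.eye(4)
inputs = build_inputs(obs, last_actions, agent_ids)
print(inputs.shape)  # torch.Size([4, 19])
```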

wmk-in-SH commented 2 years ago

Hi! When I use "episode_runner" to reproduce the code, the following error is always reported: "AssertionError: #labels should equal with #data points". How should I solve it? Thanks!