Performance of baselines on your modified SMAC

TonghanWang / RODE

Code accompanying the paper "RODE: Learning Roles to Decompose Multi-Agent Tasks" (ICLR 2021, https://arxiv.org/abs/2010.01523). RODE is a scalable role-based multi-agent learning method which effectively discovers roles based on joint action space decomposition according to action effects, establishing a new state of the art on the StarCraft multi-agent benchmark.

Apache License 2.0


Performance of baselines on your modified SMAC #11

Open guestreturn opened 2 years ago

guestreturn commented 2 years ago

Hello! Thank you for sharing your code. RODE is very interesting work!

However, I am confused by the experimental results. We found that you modified the SMAC environment, which is not mentioned in your paper. When we ran QMIX on your modified SMAC, we found that QMIX achieved performance comparable to RODE on hard scenarios like MMM2. We used five independent runs for each algorithm to get convincing results. So we suspect that the comparison in the paper may be unfair. The performance improvement of RODE may come from the modified SMAC environment rather than from the role-based mechanism.

It may be a careless mistake. Could you check the performance of QMIX or other baselines in your modified SMAC?

TonghanWang commented 2 years ago

We first clarify why we modified the SMAC environments. SMAC actions are related to the identity of enemies (an attack action is aimed at a specific enemy). Every time the environment is reset, the same action ID may attack different enemies, resulting in the change of action semantics, which makes it impossible to learn action representation. So we sorted enemies.
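The effect described above can be illustrated with a minimal sketch. This is not taken from the RODE repository: the `Enemy` class, the `attack_targets` helper, and the sort key (unit type, then position) are hypothetical and chosen only for illustration. It assumes that attack action `k` targets the `k`-th enemy in the environment's internal list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Enemy:
    # Hypothetical stand-in for an enemy unit; not SMAC's real unit type.
    unit_type: str
    x: float
    y: float


def attack_targets(enemies: List[Enemy], sort_enemies: bool) -> List[Enemy]:
    """Return enemies in the order in which attack actions index them.

    Attack action k is assumed to target the k-th entry of the result.
    """
    if not sort_enemies:
        # Raw order as delivered by the environment; may differ per reset.
        return list(enemies)
    # Assumed canonical key: unit type first, then position as a tie-breaker.
    return sorted(enemies, key=lambda e: (e.unit_type, e.x, e.y))


# Two resets of the same scenario that list the same enemies in different orders.
reset_a = [
    Enemy("medivac", 5.0, 1.0),
    Enemy("marine", 3.0, 2.0),
    Enemy("marauder", 4.0, 0.0),
]
reset_b = [reset_a[1], reset_a[2], reset_a[0]]

unsorted_a = [e.unit_type for e in attack_targets(reset_a, sort_enemies=False)]
unsorted_b = [e.unit_type for e in attack_targets(reset_b, sort_enemies=False)]
sorted_a = [e.unit_type for e in attack_targets(reset_a, sort_enemies=True)]
sorted_b = [e.unit_type for e in attack_targets(reset_b, sort_enemies=True)]

# Without sorting, "attack action 0" hits a different unit type in each reset.
print(unsorted_a[0], unsorted_b[0])
# With sorting, every attack action keeps the same target across resets.
print(sorted_a == sorted_b)
```

In this sketch, a fixed ordering gives each attack action ID a stable meaning across episodes, which is the property needed to learn per-action representations. The same stability is available to any baseline trained on the sorted environment.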

When we submitted our paper, we didn't realize that this change could also improve the performance of QMIX. After we became aware of this issue, we tested QMIX on this changed SMAC benchmark. RODE still outperforms it on most of the super hard scenarios, like corridor and 6h_vs_8z.

guestreturn commented 2 years ago

Thank you very much for your reply!

I believe it is not a deliberate mistake. However, we think you should state in a prominent position in Readme.md that you have changed the SMAC environment, which will avoid misleading followers.

The performance of RODE in the paper has absolute advantages over other baselines, but you know, this is not true in practice. Other baselines, like QPLEX and Weighted QMIX, should perform better than QMIX on your changed SMAC benchmark. If their true performances are taken into account, can RODE still be state of the art in a few super hard scenarios? Could you rerun other baselines on your modified SMAC and then update the true result in your paper? Although it will take you some time, we think it will benefit the whole MARL community.

Thank you very much!

TonghanWang commented 2 years ago

Thank you very much for pointing this out!

We agree it will benefit the MARL community. We are going to update the paper's results as soon as possible.

TonghanWang commented 2 years ago

Hi, we have started testing QPLEX and Weighted-QMIX on our SC2 environments, and results will be available in a few days.

In my humble opinion, there is still a question that I want to discuss with you. Do you think this comparison is fair? RODE uses QMIX as the mixing network for both the role selector and the role policies. Using QPLEX or Weighted-QMIX inside RODE may give different results. For now, we are sure RODE performs better than QMIX.

guestreturn commented 2 years ago

We are happy and excited to see that things are progressing well! Thank you for all you have done!

We know that RODE is an excellent work, and the components of its framework can be composed of any value decomposition method. However, our original intention of proposing this issue is to know the real performance comparison of different algorithms under the same SMAC. In addition, to make the comparison more fair, we believe that the neural network model size of each algorithm should be similar. As we know, the model size of RODE is twice that of QMIX but RODE performs worse than QMIX in almost all easy and hard scenarios.

We proposed this issue to point out that there is an unfair comparison in the original paper because of different SMAC, and the experimental results in your paper are quite different from the reality.

Warry98 commented 2 years ago

Hello there, thank you for addressing this issue. Are there any updates on the new tests in the modified environment?

guestreturn commented 2 years ago

Hi, a month has passed. How are the new experiments going? Maybe you can update some of the experimental results first and state in a prominent position in Readme.md that you have changed the SMAC environment. We should avoid academic misconduct and show real experimental results. Is that right?

Warry98 commented 2 years ago

@TonghanWang any updates on this?