Closed WangzcBruce closed 1 year ago
Hi there. So the question is: why are some results of When2com and Who2com the same? That's because during training, neither When2com nor Who2com prunes its fully connected communication graph. The fusion weights are obtained from a softmax so that gradients can be computed. If you go through the papers carefully, you will find that When2com and Who2com only differ at inference. Thus, I trained a single model for these two methods (i.e., When2com and Who2com), so they are the same without pruning the communication graph. However, the results using the proposed communication-graph pruning methods are marked in italic (i.e., When2com and Who2com), and as you can see, those are different.

Besides, I think When2com is generally better than Who2com, since When2com provides a more flexible alternative to the argmax operation in Who2com. I may release my code after I finish my project. If you have more questions, feel free to reach me~
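To make the train/inference distinction concrete, here is a minimal NumPy sketch of the fusion step. The function name, the uniform `1/N` pruning threshold for When2com, and the tensor shapes are my own illustrative assumptions, not the authors' released code:

```python
import numpy as np

def fuse(scores, feats, mode="train"):
    """Fuse per-agent features for one ego agent.

    scores: (N,) raw attention logits over N agents
    feats:  (N, C) per-agent feature vectors
    mode:   "train"    -> soft, fully connected fusion (differentiable, no pruning)
            "who2com"  -> hard argmax selection at inference
            "when2com" -> prune links below a threshold (1/N here, illustrative)
    """
    w = np.exp(scores - scores.max())
    w /= w.sum()                        # softmax fusion weights
    if mode == "train":
        return w @ feats                # same forward pass for both methods
    if mode == "who2com":
        return feats[int(np.argmax(w))] # pick the single best partner
    if mode == "when2com":
        keep = w >= 1.0 / len(w)        # soft pruning of weak links
        w = (w * keep) / (w * keep).sum()
        return w @ feats
    raise ValueError(mode)
```

With the same trained weights, `"train"` mode gives identical numbers for both methods; only the two inference branches diverge, which matches the table.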
Thanks for your help, my dear friend!
Actually, during training, according to the released code, Who2com is different from When2com. Who2com's communication graph is based on cross-attention, while When2com's is based on both self- and cross-attention.
In fact, the two methods differ in the released code. Who2com computes cross-attention, takes the weighted sum of the other agents' features, and concatenates the result to the ego feature before the downstream task, so the channel dimension is 512*2. When2com computes self- and cross-attention together, takes the weighted sum over all agents' features, and feeds that directly to the downstream task, so the channel dimension is 512.
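The two fusion heads described above can be sketched as follows. This is a simplified illustration of the channel-dimension difference, assuming flattened `C`-dimensional features; the function names are mine, not from the released code:

```python
import numpy as np

C = 512  # feature channels per agent, as in the released code

def who2com_head(ego, others, w):
    """Cross-attention weights w over the *other* agents only; the fused
    neighbor feature is concatenated to the ego feature -> 2*C channels."""
    fused = w @ others                   # (C,) weighted sum of neighbors
    return np.concatenate([ego, fused])  # (2*C,) fed to the downstream head

def when2com_head(all_feats, w):
    """Self- and cross-attention weights w over *all* agents (ego included);
    a single weighted sum keeps the channel dimension at C."""
    return w @ all_feats                 # (C,) fed to the downstream head
```

So the downstream heads of the two methods see different input widths (1024 vs. 512), which is the model-capacity difference discussed below.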
After reproducing the mrms part of https://github.com/GT-RIPL/MultiAgentPerception, I found that When2com can decide when to communicate more accurately than Who2com and therefore consumes less bandwidth; in practice, though, the metrics of the two are almost identical.
when2com: (results screenshot)
who2com: (results screenshot)
In the mrmps setting, where each request and response view differ in angle, When2com performs worse than Who2com in my experiments. Moreover, the trained When2com tends to refuse communication, and the metrics it obtains are close to those of the non-communicating occdeg method.
occdeg: (results screenshot)
when2com: (results screenshot)
who2com: (results screenshot)
I felt quite confused.
You are right that the implementation differs. In the paper, both methods do include a channel-concatenation operation (Who2com Equation (5) and When2com Equation (7)). In the code, I think the author didn't use concatenation for When2com because When2com already has self-attention, while Who2com would lose the ego feature without the concatenation. However, this introduces a difference in model capacity (i.e., channel dimension). Thus, in my reproduction, I follow the original description in their paper for a fair comparison between the two methods, to see the real improvement from Who2com to When2com. There's no big difference. You can regard the concatenation in When2com as a fixed linear transformation that increases the channel dimension.
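To see why concatenation is just a fixed linear transformation, here is a tiny self-contained check (the small channel dimension `C = 4` is only for illustration):

```python
import numpy as np

C = 4  # small channel dimension, for illustration only
rng = np.random.default_rng(0)
ego, fused = rng.random(C), rng.random(C)

# Concatenation as a fixed linear map: [ego; fused] = A @ ego + B @ fused,
# where A and B are constant block matrices (no learnable parameters).
A = np.vstack([np.eye(C), np.zeros((C, C))])  # routes ego into the first C channels
B = np.vstack([np.zeros((C, C)), np.eye(C)])  # routes fused into the last C channels

assert np.allclose(A @ ego + B @ fused, np.concatenate([ego, fused]))
```

Since `A` and `B` are constant, adding the concatenation changes the input width of the downstream head but adds no expressive power by itself.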
Honestly, I think there are some unreasonable designs in Who2com and When2com. Their ideas may be inspired by earlier MARL papers (e.g., TarMAC) and do provide an interesting perspective. The key issue in their formulation is that they try to communicate with the cooperators that have high similarity scores. However, for a cooperative system, I think agents usually need complementary information from the others rather than similar information.
As for your confusion: does noise exist in the MRMPS setting? If so, I think When2com may be learning a shortcut in your experiments, meaning that simply relying on ego features already converges well. In Who2com, this shortcut is explicitly provided by the concatenation, so it can still learn the cooperation. You can add ego-feature concatenation to When2com and see how it goes. Also, the message generation in the handshake of Who2com and When2com is somewhat brute-force; try more advanced methods for communication. Besides, it seems there is more than one strange result in your screenshots. I would suggest moving to some newer codebases to study this topic.
I sincerely appreciate all your replies!! Dear friend Little-Podi has already resolved most of my confusion. Best wishes!!
Hi, I noticed that many of the metrics of the two methods you reproduced are identical, which seems strange. I ran into the same issue when running the two methods from https://github.com/GT-RIPL/MultiAgentPerception/, and sometimes When2com is even worse than Who2com.