PKU-MARL / HARL

Official implementation of HARL algorithms based on PyTorch.

Is there QMIX implementation code? #3

Closed raykr closed 1 year ago

raykr commented 1 year ago

There is a QMIX result in the experiments, but I can't find the QMIX code in this repository. Is a QMIX implementation available? Thanks. :)

Ivan-Zhong commented 1 year ago

Hello,

We did not implement QMIX in this repository. As clarified in our paper, for SMAC and GRF we used the QMIX implementation by Yu et al., and for SMACv2 we used the QMIX implementation by Ellis et al.

raykr commented 1 year ago

Sorry for another question. From reading the code, my understanding is that hd3qn stands for dueling double deep Q-network, which belongs to the Q-learning family. Q-learning is known to have only a critic, but I found that the code uses both an actor and a critic, and both use the DuelingQNet model. Is my understanding correct, and why this design? Sincerely :)

Ivan-Zhong commented 1 year ago

Your understanding is correct. HAD3QN has 1 critic and n actors, all using the DuelingQNet model. HAD3QN is not an instance of the HAML theoretical framework, but a value-based approximation to HADDPG, for the purpose of tackling discrete action spaces and investigating the possibility of this approximation (given the connection between DQN and DDPG).

It is true that Q-learning only has a critic; but here we are in a multi-agent setting and we would prefer decentralised execution. Thus, we replace the deterministic policies in HADDPG with Q-networks and follow the sequential update scheme approximately. As a result, the algorithm also has an actor-critic architecture. More details can be found in Appendix I of our paper.
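To illustrate the idea, here is a minimal sketch (not the repository's actual code) of a dueling Q-network and its use as a decentralised actor. The class and parameter names are assumptions for illustration only; the real DuelingQNet in the repo may differ.

```python
# Hypothetical sketch of a dueling Q-network, assuming PyTorch.
# It illustrates the reply above: the same dueling architecture can serve
# as each agent's "actor" (greedy action selection from local observations),
# mirroring the deterministic actor it replaces in HADDPG.
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, input_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.value_head = nn.Linear(hidden_dim, 1)        # state value V(s)
        self.adv_head = nn.Linear(hidden_dim, n_actions)  # advantages A(s, a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.feature(x)
        v = self.value_head(h)                 # shape (batch, 1)
        a = self.adv_head(h)                   # shape (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)


# An agent's "actor" picks actions greedily from its own Q-network,
# so execution stays decentralised.
actor = DuelingQNet(input_dim=8, hidden_dim=64, n_actions=5)
obs = torch.randn(4, 8)            # batch of 4 local observations
q_values = actor(obs)              # shape (4, 5)
actions = q_values.argmax(dim=-1)  # greedy decentralised action selection
```

This is only a sketch of the dueling decomposition and greedy selection; the actual HAD3QN update (double Q-learning targets, sequential agent-by-agent updates) is described in Appendix I of the paper.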

I hope this can answer your questions. :)

raykr commented 1 year ago

Thanks for your response. 👍 :)