Question about MASAC - GitHub Issues

Hi, thanks for the question and my apologies for the late reply.

It would have been clearer to put the subscript i on the targets y, because these targets are indeed computed separately for each Q-network. You are definitely right about that.
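To make the per-agent indexing concrete, here is a minimal sketch of how one target y_i per Q-network could be computed. It is not the code from the repository: the function name `soft_targets`, its arguments, and the choice to use only agent i's own log-probability in the entropy term are assumptions made for illustration.

```python
def soft_targets(rewards, next_q_values, next_log_probs, alphas,
                 gamma=0.99, done=False):
    """Return a list with one soft target y_i per agent (hypothetical helper).

    rewards[i]        : reward received by agent i
    next_q_values[i]  : agent i's centralized target critic, evaluated on the
                        next joint state and the next joint action
    next_log_probs[i] : log pi_i(a'_i | o'_i) for agent i's own next action
                        (assumption: only the agent's own entropy is used)
    alphas[i]         : entropy temperature of agent i
    """
    # No bootstrapping from the next state once the episode has ended.
    not_done = 0.0 if done else 1.0
    return [
        r + gamma * not_done * (q - alpha * logp)
        for r, q, logp, alpha in zip(rewards, next_q_values,
                                     next_log_probs, alphas)
    ]


# Two agents with different rewards and critics get different targets.
targets = soft_targets(
    rewards=[1.0, 0.0],
    next_q_values=[2.0, 3.0],
    next_log_probs=[-1.0, -0.5],
    alphas=[0.2, 0.2],
    gamma=0.9,
)
print(targets)
```

Because each agent has its own reward, critic, and policy, the two entries of `targets` differ, which is why the subscript i belongs on y.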

About your comparison with independent Q-learning: I do think there are a few more differences. Due to the centralized critics, MASAC is a centralized-learning, decentralized-execution algorithm, whereas I believe independent Q-learning is a purely independent-learners algorithm. I also don't think independent Q-learning usually uses actors, nor that it can handle continuous action spaces. Finally, I think the use of centralized critics is a fairly important distinction as it

I believe that the algorithm closest to MASAC would perhaps be MADDPG, since MASAC is simply a maximum-entropy variant of the MADDPG algorithm.
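The relation to MADDPG can be illustrated by writing the two targets for agent i side by side. This is a sketch in assumed notation (Q_i' is agent i's target critic, s' the next joint state, o_j' agent j's next observation, α the entropy temperature); it leaves out details such as twin critics, and the exact form of the entropy term is an assumption rather than a quote from the implementation.

```latex
% MADDPG target for agent i (deterministic target policies \mu_j'):
y_i^{\text{MADDPG}} = r_i + \gamma \, Q_i'\big(s', a_1', \dots, a_N'\big),
\qquad a_j' = \mu_j'(o_j')

% MASAC-style target for agent i (stochastic policies \pi_j, entropy bonus):
y_i^{\text{MASAC}} = r_i + \gamma \Big( Q_i'\big(s', a_1', \dots, a_N'\big)
    - \alpha \log \pi_i(a_i' \mid o_i') \Big),
\qquad a_j' \sim \pi_j(\cdot \mid o_j')
```

In this form the only changes are that the next actions are sampled from stochastic policies and that the target carries the extra term -α log π_i, which is the maximum-entropy part.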

danielwillemsen / MAMBPO

Question about MASAC #3