Closed Sunrisulfr closed 2 years ago
Sorry for causing this confusion. This setting doesn't affect the experimental results: each critic trains on the same transitions and applies the same update rule in every epoch, so their parameters stay identical. We'll fix this confusion and make the code clearer as soon as possible. Thanks for your comments!~
I have added comments before the function that creates the critics~
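To illustrate the point above: if two critics start from identical parameters and receive identical gradient updates on identical data, they remain identical forever, so per-agent critics behave like one shared critic. Here is a minimal numpy sketch of that argument (a hypothetical linear value function, not the repo's actual Critic class):

```python
import numpy as np

def init_critic(seed=0):
    # hypothetical tiny linear critic: V(s) = s @ w + b
    rng = np.random.default_rng(seed)
    return {"w": rng.normal(size=(4, 1)), "b": np.zeros(1)}

def update(params, states, returns, lr=0.1):
    # one gradient step on the MSE value loss
    pred = states @ params["w"] + params["b"]
    err = pred - returns
    params["w"] -= lr * states.T @ err / len(states)
    params["b"] -= lr * err.mean(axis=0)

rng = np.random.default_rng(1)
states = rng.normal(size=(32, 4))
returns = rng.normal(size=(32, 1))

# two "per-agent" critics with identical initialization
c1, c2 = init_critic(), init_critic()
for _ in range(10):
    # identical data, identical update rule for both critics
    update(c1, states, returns)
    update(c2, states, returns)

# their parameters never diverge
print(np.allclose(c1["w"], c2["w"]) and np.allclose(c1["b"], c2["b"]))
```

Note this equivalence relies on identical initialization (same seed) as well as identical batches; with independently seeded critics the parameters would differ.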
Ok, I see. Thanks for your reply!
This is very helpful work, but I have a question about the code: HAPPO_Policy seems to build a separate Critic network for each agent, while the paper seems to use only a single shared Critic network. Does this affect the experimental results?
```python
self.actor = Actor(args, self.obs_space, self.act_space, self.device)
self.critic = Critic(args, self.share_obs_space, self.device)
```
Looking forward to your reply, thank you.