cyanrain7 / TRPO-in-MARL

MIT License

About the number of Critic Networks #1

Closed. Sunrisulfr closed this issue 2 years ago.

Sunrisulfr commented 2 years ago

This is very helpful work, but I have a question about the code. HAPPO_Policy seems to build a separate Critic network for each agent, whereas the paper seems to describe only a single Critic network shared by all agents. Does this affect the experimental results?

self.actor = Actor(args, self.obs_space, self.act_space, self.device)
self.critic = Critic(args, self.share_obs_space, self.device)

Looking forward to your reply, thank you.
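To make the two layouts in the question concrete, here is a minimal plain-Python sketch. `Critic`, `AgentPolicy`, `make_per_agent_policies` and `make_shared_critic_policies` are hypothetical stand-ins, not the repository's classes: the critic is reduced to a linear value function over the shared observation, and its initialization is seeded so that every per-agent copy starts from the same weights (an assumption for illustration, not something confirmed for the repository).

```python
import random


class Critic:
    """Hypothetical linear value function V(s) = w . s + b over the shared observation."""

    def __init__(self, share_obs_dim, seed):
        rng = random.Random(seed)  # seeded so every copy gets identical initial weights
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(share_obs_dim)]
        self.b = 0.0

    def value(self, share_obs):
        return sum(wi * si for wi, si in zip(self.w, share_obs)) + self.b


class AgentPolicy:
    """Hypothetical per-agent policy holding an actor placeholder and a critic."""

    def __init__(self, agent_id, critic):
        self.agent_id = agent_id
        self.actor = None  # actor omitted; only the critic layout matters here
        self.critic = critic


def make_per_agent_policies(n_agents, share_obs_dim, seed=0):
    # Layout described for the code: each policy constructs its own Critic.
    return [AgentPolicy(i, Critic(share_obs_dim, seed)) for i in range(n_agents)]


def make_shared_critic_policies(n_agents, share_obs_dim, seed=0):
    # Layout described for the paper: one Critic object referenced by every policy.
    shared = Critic(share_obs_dim, seed)
    return [AgentPolicy(i, shared) for i in range(n_agents)]


per_agent = make_per_agent_policies(3, 4)
shared = make_shared_critic_policies(3, 4)
print(len({id(p.critic) for p in per_agent}))  # 3 distinct critic objects
print(len({id(p.critic) for p in shared}))     # 1 shared critic object
print(all(p.critic.w == per_agent[0].critic.w for p in per_agent))  # True: same init
```

The per-agent layout stores several copies of the same network; whether those copies remain interchangeable depends on how they are trained, which is what the reply addresses.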

cyanrain7 commented 2 years ago

Sorry for the confusion. This setting doesn't affect the experimental results: every critic is trained on the same transitions with the same update rule in each epoch, so their parameters stay the same. By the way, we'll fix this and make the code clearer as soon as possible. Thanks for your comments!~
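The reply's argument can be checked on a toy model. The plain-Python sketch below trains several copies of a hypothetical linear critic (`init_critic`, `value` and `sgd_step` are illustrative names, not from the repository) with one full-batch gradient step per epoch on mean squared error. The argument implicitly also needs identical initialization and the same minibatch order for every copy; the sketch assumes both, and the last lines show that a differently initialized copy ends up with different parameters. Bitwise equality in real training additionally assumes deterministic arithmetic.

```python
import random


def init_critic(dim, seed):
    # Hypothetical linear critic: `dim` weights followed by a bias term.
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)] + [0.0]


def value(params, s):
    *w, b = params
    return sum(wi * si for wi, si in zip(w, s)) + b


def sgd_step(params, batch, lr):
    # One gradient step on the mean of 0.5 * (V(s) - G)^2 over the batch.
    dim = len(params) - 1
    grad = [0.0] * (dim + 1)
    for s, g in batch:
        err = value(params, s) - g
        for i in range(dim):
            grad[i] += err * s[i]
        grad[dim] += err
    n = len(batch)
    return [p - lr * gi / n for p, gi in zip(params, grad)]


dim, n_agents = 4, 3
data_rng = random.Random(42)
# Toy (shared observation, return) pairs; every critic sees these same batches in the same order.
batches = [
    [([data_rng.gauss(0, 1) for _ in range(dim)], data_rng.gauss(0, 1)) for _ in range(8)]
    for _ in range(5)
]

critics = [init_critic(dim, seed=0) for _ in range(n_agents)]  # identical initialization
for batch in batches:
    critics = [sgd_step(p, batch, lr=0.1) for p in critics]
print(all(p == critics[0] for p in critics))  # True: the copies never diverge

other = init_critic(dim, seed=1)  # same data and updates, different initialization
for batch in batches:
    other = sgd_step(other, batch, lr=0.1)
print(other == critics[0])  # False: identical updates alone are not enough
```

Under these assumptions the per-agent critics behave exactly like a single shared critic, at the cost of redundant memory and computation.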

cyanrain7 commented 2 years ago

I have added comments before the function that creates the critic~

Sunrisulfr commented 2 years ago

Ok, I see. Thanks for your reply!