BorealisAI / mtmfrl

Multi Type Mean Field Reinforcement Learning

Agents can attack within their own group, and agents in the same group share Q networks. #4

Closed IpadLi closed 2 years ago

IpadLi commented 2 years ago
  1. In the scenario configurations for multibattle and multigather, agents are allowed to attack within their own group, i.e. the attribute "attack_in_group" is set to True (or 1) — see the config sketch after this list. What is the insight behind this setting? It seems to contradict the description in Sec. 5, paragraph 2 of the paper, which requires agents in the same group to be cooperative.
  2. From the algorithm description, each agent maintains a separate Q network (and target Q network), but in the implementation it seems that a group of agents shares one Q network (and one target Q network). What is the insight behind this design?
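
For context, the setting in question looks roughly like the following MAgent-style scenario config. This is a minimal sketch, not code copied from mtmfrl: the agent-type name, numeric values, and group setup in the multibattle/multigather configs may differ.

```python
# Illustrative MAgent-style scenario config (not copied from mtmfrl).
import magent
gw = magent.gridworld

cfg = gw.Config()
cfg.set({"map_width": 40, "map_height": 40})

melee = cfg.register_agent_type(
    "melee",
    {'width': 1, 'length': 1, 'hp': 10, 'speed': 2,
     'view_range': gw.CircleRange(6), 'attack_range': gw.CircleRange(1.5),
     'damage': 2, 'step_recover': 0.1,
     'attack_in_group': 1,          # in-group attacks are allowed and do damage
     'step_reward': -0.005, 'kill_reward': 5,
     'dead_penalty': -0.1, 'attack_penalty': -0.1})

g0 = cfg.add_group(melee)
g1 = cfg.add_group(melee)

# Reward shaping: only attacks *across* groups are rewarded, so an agent that
# attacks a teammate pays the attack_penalty without gaining anything.
a = gw.AgentSymbol(g0, index='any')
b = gw.AgentSymbol(g1, index='any')
cfg.add_reward_rule(gw.Event(a, 'attack', b), receiver=a, value=0.2)
cfg.add_reward_rule(gw.Event(b, 'attack', a), receiver=b, value=0.2)
```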
Sriram94 commented 2 years ago
  1. In our experiments we use a set of mixed cooperative-competitive games, where the agents are expected to learn both cooperation and competition in order to win. For example, in the battle game the agents must learn to cooperate within the group and compete across groups (as described in Sec. 5, Par. 2). An agent is expected to learn cooperative strategies through the reward shaping and self-play training used in MAgent; in this way it learns that attacking within the group is bad and attacking across groups is good. If the environment simply disallowed attacks within the group, the agent would not need to learn this at all. A naive strategy that attacks any nearby agent would also work (if the other agent is an opponent it may die; if it is not an opponent, nothing happens). Such a strategy is not good and should not win the battle. We wanted to prevent such strategies from winning, so we made in-group attacks have a real effect in the system.
  2. The algorithm description covers a very general application scenario. In the experiments, our training is very similar to self-play: each group trains a separate network, and the agents within the group share that network. So we train four groups across four algorithms, where each group trains its own shared network (a minimal sketch of this per-group sharing is included below). As described for the first point, we want the agents to learn cooperation through this self-play scheme. This is the same setup as in Mean Field Reinforcement Learning.
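
To illustrate the second point, here is a minimal sketch of per-group parameter sharing: one online Q network and one target network per group, queried by every agent in that group. This is not the repository's actual implementation; the class name, network sizes, and dimensions below are illustrative assumptions.

```python
# Minimal sketch of per-group Q-network sharing (illustrative only).
import torch
import torch.nn as nn

class GroupQNet(nn.Module):
    """Shared Q network for all agents of one group."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

n_groups, obs_dim, n_actions = 4, 34, 21   # illustrative sizes
# One (online, target) pair per group; every agent of group g uses q_nets[g].
q_nets = [GroupQNet(obs_dim, n_actions) for _ in range(n_groups)]
target_nets = [GroupQNet(obs_dim, n_actions) for _ in range(n_groups)]
for q, t in zip(q_nets, target_nets):
    t.load_state_dict(q.state_dict())      # sync targets at initialisation

def select_actions(group_id, obs_batch, eps=0.05):
    """Epsilon-greedy actions for all agents of one group via the shared net."""
    with torch.no_grad():
        q_values = q_nets[group_id](obs_batch)        # [n_agents, n_actions]
    greedy = q_values.argmax(dim=-1)
    random_actions = torch.randint(n_actions, greedy.shape)
    explore = torch.rand(greedy.shape) < eps
    return torch.where(explore, random_actions, greedy)

# Example: 8 agents of group 0 act with one forward pass through the shared net.
actions = select_actions(0, torch.randn(8, obs_dim))
```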
IpadLi commented 2 years ago

Hi Sriram, thanks very much for your explanations. I understand the settings now, so I'll close this issue.