IpadLi closed this issue 2 years ago.
- In our experiments we use a set of mixed cooperative-competitive games, in which the agents must learn both cooperation and competition to win. For example, in the battle game the agents must learn to cooperate within their own group and compete against the other group (as described in Sec. 5, Par. 2). An agent is expected to learn cooperative strategies through reward shaping and self-play training, as in MAgent. In this way the agent should learn that attacking members of its own group is bad and attacking members of the opposing group is good. If the environment simply disallowed attacks within a group, the agent would never need to learn this distinction. A naive strategy that attacks any nearby agent could then also work: if the other agent is an opponent it may die, and if it is a teammate nothing happens. Such a strategy is not good and should not win the battle. We wanted to prevent such strategies from winning by giving attacks within a group a real effect in the environment.
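A toy sketch of the point above, under loudly stated assumptions: `Agent`, `resolve_attack`, `naive_return`, the damage value and the ±1 rewards are all hypothetical and are not MAgent's actual API or reward values. It only shows why a naive "attack every neighbour" strategy costs nothing when in-group attacks are blocked, but stops paying off once in-group attacks have an effect.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    group: int
    hp: float = 10.0


def resolve_attack(attacker, target, damage=2.0, friendly_fire=True):
    """Apply one attack and return the attacker's reward (hypothetical values)."""
    same_group = attacker.group == target.group
    if same_group and not friendly_fire:
        return 0.0  # in-group attacks are blocked: no effect, no penalty
    target.hp -= damage
    if same_group:
        return -1.0  # hurting a teammate is penalised
    return 1.0  # hurting an opponent is rewarded


def naive_return(friendly_fire):
    """Total reward of a naive agent that attacks every neighbour, friend or foe."""
    attacker = Agent("a0", group=0)
    neighbours = [Agent("a1", group=0), Agent("b0", group=1)]
    return sum(resolve_attack(attacker, n, friendly_fire=friendly_fire)
               for n in neighbours)
```

With in-group attacks blocked, the naive agent earns the same reward as an agent that only attacks opponents; once in-group attacks hurt teammates, the naive strategy loses its advantage and the agent has to learn to tell friend from foe.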
- The algorithm description uses a very general application scenario. In the experiments we describe that our training is very similar to self-play, with each group training a separate network that is shared by all agents within that group. So we train four groups across four algorithms, where the agents in each group train their group's own shared network. As described in the first point, we want the agents to learn cooperation through this self-play scheme. This is the same setup as described in Mean Field Reinforcement Learning.
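A minimal sketch of the group-level parameter sharing described above, assuming one algorithm per group. `SharedPolicy`, `build_groups`, `step` and the labels `alg_A` to `alg_D` are hypothetical placeholders, and the random action choice stands in for an actual learned network. The point is only that every agent in a group queries, and pools experience into, the same network object, while different groups keep separate networks.

```python
import random


class SharedPolicy:
    """Hypothetical stand-in for one group's network, shared by all its agents."""

    def __init__(self, algorithm, n_actions=4, seed=0):
        self.algorithm = algorithm
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.experience = []  # transitions pooled from every agent in the group

    def act(self, observation):
        # Placeholder for a learned policy: pick a random action.
        return self.rng.randrange(self.n_actions)

    def store(self, transition):
        self.experience.append(transition)


# Hypothetical labels: one training algorithm per group (an assumption).
ALGORITHMS = ["alg_A", "alg_B", "alg_C", "alg_D"]


def build_groups(agents_per_group=3):
    """Create one shared policy per group and map each agent id to its group."""
    policies = {g: SharedPolicy(alg, seed=g) for g, alg in enumerate(ALGORITHMS)}
    agents = {f"g{g}_a{i}": g for g in policies for i in range(agents_per_group)}
    return policies, agents


def step(policies, agents, observations):
    """Every agent acts with, and writes experience to, its group's network."""
    actions = {}
    for agent_id, group in agents.items():
        policy = policies[group]
        action = policy.act(observations[agent_id])
        policy.store((agent_id, observations[agent_id], action))
        actions[agent_id] = action
    return actions
```

Sharing one network per group lets every teammate's experience update the same parameters, while keeping the groups' networks separate preserves the self-play-style competition between groups.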
Hi Sriram, thanks very much for your explanations. I understand the settings now, so I'd like to close this issue.