Tanking strategy: training with an attention layer yields no significant improvement

Strategy Trained: Tank Strategy

The goal is to train a "tanking strategy", in which one hero with high health but no attack tanks for another hero with high attack but low health. Ideally, the 'attack' hero should move away from the enemy whenever it is targeted, while the 'tank' hero should stay close to the enemy to draw its attention. Because the enemy hero in our MOBA environment keeps attacking a single target, the strategy succeeds trivially once the 'tank' hero manages to establish itself as that target.

Problem

When training with vanilla PPO (the original ppo.py), training fails to converge. Adding an attention layer does not seem to improve it.

Attention layer

A simple self-attention layer is added between the FC-ReLU layers. In our experiments, the exact position of the attention layer does not seem to affect training. The attention layer is a single-head self-attention layer that takes the feature vector of each hero as input and outputs a new feature vector for each hero. Its output is then fed into the subsequent FC-ReLU layers.
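
As an illustration of the layer described above, here is a minimal NumPy sketch of single-head self-attention applied to per-hero feature vectors, placed between two FC-ReLU layers. It is not the code from ppo.py: the class name `SingleHeadSelfAttention`, the helper `fc_relu`, the layer sizes, and the random weights are all assumptions made for the example, and scaled dot-product attention is assumed as the attention variant.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def fc_relu(x, w, b):
    # Fully connected layer followed by ReLU (hypothetical helper).
    return np.maximum(x @ w + b, 0.0)


class SingleHeadSelfAttention:
    """Scaled dot-product self-attention with a single head (illustrative)."""

    def __init__(self, feature_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(feature_dim)
        self.feature_dim = feature_dim
        self.w_q = rng.normal(0.0, scale, (feature_dim, feature_dim))
        self.w_k = rng.normal(0.0, scale, (feature_dim, feature_dim))
        self.w_v = rng.normal(0.0, scale, (feature_dim, feature_dim))

    def __call__(self, hero_features):
        # hero_features has shape (num_heroes, feature_dim).
        q = hero_features @ self.w_q
        k = hero_features @ self.w_k
        v = hero_features @ self.w_v
        # scores[i, j]: how strongly hero i attends to hero j.
        scores = q @ k.T / np.sqrt(self.feature_dim)
        weights = softmax(scores, axis=-1)
        # One output feature vector per hero, same shape as the input.
        return weights @ v, weights


# Assumed sizes: 3 heroes (tank, attack, enemy), 8 raw features, 16 hidden units.
rng = np.random.default_rng(1)
num_heroes, raw_dim, hidden_dim = 3, 8, 16
heroes = rng.normal(size=(num_heroes, raw_dim))

w1, b1 = rng.normal(size=(raw_dim, hidden_dim)) * 0.1, np.zeros(hidden_dim)
w2, b2 = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1, np.zeros(hidden_dim)

hidden = fc_relu(heroes, w1, b1)               # FC-ReLU before attention
attended, weights = SingleHeadSelfAttention(hidden_dim)(hidden)
output = fc_relu(attended, w2, b2)             # FC-ReLU after attention
print(output.shape)
```

Each row of `weights` sums to 1, so every hero's output is a weighted mix of all heroes' value vectors. Moving the attention call before or after a different FC-ReLU layer only changes which features are mixed, not the per-hero shape.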

Training Details:

Map:

 {
     "Restricted_x" : 600,
     "Restricted_y" : 600,
     "Restricted_w" : 600,
     "Restricted_h" : 600,
     "SelfHeroes" : [2006, 2008],
     "OppoHeroes" : [1008],
     "SpawnAreaWidth" : 100,
     "SelfTowers" : [],
     "OppoTowers" : [],
     "SelfCreeps" : [],
     "OppoCreeps" : []
 }

Heroes:

Self-hero: Tank

 {
     "name" : "lusian",        
     "attackFreq" : 1,         
     "attackRange" : 20,       
     "viewRange" : 2000,       
     "health" : 1500,          
     "damage" : 0,             
     "speed" : 80,             
     "skills" : [1, -1, -1, -1]
 }

Self-hero: Attack

 {
     "name" : "lusian",        
     "attackFreq" : 1,         
     "attackRange" : 20,       
     "viewRange" : 2000,       
     "health" : 500,           
     "damage" : 100,           
     "speed" : 40,             
     "skills" : [-1, -1, -1, -1]
 }

Enemy hero:

 {
     "name" : "lusian",        
     "attackFreq" : 1,         
     "attackRange" : 20,       
     "viewRange" : 2000,       
     "health" : 1500,          
     "damage" : 100,           
     "speed" : 20,             
     "skills" : [-1, -1, -1, -1]
 }
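
To make the role of tanking concrete, the health and damage values above can be turned into a rough hits-to-kill count. This is a back-of-the-envelope sketch under loud assumptions: every hit deals exactly `damage`, with no skills, armor, or regeneration, and `attackFreq` is ignored because it is the same for all three heroes. The helper `hits_to_kill` is hypothetical and not part of the environment.

```python
import math

# Only the fields needed for the estimate, copied from the configs above.
TANK = {"health": 1500, "damage": 0}
ATTACK = {"health": 500, "damage": 100}
ENEMY = {"health": 1500, "damage": 100}


def hits_to_kill(attacker, defender):
    # Assumption: each hit removes exactly attacker["damage"] health.
    if attacker["damage"] <= 0:
        return math.inf  # an attacker with no damage can never kill
    return math.ceil(defender["health"] / attacker["damage"])


enemy_vs_attack = hits_to_kill(ENEMY, ATTACK)  # hits the attack hero survives
enemy_vs_tank = hits_to_kill(ENEMY, TANK)      # hits the tank can absorb
attack_vs_enemy = hits_to_kill(ATTACK, ENEMY)  # hits needed to kill the enemy
tank_vs_enemy = hits_to_kill(TANK, ENEMY)      # tank deals no damage

print(enemy_vs_attack, enemy_vs_tank, attack_vs_enemy, tank_vs_enemy)
```

Under these assumptions the attack hero needs 15 hits to kill the enemy but dies after 5 enemy hits, so it loses a direct exchange. The tank can absorb 15 hits but cannot deal damage, which is why the attack hero only has a chance if the enemy's target is the tank.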

For the modified training rates and other changed parameters, please refer to the diffs in the pull request.

gerrysonx / moba_env