geek-ai / MAgent

A Platform for Many-Agent Reinforcement Learning
MIT License

About reward and action in pursuit #72

Open znnby1997 opened 4 years ago

znnby1997 commented 4 years ago

I set up a pursuit game on a 10 * 10 map. There is one predator controlled by my own A2C model and two prey controlled by a random actor. During training, the predator's total reward per episode converges to zero and is never higher than zero. Does this mean the predator never chooses to attack any prey? How can the predator get a positive reward? (attached image: a2c_rewards)

lml519 commented 4 years ago

You can check the reward settings for the predator. As I remember, predators should get a positive reward when they attack. Also, once they have surrounded a prey, they can attack it and get a positive reward. I'm not sure I've understood it correctly, but I hope this helps.
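One way to check whether the predator ever attacks is to log its actions and rewards for an episode and summarize them. The following is only a minimal sketch in plain Python, not MAgent code: `summarize_episode` is a hypothetical helper, and the set of action ids that count as attacks is an assumption you would need to fill in from your own environment config.

```python
from collections import Counter


def summarize_episode(actions, rewards, attack_actions):
    """Summarize one episode for a single agent.

    actions:        list of int action ids chosen at each step
    rewards:        list of float rewards received at each step
    attack_actions: set of action ids treated as attacks
                    (assumption: depends on the environment config)
    """
    if len(actions) != len(rewards):
        raise ValueError("actions and rewards must have the same length")

    counts = Counter(actions)
    # Number of steps on which an attack action was chosen
    attack_steps = sum(n for a, n in counts.items() if a in attack_actions)

    return {
        "total_reward": sum(rewards),
        "attack_steps": attack_steps,
        "positive_reward_steps": sum(1 for r in rewards if r > 0),
    }


# Made-up data for illustration only; action id 9 stands in for "attack".
example = summarize_episode(
    actions=[0, 3, 9, 9, 1],
    rewards=[0.0, 0.0, -0.2, -0.2, 0.0],
    attack_actions={9},
)
print(example)
```

If `attack_steps` is above zero while `positive_reward_steps` stays at zero, the predator is attacking but the attacks never trigger the reward rule. If `attack_steps` is zero, the policy has learned to avoid attacking altogether.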

znnby1997 commented 4 years ago

With just one predator on the map, can this predator get a positive reward? Or, put differently, can a predator attack any prey when it is the only predator on the map?