geek-ai / MAgent

A Platform for Many-Agent Reinforcement Learning
MIT License
1.68k stars 332 forks

Some queries about input views of agents #16

Closed ashwinipokle closed 6 years ago

ashwinipokle commented 6 years ago

Hello,

We were checking the implementation of DQN in a multi-agent environment (pursuit) as given in this repo. Everything seemed the same as the usual DQN, except for the input views (observations) being passed for each agent. What exactly do they represent, and why do they have 4 dimensions? As per my understanding, in the usual DQN we only have observations with 3 dimensions.

Also, please let me know if there are any other differences compared to the single-agent DQN.

Thanks!

wnzhang commented 6 years ago

When the number of agents grows huge (not just dozens), it is mostly infeasible to model each agent with an independent neural network, so a parameter-sharing mechanism becomes a necessity. In MAgent, the current implementation of this parameter sharing combines the agent ID into the input (the agent's observation) so that the shared Q-network can behave differently for different agents. In addition, the SARSA observation data of all agents is sent to the Q-network for training; this also differs from single-agent DQN.
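The parameter-sharing idea above can be sketched as follows. This is a toy NumPy illustration, not MAgent's actual code: a linear "network" stands in for the real Q-network, and the one-hot agent-ID encoding and all names are assumptions made for the example.

```python
import numpy as np

def q_values(obs, agent_id, n_agents, weights):
    """Shared-parameter Q-function sketch: one weight matrix serves all
    agents; the agent's identity is appended to its observation so the
    same network can behave differently per agent."""
    one_hot = np.zeros(n_agents)
    one_hot[agent_id] = 1.0
    x = np.concatenate([obs.ravel(), one_hot])  # observation + agent ID
    return x @ weights  # linear stand-in for the real DQN

n_agents, obs_dim, n_actions = 4, 6, 3
rng = np.random.default_rng(0)
# one weight matrix shared by all agents
weights = rng.normal(size=(obs_dim + n_agents, n_actions))
obs = rng.normal(size=obs_dim)

# The same observation yields different Q-values for different agent IDs:
q0 = q_values(obs, 0, n_agents, weights)
q1 = q_values(obs, 1, n_agents, weights)
```

Because the agent ID is part of the input, one set of parameters can still produce agent-specific behavior, which is what makes training tractable with many agents.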

ashwinipokle commented 6 years ago

Thank you for your reply. As per my understanding, you are using a DQN with shared parameters, and its input includes the observation frame, the agent ID, and one-step lookahead (i.e. SARSA) data. Is this additional data being passed in the Processing Model?

I was experimenting with a 30x30 map with 10 agents in the pursuit game. I noticed that the dimensions of the DQN input are 10x10x5 for predators and 9x9x5 for prey. Does this mean that each agent has only partial information about the game (i.e. the environment is not fully observable)?

Finally, suppose I want to use this environment to experiment with another algorithm. Could you tell me what additional changes I need to make to ensure that this extra data, like the agent ID and SARSA data, is passed and processed correctly? Thank you!

merrymercy commented 6 years ago
  1. Every agent has only a partial local view. Their view range is defined here by `view_range`: https://github.com/geek-ai/MAgent/blob/45aee14dbbf490c461df0ec11106c218fa8e5c86/python/magent/builtin/config/pursuit.py#L10-L16. For predators, it is a circle with a radius of 5, so the input shape is diameter x diameter = 10x10. Note that because the view range is a circle, the margin pixels of this 10x10 square that fall outside the circle are always masked with 0. The last dimension, 5, in 10x10x5 is the number of channels; see our doc (https://github.com/geek-ai/MAgent/blob/master/doc/get_started.md#observation) for more explanation. In pursuit there is no minimap channel, so the 5 channels correspond to (wall indicator, predator indicator, predator's HP, prey indicator, prey's HP). An indicator is 1 if there is an object in that pixel and 0 otherwise.

  2. As mentioned in our doc (https://github.com/geek-ai/MAgent/blob/master/doc/get_started.md#observation), the observation of an agent has two components. Your algorithm needs to take in both of them (e.g. flatten both and concatenate them). The docstring in our API documents the detailed format: https://github.com/geek-ai/MAgent/blob/45aee14dbbf490c461df0ec11106c218fa8e5c86/python/magent/gridworld.py#L221-L234
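The circular masking described in point 1 can be illustrated with a small sketch. The pixel-center convention and the `circular_view_mask` helper below are assumptions for illustration only, not MAgent's actual masking code:

```python
import numpy as np

def circular_view_mask(radius):
    """Mask for a square view of side 2*radius: True inside the circular
    view range, False for the corner pixels that are always zeroed."""
    d = 2 * radius
    # pixel-center coordinates relative to the center of the view square
    coords = np.arange(d) - (d - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return xx**2 + yy**2 <= radius**2

mask = circular_view_mask(5)  # predator view: a 10x10 square
print(mask.shape)             # (10, 10)
print(bool(mask[0, 0]))       # False: corner pixel lies outside the circle
print(bool(mask[5, 5]))       # True: near-center pixel is inside the circle
```

Pixels where the mask is False correspond to the margin entries of the 10x10x5 observation that always read 0.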
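A minimal sketch of the "flatten both and concatenate" step in point 2, assuming the two components arrive as a spatial view tensor and a per-agent feature matrix. The shapes and `feature_dim` below are made up for the example, not taken from MAgent:

```python
import numpy as np

def flatten_observation(view, feature):
    """Combine the two observation components into one flat vector per
    agent, e.g. as input to a fully connected Q-network.

    view:    (n_agents, height, width, n_channels) spatial local view
    feature: (n_agents, feature_dim) non-spatial per-agent features
    """
    n_agents = view.shape[0]
    return np.concatenate([view.reshape(n_agents, -1),
                           feature.reshape(n_agents, -1)], axis=1)

# Illustrative shapes for 10 predators in pursuit (10x10 view, 5 channels);
# the feature dimension 34 is arbitrary here.
view = np.zeros((10, 10, 10, 5))
feature = np.zeros((10, 34))
flat = flatten_observation(view, feature)
print(flat.shape)  # (10, 534): 10*10*5 view entries + 34 feature entries
```

Any custom algorithm plugged into the environment would need to consume both components in some such way; dropping the feature component would lose the agent-specific information discussed above.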

ashwinipokle commented 6 years ago

Thank you so much for the quick reply!