chaovven / SMIX

Code for "SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning" (AAAI 2020)
Apache License 2.0

Why there is only one agent in the controller? #6

Closed GoingMyWay closed 4 years ago

GoingMyWay commented 4 years ago

Dear chaovven, in https://github.com/chaovven/SMIX/blob/master/src/controllers/basic_controller.py#L75, I found that only one agent network is defined in the controller. For homogeneous environments this seems fine, since the agents are identical and can share a single network. However, for heterogeneous environments such as 3s5z_vs_3s6z, I don't think this works, since each group contains both Stalkers and Zealots, right? So I think a separate agent network should be defined for each agent type.

chaovven commented 4 years ago

> Dear chaovven, in https://github.com/chaovven/SMIX/blob/master/src/controllers/basic_controller.py#L75, I found that only one agent network is defined in the controller. For homogeneous environments this seems fine, since the agents are identical and can share a single network. However, for heterogeneous environments such as 3s5z_vs_3s6z, I don't think this works, since each group contains both Stalkers and Zealots, right? So I think a separate agent network should be defined for each agent type.

Hi, there is only one agent network, shared by all agents, for action selection. This is trivial in the homogeneous case, as you said. For heterogeneous scenarios, it works by assuming all agents have the same observation size, and by taking the maximum number of actions over all agents as the extended action space of the shared network. Using the available_actions mask returned by the environment, we can then mask out the actions that are unavailable to a given agent type.
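The masking idea described above can be sketched as follows. This is a minimal illustration, not the repository's actual code; the function name `select_actions` and the tensor shapes are assumptions:

```python
import torch

def select_actions(agent_outs, avail_actions):
    """Greedy action selection over a padded action space.

    agent_outs:    (n_agents, max_n_actions) Q-values from the shared network,
                   where max_n_actions is the largest action space of any agent type.
    avail_actions: (n_agents, max_n_actions) 0/1 mask from the environment.
    """
    masked = agent_outs.clone()
    # Unavailable actions get -inf so they can never be the argmax.
    masked[avail_actions == 0] = -float("inf")
    return masked.argmax(dim=-1)
```

For example, an agent whose real action space is smaller than `max_n_actions` simply has zeros in the tail of its `avail_actions` row, so the padded slots are never selected.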

GoingMyWay commented 4 years ago

> Hi, there is only one agent network, shared by all agents, for action selection. This is trivial in the homogeneous case, as you said. For heterogeneous scenarios, it works by assuming all agents have the same observation size, and by taking the maximum number of actions over all agents as the extended action space of the shared network. Using the available_actions mask returned by the environment, we can then mask out the actions that are unavailable to a given agent type.

Yeah, that is true.

GoingMyWay commented 4 years ago


Hi, did you try any of the micro-trick scenarios, for example bane_vs_bane? In your experience, can pymarl and your code train the micro-trick maps without any code changes?