cyanrain7 / TRPO-in-MARL

MIT License

How do you use global information and local information in multi-agent mujoco? #2

Closed Weiyuhong-1998 closed 2 years ago

Weiyuhong-1998 commented 2 years ago

I noticed the following in your multi-agent MuJoCo environment code:

def get_obs(self):
    """ Returns all agent observat3ions in a list """
    state = self.env._get_obs()
    obs_n = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # obs_n.append(self.get_obs_agent(a))
        # obs_n.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        # obs_n.append(np.concatenate([self.get_obs_agent(a), agent_id_feats]))
        obs_i = np.concatenate([state, agent_id_feats])
        obs_i = (obs_i - np.mean(obs_i)) / np.std(obs_i)
        obs_n.append(obs_i)
    return obs_n

def get_state(self, team=None):
    # TODO: May want global states for different teams (so cannot see what the other team is communicating e.g.)
    state = self.env._get_obs()
    share_obs = []
    for a in range(self.n_agents):
        agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        # share_obs.append(np.concatenate([state, self.get_obs_agent(a), agent_id_feats]))
        state_i = np.concatenate([state, agent_id_feats])
        state_i = (state_i - np.mean(state_i)) / np.std(state_i)
        share_obs.append(state_i)
    return share_obs

Both functions call self.env._get_obs(), so they return the same observation information. What is the difference between get_obs() and get_state() in your code, and how does your algorithm use global and local information?
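To make the point concrete, here is a minimal sketch that copies the loop shared by the two methods into a standalone function and compares the results. The helper name `build_agent_features` and the 4-dimensional toy state are assumptions for illustration only; they are not part of the repository.

```python
import numpy as np


def build_agent_features(state, n_agents):
    """Mirror the loop body shared by get_obs() and get_state() above."""
    features = []
    for a in range(n_agents):
        # One-hot agent ID, as in the quoted code.
        agent_id_feats = np.zeros(n_agents, dtype=np.float32)
        agent_id_feats[a] = 1.0
        feat = np.concatenate([state, agent_id_feats])
        # Per-vector standardization, as in the quoted code.
        feat = (feat - np.mean(feat)) / np.std(feat)
        features.append(feat)
    return features


# Hypothetical global state for a 2-agent task (made-up values).
state = np.array([0.5, -1.0, 2.0, 0.0], dtype=np.float32)

obs_n = build_agent_features(state, n_agents=2)      # what get_obs() computes
share_obs = build_agent_features(state, n_agents=2)  # what get_state() computes

# Both lists hold identical arrays; agents differ only through the one-hot ID.
same = all(np.array_equal(o, s) for o, s in zip(obs_n, share_obs))
print("get_obs and get_state match:", same)
```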

cyanrain7 commented 2 years ago

Yes. In the Multi-Agent MuJoCo environment, all agents can see the global information, so the setting is an MDP rather than a POMDP. We use the same setting for the other algorithms and try to figure out the cooperative relations among agents. We use the SMAC environment to verify that our algorithm also works well in the POMDP setting. If you want to see how our algorithm performs on Multi-Agent MuJoCo in the POMDP setting, you can modify the function get_obs(). I hope this answer helps.
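One possible modification, sketched below, follows the commented-out line in get_obs() that concatenates self.get_obs_agent(a) with the agent ID instead of the full state. This is only an illustration, not the repository's implementation: `ToyMultiAgentEnv`, its state layout, and the way `get_obs_agent` slices the state are hypothetical stand-ins, and the small epsilon in the normalization is an added guard that the original code does not have.

```python
import numpy as np


class ToyMultiAgentEnv:
    """Hypothetical stand-in for the environment wrapper (not the repo's class)."""

    def __init__(self, n_agents=2, obs_per_agent=3):
        self.n_agents = n_agents
        self.obs_per_agent = obs_per_agent
        # Made-up global state: one contiguous slice per agent.
        self._state = np.arange(n_agents * obs_per_agent, dtype=np.float32)

    def get_obs_agent(self, agent_id):
        # Assumption: each agent observes only its own slice of the state.
        start = agent_id * self.obs_per_agent
        return self._state[start:start + self.obs_per_agent]

    def get_obs(self):
        """Return local (partial) observations plus a one-hot agent ID."""
        obs_n = []
        for a in range(self.n_agents):
            agent_id_feats = np.zeros(self.n_agents, dtype=np.float32)
            agent_id_feats[a] = 1.0
            # Local observation instead of the global state.
            obs_i = np.concatenate([self.get_obs_agent(a), agent_id_feats])
            # Same standardization as the original; epsilon avoids division by zero.
            obs_i = (obs_i - np.mean(obs_i)) / (np.std(obs_i) + 1e-8)
            obs_n.append(obs_i)
        return obs_n


env = ToyMultiAgentEnv()
obs_n = env.get_obs()
print([o.shape for o in obs_n])
```

With a change like this, get_state() would still return the global state for the centralized critic, while each actor sees only its local view.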