Closed hyoonsoo closed 8 months ago
CoPO shares one policy across all agents. The number of agents in the environment changes constantly, so it is not feasible to train a separate policy for each active vehicle.
Making some of the agents follow a different policy is possible: change the RLlib training config under config["multiagent"]. This is a little bit complicated, though, so you might wish to refer to RLlib's docs for the concepts of "policy" and "agent".
Basically, we bind all agents to the policy named "default", which is the shared CoPO policy. But it is possible to, e.g., bind all agents with an odd agent index ("agent1", "agent3", ...) to a different policy such as Independent PPO.
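As a rough illustration of the odd-index binding described above, here is a minimal sketch of what the config["multiagent"] entry could look like. The policy name "ippo", the helper names, and the `None` policy specs are assumptions made for illustration, not CoPO's actual code. The exact form of a policy spec and the arguments RLlib passes to the mapping function depend on your RLlib version, so please check the docs for the version you have installed.

```python
import re

SHARED_POLICY = "default"  # the shared CoPO policy named in this thread
OTHER_POLICY = "ippo"      # hypothetical name for an Independent PPO policy


def policy_mapping_fn(agent_id, *args, **kwargs):
    """Map agents with an odd trailing index to OTHER_POLICY, the rest to SHARED_POLICY.

    Extra positional/keyword arguments are accepted and ignored because
    different RLlib versions call this function with different arguments.
    """
    match = re.search(r"(\d+)$", str(agent_id))
    if match and int(match.group(1)) % 2 == 1:
        return OTHER_POLICY
    return SHARED_POLICY


# Sketch of the "multiagent" section of the training config.
# The None values are placeholders (an assumption): replace them with real
# policy specs (policy class, observation space, action space, per-policy
# config) in the form your RLlib version expects.
multiagent_config = {
    "policies": {
        SHARED_POLICY: None,
        OTHER_POLICY: None,
    },
    "policy_mapping_fn": policy_mapping_fn,
}

if __name__ == "__main__":
    for agent in ["agent0", "agent1", "agent2", "agent3"]:
        print(agent, "->", policy_mapping_fn(agent))
```

You would then assign this dictionary to config["multiagent"] before starting training. Agents whose ID does not end in a number fall back to the shared policy in this sketch.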
I am asking because I am confused about whether CoPO trains several policies or not. I understood that it would use one policy per agent. Is this right?