decisionforce / CoPO

[NeurIPS 2021] Official implementation of paper "Learning to Simulate Self-driven Particles System with Coordinated Policy Optimization".
Apache License 2.0

Does CoPO use only one policy, or a separate policy for each agent? #30

Closed hyoonsoo closed 8 months ago

hyoonsoo commented 1 year ago

I am asking because I am confused about whether CoPO trains several policies or not. I understood that it would use one policy per agent. Is this right?

pengzhenghao commented 1 year ago

CoPO shares one policy across all agents. This is because the number of agents in the environment is constantly changing, so it is impossible to train a separate policy for each active vehicle.

pengzhenghao commented 1 year ago

It is possible to make some of the agents follow a different policy. To do so, change the config["multiagent"] entry of the RLlib training config, though this is a little complicated. You may wish to refer to RLlib's docs for the concepts of "policy" and "agent".

Basically, we bind all agents to the policy named "default", which is a shared CoPO policy. But you could instead, e.g., bind all agents with an odd agent index ("agent1", "agent3", ...) to another policy, such as Independent PPO.
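
The odd/even binding described above could be sketched roughly as follows. This is a minimal illustration, not code from the CoPO repository: the policy name "ippo", the helper agent_index, and the None placeholders in "policies" are assumptions made for the example. Agent IDs are assumed to look like "agent0", "agent1", and so on, and the mapping function accepts extra arguments because the exact signature RLlib passes differs between versions.

```python
import re

SHARED_POLICY = "default"  # name used in this thread for the shared CoPO policy
OTHER_POLICY = "ippo"      # hypothetical name for a second policy (e.g. Independent PPO)


def agent_index(agent_id):
    """Extract the trailing integer from an ID such as "agent3" (assumed ID format)."""
    match = re.search(r"(\d+)$", str(agent_id))
    if match is None:
        raise ValueError(f"agent id {agent_id!r} has no numeric suffix")
    return int(match.group(1))


def policy_mapping_fn(agent_id, *args, **kwargs):
    """Send odd-indexed agents to OTHER_POLICY and all others to the shared policy."""
    return OTHER_POLICY if agent_index(agent_id) % 2 == 1 else SHARED_POLICY


# Sketch of the structure that would go under config["multiagent"].
# The None values are placeholders: each entry needs a real policy spec
# in the form expected by the RLlib version in use.
multiagent_config = {
    "policies": {SHARED_POLICY: None, OTHER_POLICY: None},
    "policy_mapping_fn": policy_mapping_fn,
}

# Quick check of which policy each of the first four agents would be bound to.
assignment = {f"agent{i}": policy_mapping_fn(f"agent{i}") for i in range(4)}
print(assignment)
```

Because the mapping is computed from the agent ID alone, it keeps working when vehicles enter and leave the scene: a newly spawned agent is bound to a policy the first time its ID is seen, and no per-agent policy has to be created.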