Stanford-ILIAD / PantheonRL

PantheonRL is a package for training and testing multi-agent reinforcement learning environments. PantheonRL supports cross-play, fine-tuning, ad-hoc coordination, and more.

Round-robin implementation in the paper VS MultiagentEnv class #1

Closed mmcelikok closed 2 years ago

mmcelikok commented 2 years ago

As far as I can see, the MultiagentEnv class randomly samples a partner from the partner list at the beginning of each episode. But in the paper "PantheonRL: A MARL Library for Dynamic Training Interactions", Listing 1, you implement round-robin by simply adding two partners and running the learning. That won't actually be round-robin, though, right? It will randomly sample a partner from the partner list instead of going through each partner in a pre-specified order.

bsarkar321 commented 2 years ago

Thank you for bringing this up! We originally had a pure round-robin resampling policy, but once we began supporting >2-player environments, the definition of "round-robin" became unclear. For example, should round-robin advance each player's partner choice in lockstep, or should every potential combination of partners be covered?

For our purposes, random sampling was essentially as good as round-robin, because we just wanted to ensure that each partner gets roughly the same amount of training time. However, to make the implementation consistent with the paper, I've just pushed an update that changes the default behavior of 2-player environments to round-robin sampling; >2-player environments will still use random sampling.
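For concreteness, the difference between the two sampling behaviors can be sketched as follows (a minimal standalone illustration with made-up names, not the library's actual code):

```python
import random

class RandomSampler:
    """Draws a partner independently each episode
    (the behavior >2-player environments keep)."""

    def __init__(self, partners):
        self.partners = partners

    def next_partner(self):
        return random.choice(self.partners)

class RoundRobinSampler:
    """Cycles through the partner list in a fixed order
    (the new 2-player default), so episode counts stay even."""

    def __init__(self, partners):
        self.partners = partners
        self.idx = -1  # advanced before each episode

    def next_partner(self):
        self.idx = (self.idx + 1) % len(self.partners)
        return self.partners[self.idx]
```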

However, if you need even finer control over choosing the next partner at the start of each episode, I would suggest creating a new type of agent that acts as a wrapper around multiple simple agents. This wrapper agent can perform your own resampling procedure at the end of each episode: when its "update" method is called, check whether done is true and switch the active sub-agent accordingly.
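A rough sketch of that wrapper idea is below. The method names get_action(obs) and update(reward, done) are assumptions about the agent interface here, not verified signatures, and CyclingPartner is a hypothetical name:

```python
# Illustrative sketch only: the agent interface and method signatures
# are assumptions, not PantheonRL's verified API.
class CyclingPartner:
    """Wraps several partner agents and switches between them
    in a fixed order whenever an episode ends."""

    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
        self.idx = 0  # index of the currently active sub-agent

    @property
    def active(self):
        return self.sub_agents[self.idx]

    def get_action(self, obs):
        # Delegate action selection to the active sub-agent.
        return self.active.get_action(obs)

    def update(self, reward, done):
        # Forward the transition to the active sub-agent first.
        self.active.update(reward, done)
        if done:
            # Episode finished: advance to the next partner.
            # Any custom resampling rule could go here instead.
            self.idx = (self.idx + 1) % len(self.sub_agents)
```

Since the environment would only ever see this one wrapper in its partner list, the wrapper's own switching rule fully determines which sub-agent plays each episode, regardless of how the environment samples.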