Stanford-ILIAD / PantheonRL

PantheonRL is a package for training and testing multi-agent reinforcement learning environments. PantheonRL supports cross-play, fine-tuning, ad-hoc coordination, and more.

Round-robin implementation in the paper VS MultiagentEnv class #1

Closed mmcelikok closed 2 years ago

mmcelikok commented 2 years ago

As far as I can see, the MultiagentEnv class randomly samples a partner from the partner list at the beginning of each episode. But in the paper "PantheonRL: A MARL Library for Dynamic Training Interactions", Listing 1, you implement round-robin by simply adding two partners and running the learning. That won't actually be round-robin, though, right? It will randomly sample a partner from the partner list instead of going through each partner in a pre-specified order.

bsarkar321 commented 2 years ago

Thank you for bringing this up! We originally had a pure round-robin resampling policy, but once we began supporting >2-player environments, the definition of "round-robin" became unclear. For example, should round-robin advance each player's partner choice in lockstep, or should every potential combination of partners be covered?

For our purposes, random sampling was essentially as good as round-robin, because we just wanted to ensure that each partner gets roughly the same amount of training time. However, to make the implementation consistent with the paper, I've just pushed an update that changes the default behavior of 2-player environments to round-robin sampling; >2-player environments will still use random sampling.
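For concreteness, the difference between the two sampling behaviors can be sketched as follows (a minimal standalone illustration with made-up names, not the library's actual code):

```python
import random

class RandomSampler:
    """Draws a partner independently each episode
    (the behavior >2-player environments keep)."""

    def __init__(self, partners):
        self.partners = partners

    def next_partner(self):
        return random.choice(self.partners)

class RoundRobinSampler:
    """Cycles through the partner list in a fixed order
    (the new 2-player default), so episode counts stay even."""

    def __init__(self, partners):
        self.partners = partners
        self.idx = -1  # advanced before each episode

    def next_partner(self):
        self.idx = (self.idx + 1) % len(self.partners)
        return self.partners[self.idx]
```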

However, if you need even finer control over choosing the next partner at the start of each episode, I would suggest creating a new type of agent that acts as a wrapper around multiple simple agents. This wrapper agent can perform your own resampling procedure at the end of each episode: when its "update" method is called, check whether done is true and switch the active sub-agent accordingly.
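A rough sketch of that wrapper idea is below. The method names get_action(obs) and update(reward, done) are assumptions about the agent interface here, not verified signatures, and CyclingPartner is a hypothetical name:

```python
# Illustrative sketch only: the agent interface and method signatures
# are assumptions, not PantheonRL's verified API.
class CyclingPartner:
    """Wraps several partner agents and switches between them
    in a fixed order whenever an episode ends."""

    def __init__(self, sub_agents):
        self.sub_agents = sub_agents
        self.idx = 0  # index of the currently active sub-agent

    @property
    def active(self):
        return self.sub_agents[self.idx]

    def get_action(self, obs):
        # Delegate action selection to the active sub-agent.
        return self.active.get_action(obs)

    def update(self, reward, done):
        # Forward the transition to the active sub-agent first.
        self.active.update(reward, done)
        if done:
            # Episode finished: advance to the next partner.
            # Any custom resampling rule could go here instead.
            self.idx = (self.idx + 1) % len(self.sub_agents)
```

Since the environment would only ever see this one wrapper in its partner list, the wrapper's own switching rule fully determines which sub-agent plays each episode, regardless of how the environment samples.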