Open kachayev opened 2 years ago
As an example, 2 presets are implemented: AlphaStar and OpenAI Five (not sure if it's required to sample based on MMR); note that the implementation turned out to be almost trivial based on the league API (I mainly used this as confirmation that the API is flexible enough).
Nice! This is really cool!
And I'm thinking about having the league runner as a separate package that could be used as a library and/or a CLI tool (from an implementation perspective, it's completely independent of the details of training or the env that is used). WDYT @vwxyzjn?
This makes sense! Lots of projects could benefit from this :)
It's actually pretty hard to iterate on the league runner with the MicroRTS env, as it takes a long time to get any training done. I'm mostly iterating on SlimeVolley and some other PettingZoo envs.
I will look further into this and evaluate.
Looked further into this. Do we have a sense of how "fast" the training is? So if we run `poetry run python run_league.py --config-file league_alphastar.yaml` for 24 hours, what's the TrueSkill of the best agent, using our `league.py` to evaluate?
To answer this question I need a GPU 😀 Based on the schedule of other experiments, I should be able to run it tomorrow or the day after tomorrow.
Oh, BTW, I found another detail I have to flesh out first: right now evaluation only works against other agents; PvE games are not supported. It's quite easy to cover, so it shouldn't take long.
What is PvE?
Sorry :) It stands for "player vs. environment" (like a built-in bot), as opposed to PvP, "player vs. player".
Oh, this makes sense. It would be useful to cover! That said, hopefully, we can also train really strong agents without the help of human-engineered bots :)
This is very much WIP.
For the league runner to work, it's required to provide 2 entrypoints: `train` and `evaluate`. Both take as arguments the path to a saved agent and the path to a saved opponent. League configuration (in a YAML file) gives the ability to control the setup of the league.
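To make that contract concrete, here is a minimal sketch of what the two entrypoints might look like. Only the names `train` and `evaluate` come from the description above; the signatures, docstrings, and return values are assumptions for illustration.

```python
# Hypothetical sketch of the two entrypoints the league runner calls.
# Only the names `train` and `evaluate` come from the PR description;
# everything else here is an assumption.

def train(agent_path: str, opponent_path: str) -> None:
    """Load the agent and the opponent from disk, run a training
    iteration against the opponent, and save the updated agent
    back to agent_path."""
    ...

def evaluate(agent_path: str, opponent_path: str) -> float:
    """Play evaluation games between the two saved agents and
    return the agent's winrate, which the runner can feed into
    the payoff table and the rating update."""
    ...
```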
The runner keeps track of winrates in a payoff table and of MMR by running a Bayesian update on TrueSkill ratings. Winrate and MMR information can be used to decide on the next opponent.
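As a rough illustration of that bookkeeping (not necessarily the PR's exact data structures), here is a sketch using the `trueskill` package, whose `rate_1vs1` performs the Bayesian rating update:

```python
from collections import defaultdict
import trueskill  # pip install trueskill

ratings = defaultdict(trueskill.Rating)   # agent name -> TrueSkill rating (MMR)
payoff = defaultdict(lambda: [0, 0])      # (a, b) -> [wins of a over b, games played]

def record_game(winner: str, loser: str) -> None:
    # Bayesian TrueSkill update: the winner's mu goes up, the loser's
    # goes down, and both uncertainties (sigma) shrink.
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(
        ratings[winner], ratings[loser]
    )
    payoff[(winner, loser)][0] += 1
    payoff[(winner, loser)][1] += 1
    payoff[(loser, winner)][1] += 1

def winrate(a: str, b: str) -> float:
    wins, games = payoff[(a, b)]
    return wins / games if games else 0.5  # no data -> assume even matchup
```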
As an example, 2 presets are implemented: AlphaStar and OpenAI Five (see the sampling sketch below).
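For the AlphaStar preset: the AlphaStar paper samples opponents with prioritized fictitious self-play (PFSP), weighting candidates by how hard they are to beat. Assuming the preset follows that scheme, a sketch could look like this (function names are hypothetical; the win probabilities would come from the payoff table above):

```python
import random

def pfsp_weights(win_probs, p=2.0):
    # AlphaStar-style PFSP weighting: prefer opponents the agent
    # struggles against. win_probs[i] is the estimated probability
    # of beating opponent i; p controls how sharply hard opponents
    # are prioritized.
    return [(1.0 - w) ** p for w in win_probs]

def sample_opponent(opponents, win_probs, p=2.0):
    weights = pfsp_weights(win_probs, p)
    if sum(weights) == 0:              # agent beats everyone: uniform fallback
        return random.choice(opponents)
    return random.choices(opponents, weights=weights, k=1)[0]
```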
The league supports resuming from a checkpoint.
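A checkpoint presumably only needs the league state (ratings plus the payoff table), so resuming could be as simple as serializing and reloading that state. A hypothetical sketch, with the file format and field names assumed:

```python
import json
import trueskill

# Hypothetical checkpoint format; the real PR may store this differently.
def save_league_state(path, ratings, payoff):
    state = {
        "ratings": {name: [r.mu, r.sigma] for name, r in ratings.items()},
        "payoff": {f"{a}|{b}": counts for (a, b), counts in payoff.items()},
    }
    with open(path, "w") as f:
        json.dump(state, f)

def load_league_state(path):
    with open(path) as f:
        state = json.load(f)
    ratings = {name: trueskill.Rating(mu=mu, sigma=sigma)
               for name, (mu, sigma) in state["ratings"].items()}
    payoff = {tuple(key.split("|")): counts
              for key, counts in state["payoff"].items()}
    return ratings, payoff
```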
There are still a lot of open issues.