Open kachayev opened 2 years ago
As an example, 2 presets are implemented: AlphaStar and OpenAI Five (not sure if it's required to sample based on MMR); note that the implementation turned out to be almost trivial based on the league API (I mainly used this as confirmation that the API is flexible enough).
Nice! This is really cool!
And I'm thinking about having the league runner as a separate package that could be used as a library and/or a CLI tool (from an implementation perspective, it's completely independent of the details of training or the env that is used). WDYT @vwxyzjn?
This makes sense! Lots of projects could benefit from this :)
It's actually pretty hard to iterate on the league runner with the MicroRTS env, as it takes a long time to get any training done. I'm mostly iterating on SlimeVolley and some other PettingZoo envs.
I will look further into this and evaluate.
Looked further into this. Do we have a sense of how "fast" the training is? So if we run `poetry run python run_league.py --config-file league_alphastar.yaml` for 24 hours, what's the TrueSkill of the best agent, using our `league.py` to evaluate?
To answer this question I need a GPU 😀 Based on the schedule of other experiments, I should be able to run it tomorrow or the day after tomorrow.
Oh, BTW, I found another detail I have to flesh out first: right now evaluation only works against other agents; PvE games are not supported. It's quite easy to cover, so it shouldn't take long.
What is PvE?
Sorry :) It stands for "player vs. environment" (like a built-in bot), as opposed to PvP, "player vs. player".
Oh, this makes sense. It would be useful to cover! That said, hopefully, we can also train really strong agents without the help of human-engineered bots :)
This is very much WIP.
For the league runner to work, it's required to provide 2 entrypoints: `train` and `evaluate`. Both take as arguments the path to a saved agent and the path to a saved opponent. League configuration (in a YAML file) gives the ability to control the setup of the league.
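To make that contract concrete, here is a minimal sketch of what the two entrypoints might look like. Only the names `train` and `evaluate` come from the description above; the signatures, docstrings, and return values are assumptions for illustration.

```python
# Hypothetical sketch of the two entrypoints the league runner calls.
# Only the names `train` and `evaluate` come from the PR description;
# everything else here is an assumption.

def train(agent_path: str, opponent_path: str) -> None:
    """Load the agent and the opponent from disk, run a training
    iteration against the opponent, and save the updated agent
    back to agent_path."""
    ...

def evaluate(agent_path: str, opponent_path: str) -> float:
    """Play evaluation games between the two saved agents and
    return the agent's winrate, which the runner can feed into
    the payoff table and the rating update."""
    ...
```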
The runner keeps track of winrates in a payoff table and of MMR by running a Bayesian update on TrueSkill ratings. Winrate and MMR information can be used to decide on the next opponent.
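As a rough illustration of that bookkeeping (not necessarily the PR's exact data structures), here is a sketch using the `trueskill` package, whose `rate_1vs1` performs the Bayesian rating update:

```python
from collections import defaultdict
import trueskill  # pip install trueskill

ratings = defaultdict(trueskill.Rating)   # agent name -> TrueSkill rating (MMR)
payoff = defaultdict(lambda: [0, 0])      # (a, b) -> [wins of a over b, games played]

def record_game(winner: str, loser: str) -> None:
    # Bayesian TrueSkill update: the winner's mu goes up, the loser's
    # goes down, and both uncertainties (sigma) shrink.
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(
        ratings[winner], ratings[loser]
    )
    payoff[(winner, loser)][0] += 1
    payoff[(winner, loser)][1] += 1
    payoff[(loser, winner)][1] += 1

def winrate(a: str, b: str) -> float:
    wins, games = payoff[(a, b)]
    return wins / games if games else 0.5  # no data -> assume even matchup
```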
As an example, 2 presets are implemented: AlphaStar and OpenAI Five (see the sampling sketch below).
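For the AlphaStar preset: the AlphaStar paper samples opponents with prioritized fictitious self-play (PFSP), weighting candidates by how hard they are to beat. Assuming the preset follows that scheme, a sketch could look like this (function names are hypothetical; the win probabilities would come from the payoff table above):

```python
import random

def pfsp_weights(win_probs, p=2.0):
    # AlphaStar-style PFSP weighting: prefer opponents the agent
    # struggles against. win_probs[i] is the estimated probability
    # of beating opponent i; p controls how sharply hard opponents
    # are prioritized.
    return [(1.0 - w) ** p for w in win_probs]

def sample_opponent(opponents, win_probs, p=2.0):
    weights = pfsp_weights(win_probs, p)
    if sum(weights) == 0:              # agent beats everyone: uniform fallback
        return random.choice(opponents)
    return random.choices(opponents, weights=weights, k=1)[0]
```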
The league supports resuming from a checkpoint.
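A checkpoint presumably only needs the league state (ratings plus the payoff table), so resuming could be as simple as serializing and reloading that state. A hypothetical sketch, with the file format and field names assumed:

```python
import json
import trueskill

# Hypothetical checkpoint format; the real PR may store this differently.
def save_league_state(path, ratings, payoff):
    state = {
        "ratings": {name: [r.mu, r.sigma] for name, r in ratings.items()},
        "payoff": {f"{a}|{b}": counts for (a, b), counts in payoff.items()},
    }
    with open(path, "w") as f:
        json.dump(state, f)

def load_league_state(path):
    with open(path) as f:
        state = json.load(f)
    ratings = {name: trueskill.Rating(mu=mu, sigma=sigma)
               for name, (mu, sigma) in state["ratings"].items()}
    payoff = {tuple(key.split("|")): counts
              for key, counts in state["payoff"].items()}
    return ratings, payoff
```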
There are still a lot of open issues.