glmcdona / LuxPythonEnvGym

Matching python environment code for Lux AI 2021 Kaggle competition, and a gym interface for RL models.
MIT License
73 stars · 38 forks

[feature request] callback returning wins (or winrate) vs another agent #100

Open kevinu3d opened 2 years ago

kevinu3d commented 2 years ago

When training with different reward functions, it's hard to compare two bots. A callback capable of running n games between the current agent and another agent would be useful for measuring progress.

I will look into it, but if anyone knows how to do this, help is welcome.
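
A minimal sketch of what such a callback could look like, built on stable-baselines3's `BaseCallback`. The `make_eval_env` factory, the `get_winning_team()` helper, and `learning_agent.team` are assumptions about how the environment exposes the match result, not confirmed parts of the library's API; adapt them to whatever your `LuxEnvironment` actually provides.

```python
from stable_baselines3.common.callbacks import BaseCallback


class WinRateCallback(BaseCallback):
    """Every eval_freq steps, play n_games against a fixed opponent
    and record the win rate to the training logger (TensorBoard)."""

    def __init__(self, make_eval_env, n_games=10, eval_freq=100_000, verbose=0):
        super().__init__(verbose)
        self.make_eval_env = make_eval_env  # assumed factory returning a fresh eval env
        self.n_games = n_games
        self.eval_freq = eval_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.eval_freq != 0:
            return True

        wins = 0
        env = self.make_eval_env()
        for _ in range(self.n_games):
            obs = env.reset()
            done = False
            while not done:
                action, _ = self.model.predict(obs, deterministic=True)
                obs, reward, done, info = env.step(action)
            # Assumed win check; replace with however your env reports the winner.
            if env.game.get_winning_team() == env.learning_agent.team:
                wins += 1

        self.logger.record("eval/win_rate", wins / self.n_games)
        return True
```

It would be passed in via `model.learn(callback=...)`, alongside any checkpoint callbacks already used in examples/train.py.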

glmcdona commented 2 years ago

Good idea.

This is probably the closest existing code to it: https://github.com/glmcdona/LuxPythonEnvGym/blob/c5d4c161b8aa0c4d5ca7ef08ab1d597be81a7883/luxai2021/env/lux_env.py#L14

And example usage: https://github.com/glmcdona/LuxPythonEnvGym/blob/c5d4c161b8aa0c4d5ca7ef08ab1d597be81a7883/examples/train.py#L114

You can also look at the built-in eval callback: https://github.com/glmcdona/LuxPythonEnvGym/blob/c5d4c161b8aa0c4d5ca7ef08ab1d597be81a7883/examples/train.py#L136

One last thing to look at is this old example code that logged custom game information to TensorBoard. It was a bit of a hack because the agent had already been reset before TensorBoard grabbed the data: PR #86.
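
For completeness, a rough usage sketch following the pattern of examples/train.py, wiring the `WinRateCallback` sketched above into training. The import paths, `AgentPolicy`, `LuxMatchConfigs_Default`, and the `LuxEnvironment` constructor arguments are recalled from the example scripts and should be checked against your checkout rather than taken as the exact API.

```python
from stable_baselines3 import PPO

# Paths and constructor arguments assumed from examples/train.py; verify locally.
from luxai2021.env.lux_env import LuxEnvironment
from luxai2021.env.agent import Agent
from luxai2021.game.constants import LuxMatchConfigs_Default
from agent_policy import AgentPolicy  # the example policy that ships with examples/

configs = LuxMatchConfigs_Default


def make_eval_env():
    # Fresh environment pairing the learning policy against a fixed opponent.
    return LuxEnvironment(configs=configs,
                          learning_agent=AgentPolicy(mode="train"),
                          opponent_agent=Agent())


train_env = LuxEnvironment(configs=configs,
                           learning_agent=AgentPolicy(mode="train"),
                           opponent_agent=Agent())

model = PPO("MlpPolicy", train_env, verbose=1, tensorboard_log="./lux_tensorboard/")
model.learn(total_timesteps=10_000_000,
            callback=WinRateCallback(make_eval_env, n_games=10, eval_freq=100_000))
```

Because the win rate is computed inside the callback on a separate evaluation environment, it can be logged before any reset of the training environment, which sidesteps the timing problem described for PR #86.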