Stochastic MuZero for Simultaneous-Move Games

Zachary-Fernandes commented 1 year ago

Hello,

I have been thinking of training different artificial intelligence algorithms for use in Pokémon Showdown, which is a game with simultaneous moves and imperfect information. The package I would use - poke-env - can expose an OpenAI Gym wrapper, which is what makes me think it should be possible to use it. The agent would use self-play on a local Showdown server to train and then ideally be evaluated by challenging opponents on the main Showdown server.

I wanted to ask some questions before I started experimenting. First, Pokémon is a simultaneous-move game, and I understand this is a departure from sequential-move games the original AlphaZero model worked on like Go. Does Stochastic MuZero in its current state support training on simultaneous-move games through self-play?

Second, this would be a new environment used through Gym, so I would hope it is simple to add the environment to this package. What advice would you give for adding the environment and/or tuning the hyperparameters? Thank you in advance.

DHDev0 commented 1 year ago

Use original hyperparameter of the paper (pseudocode)
Your simulation is fundamentally combinatorial so you will need to explore, no short cut with hyperparameter tuning.
Get a lot of cloud compute, i would say 10x that For Backgammon, we used 1 TPU for training and 16 TPUs for acting, for approximately 27 hours equivalent to 10 days on a single V100 GPU. ( https://openreview.net/pdf?id=X6D9bAHhBQ1 page 15 )
Building simulation refer to https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/, example https://github.com/DHDev0/Stochastic-time-series-forecast-simulator

DHDev0 commented 1 year ago

I will close this issue, let me know if you have other question.

DHDev0 / Stochastic-muzero

Stochastic MuZero for Simultaneous-Move Games #5