DHDev0 / Stochastic-muzero

PyTorch implementation of Stochastic MuZero for Gym environments. The algorithm supports a wide range of action and observation spaces, both discrete and continuous.
GNU General Public License v3.0

Stochastic MuZero for Simultaneous-Move Games #5

Closed Zachary-Fernandes closed 1 year ago

Zachary-Fernandes commented 1 year ago

Hello,

I have been thinking of training different artificial intelligence algorithms for use in Pokémon Showdown, which is a game with simultaneous moves and imperfect information. The package I would use - poke-env - can expose an OpenAI Gym wrapper, which is what makes me think it should be possible to use it. The agent would use self-play on a local Showdown server to train and then ideally be evaluated by challenging opponents on the main Showdown server.
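For context, a Gym-style environment is just an object exposing `reset()` and `step(action)`; anything the wrapper exposes in that shape can in principle be plugged into this repo. Below is a minimal, self-contained sketch of that interface with a toy stand-in environment. The class name, the action count, and the three-turn episode are illustrative placeholders, not the actual poke-env API.

```python
# Hedged sketch: a toy stand-in for a Gym-wrapped battle environment.
# Names and spaces here are hypothetical, not the real poke-env wrapper.

class ShowdownEnvSketch:
    """Toy environment exposing the classic Gym reset/step interface."""

    def __init__(self, n_actions=9):
        # Hypothetical action count (e.g. moves plus switches).
        self.n_actions = n_actions
        self._turn = 0

    def reset(self):
        # Return the initial observation (a real wrapper would return
        # a feature vector describing the battle state).
        self._turn = 0
        return [0.0] * 4

    def step(self, action):
        # Classic Gym signature: (observation, reward, done, info).
        assert 0 <= action < self.n_actions
        self._turn += 1
        obs = [float(self._turn)] * 4
        done = self._turn >= 3          # toy episode ends after 3 turns
        reward = 1.0 if done else 0.0   # terminal win/loss style reward
        return obs, reward, done, {}


env = ShowdownEnvSketch()
obs = env.reset()
done = False
total = 0.0
while not done:
    # A real agent would select the action; 0 is a placeholder.
    obs, reward, done, info = env.step(0)
    total += reward
```

Any environment that satisfies this loop contract is a candidate for the self-play training described above.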

I wanted to ask some questions before I started experimenting. First, Pokémon is a simultaneous-move game, which is a departure from the sequential-move games, such as Go, that the original AlphaZero worked on. Does Stochastic MuZero in its current state support training on simultaneous-move games through self-play?
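One common reduction (offered here as a sketch, not as something this repo implements) is to cast the opponent's hidden simultaneous move as a chance event that resolves after our own action is committed, which maps naturally onto Stochastic MuZero's chance states. The opponent policy and move names below are hypothetical:

```python
import random

def resolve_turn(our_action, opponent_policy, rng):
    """Commit our action, then sample the opponent's simultaneous move
    as if it were a chance outcome. opponent_policy maps each opponent
    action to its probability (a hypothetical mixed strategy)."""
    actions, probs = zip(*opponent_policy.items())
    opp_action = rng.choices(actions, weights=probs, k=1)[0]
    # The joint outcome (our_action, opp_action) is what the true
    # environment transition conditions on.
    return our_action, opp_action

rng = random.Random(0)
# Hypothetical opponent mixed strategy over three moves.
policy = {"tackle": 0.5, "protect": 0.3, "switch": 0.2}
joint = resolve_turn("thunderbolt", policy, rng)
```

Under this view, the learned chance distribution plays the role of a model of the opponent's simultaneous decision, so a sequential planner can still be applied.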

Second, this would be a new environment used through Gym, so I would hope it is simple to add the environment to this package. What advice would you give for adding the environment and/or tuning the hyperparameters? Thank you in advance.

DHDev0 commented 1 year ago

I will close this issue; let me know if you have any other questions.