Combining Gumbel MuZero and Stochastic MuZero

google-deepmind / mctx

Monte Carlo tree search in JAX

Apache License 2.0


Combining Gumbel MuZero and Stochastic MuZero #66

Closed: carlosgmartin closed this issue 10 months ago

carlosgmartin commented 1 year ago

This library contains implementations of

Gumbel MuZero (policy improvement)
Stochastic MuZero (chance nodes)

These could potentially be combined. (Example.) I was wondering whether any thought has been given to adding an implementation of this combination.

fidlej commented 1 year ago

Hi. Thanks for asking. Yes, it should be possible to combine the two algorithms: Gumbel MuZero would be used at the root node and Stochastic MuZero at interior nodes.
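To make that split concrete, here is a minimal pure-Python sketch. It is not mctx's API or the maintainers' design. At the root, it applies the Gumbel-Top-k trick followed by the argmax of g(a) + logits(a) + σ(q(a)) used by Gumbel MuZero. At an interior chance node, it picks an outcome with an assumed deterministic rule, argmax P(c) / (N(c) + 1). All function names are hypothetical. The constants `c_visit=50` and `c_scale=0.1` are illustrative. Sequential halving at the root and pUCT at interior decision nodes are left out.

```python
import math
import random


def sample_gumbel(n, rng):
    # Standard Gumbel(0, 1) noise via -log(-log(U)), U ~ Uniform(0, 1).
    # rng.random() can return 0.0, so guard it to keep log() finite.
    return [-math.log(-math.log(rng.random() or 1e-12)) for _ in range(n)]


def sigma(q, max_visits, c_visit=50.0, c_scale=0.1):
    # Monotone transform of a Q-value (assumed already rescaled to [0, 1]).
    # c_visit and c_scale are illustrative defaults, not values from this thread.
    return (c_visit + max_visits) * c_scale * q


def gumbel_root_action(logits, gumbel, q_values, visit_counts, num_considered):
    """Hypothetical root (decision node) selection in the Gumbel MuZero style.

    Keeps the top-k actions by g(a) + logits(a), then returns the one that
    maximizes g(a) + logits(a) + sigma(q(a)). Sequential halving, which splits
    the simulation budget among the considered actions, is omitted here.
    """
    scores = [g + l for g, l in zip(gumbel, logits)]
    # Gumbel-Top-k: the k actions with the largest perturbed logits.
    considered = sorted(range(len(scores)), key=lambda a: -scores[a])[:num_considered]
    max_visits = max(visit_counts)
    return max(considered, key=lambda a: scores[a] + sigma(q_values[a], max_visits))


def select_chance_outcome(chance_probs, visit_counts):
    """Hypothetical interior chance-node selection (an assumed rule).

    argmax_c P(c) / (N(c) + 1) spreads visits roughly in proportion to P(c)
    without random sampling.
    """
    return max(range(len(chance_probs)),
               key=lambda c: chance_probs[c] / (visit_counts[c] + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    noise = sample_gumbel(4, rng)
    print(gumbel_root_action([0.0] * 4, noise, [0.1, 0.2, 0.9, 1.0], [2, 2, 2, 0], 3))
    print(select_chance_outcome([0.5, 0.3, 0.2], [2, 1, 0]))
```

Keeping the two rules in separate functions mirrors the proposal: only the root uses the Gumbel-based policy improvement, and everything below it follows the Stochastic MuZero tree with its decision and chance nodes.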

Currently, the Stochastic MuZero implementation lacks tests that check against an expected search tree. It would be nice to have such tests for any new implementation, e.g., by recording a search tree from a game of 2048.

It would also be nice to move the Stochastic-MuZero-related policies to a new file, stochastic.py. I'm not currently working with 2048 (or any other stochastic game), but well-tested contributions are welcome here.

carlosgmartin commented 1 year ago

@fidlej pgx and jumanji have JAX implementations of 2048 that could perhaps be used for this.

fidlej commented 10 months ago

Let's close this issue. The existing Stochastic MuZero implementation is not efficient inside mctx. An alternative library can be created for Stochastic MuZero.

puyuan1996 commented 7 months ago

Hello, thank you to the contributors for their outstanding work on this repository. Regarding the issue here, you might be interested in the project "LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios". This repository not only supports the AlphaZero algorithm but also extends support to MuZero and a series of related algorithms and environments (including StochasticMuZero and 2048), which might meet your requirements. Best wishes.