google-deepmind / mctx

Monte Carlo tree search in JAX

Question about `simulate` function in `search.py` #82

Closed · wilrop closed this issue 8 months ago

wilrop commented 8 months ago

Thanks for building this wonderful tool! I am currently going through the source code of `search.py` to try to understand how the algorithm is implemented in practice, and I found myself wondering about the role of the `simulate()` function.

As far as I understand, MCTS has four phases: selection, expansion, simulation, and backpropagation. In the main `search` function, however, there is no call to a `select()`-type function; the first function actually called is `simulate()`. Reading the code and comments for `simulate()`, it looks like it really implements the selection phase: it traverses the search tree until it encounters a node that has not been visited yet, which is then expanded by the `expand()` function.

My question is then: am I correct that the `simulate()` function corresponds to the selection phase? Moreover, since that would imply there is no real simulation phase, am I also correct that no rollouts are performed, and that a network is instead queried to obtain the value of a node? I think this is what happens in the code here: https://github.com/google-deepmind/mctx/blob/d40d32e1a18fb73030762bac33819f95fff9787c/mctx/_src/search.py#L226
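To make sure I am reading it correctly, this is roughly the flow I have in mind. It is only a minimal sketch with my own names (`Node`, `select_leaf`, `puct_score`, `expand_and_evaluate`), not the actual mctx code, which operates on batched JAX arrays:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Node:
    """One search-tree node (a toy structure, not mctx's array-based Tree)."""
    prior: float
    value: float = 0.0
    visit_count: int = 0
    children: Dict[int, "Node"] = field(default_factory=dict)


def puct_score(child: Node) -> float:
    # Placeholder action-selection score; the real PUCT rule also uses the
    # parent's visit count and an exploration constant.
    return child.value + child.prior / (1.0 + child.visit_count)


def select_leaf(root: Node) -> Node:
    """What mctx calls `simulate`: walk down the tree with an action-selection
    rule until reaching a node that has not been expanded yet."""
    node = root
    while node.children:  # stop at the first unexpanded node
        node = max(node.children.values(), key=puct_score)
    return node


def expand_and_evaluate(node: Node, network) -> float:
    """At the leaf: instead of a random rollout, a network is queried once
    for a value estimate and child priors (hypothetical network interface)."""
    priors, value = network(node)
    for action, p in enumerate(priors):
        node.children[action] = Node(prior=p)
    node.value = value
    return value
```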

Any help to clarify this issue would be much appreciated!

fidlej commented 8 months ago

Sorry for the slow reply. You understand the code well. Since the AlphaGo paper, tree search has used a neural network to evaluate leaf nodes instead of random rollouts.
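Concretely, the only difference at a leaf is the evaluation step, roughly like this (a sketch with placeholder names such as `env` and `value_network`, not mctx's API):

```python
import random

def evaluate_with_rollout(env, state):
    """Classic MCTS: play random moves until the game ends, return the outcome."""
    while not env.is_terminal(state):
        state = env.step(state, random.choice(env.legal_actions(state)))
    return env.outcome(state)

def evaluate_with_network(value_network, state):
    """AlphaGo-style MCTS: a learned value estimate replaces the rollout."""
    return value_network(state)
```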

For more context, see Section 2.3 "Policy improvement by search" in: https://discovery.ucl.ac.uk/id/eprint/10167022/2/ivo_danihelka_thesis.pdf