HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Monte Carlo Tree Search :: AlphaGeese, AlphaZero .. etc #356
From page 16:

AlphaGeese [2] follows an implementation of MCTS [21] [22]. The implementation of MCTS tracks the following variables.

• P(s, a, i): the prior probability of agent i taking an action a from a state s according to the neural network. This is the softmaxed value of the action-values inferred by the neural network.
• N(s, a, i): the number of times we explore the action a taken by agent i from a state s while searching the tree.
• Q(s, a, i): the expected reward for agent i taking the action a from a state s. This is initialised with the state-value inferred by the neural network; Q(s, a, i) is the average of the state-values of the explored nodes in its subtree.

The action with the highest upper confidence bound U(s, a) is explored.
It would be nice to have an implementation of MCTS in this codebase. In case it is not clear, I am referring to https://tonghuikang.github.io/ai-project/report.pdf.
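
For reference, here is a rough single-agent sketch (in Python, since HandyRL is Python/PyTorch) of the bookkeeping the report describes. The `Node` class, the constant `C_PUCT`, and the AlphaZero-style PUCT form of U(s, a) are my assumptions for illustration rather than AlphaGeese's actual code; AlphaGeese additionally indexes every quantity by the agent i.

```python
import math
import numpy as np

# Illustrative sketch only: Node, C_PUCT and the PUCT form of U are assumptions,
# not HandyRL or AlphaGeese code. AlphaGeese also indexes everything by agent i.
C_PUCT = 1.0  # exploration constant (assumed value)


class Node:
    """One search-tree node tracking P(s, a), N(s, a) and Q(s, a)."""

    def __init__(self, prior, value):
        self.p = np.asarray(prior, dtype=np.float32)  # P: softmaxed policy from the net
        self.n = np.zeros_like(self.p)                # N: visit counts
        self.q = np.full_like(self.p, value)          # Q: initialised with the net's state-value
        self.children = {}                            # action -> child Node

    def select(self):
        """Return the action with the highest upper confidence bound U(s, a)."""
        # Assumed AlphaZero-style PUCT:
        #   U(s, a) = Q(s, a) + c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
        u = self.q + C_PUCT * self.p * math.sqrt(self.n.sum()) / (1 + self.n)
        return int(np.argmax(u))

    def backup(self, action, value):
        """Update N and Q so Q stays the running mean of backed-up values."""
        self.n[action] += 1
        self.q[action] += (value - self.q[action]) / self.n[action]


# Toy usage with a dummy network output (uniform policy over 4 moves, value 0):
root = Node(prior=[0.25, 0.25, 0.25, 0.25], value=0.0)
a = root.select()
root.backup(a, value=1.0)  # pretend the simulation from the chosen child returned reward 1
print(a, root.n, root.q)
```

In a full search loop, `select` would be applied repeatedly down the tree, a leaf would be expanded and evaluated by the network, and the resulting value backed up along the visited path with `backup`.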