MCTS actor that uses the Winner Predictor model

Monte Carlo Tree Search (MCTS) scores actions with random rollouts: starting from the position an action leads to, the game is played out with random moves until it ends, and the final result is used to judge how good the action was. The problem is that when many turns (say, 25) separate the action from the end of the game, the outcome of a random playout is a noisy and not very useful estimate of the action's quality. Using the Winner Predictor model as the rollout estimate may therefore improve overall performance.
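As a rough illustration of the idea, here is a minimal sketch of an MCTS loop where the leaf evaluation can be switched between a classic random rollout and a model estimate. Everything in it is an assumption made for the example: the toy game (a pile of stones, take 1 or 2, last stone wins), the `Node` class, the `search` function, and `predict_win_probability`, which is a hand-written stand-in and not the project's actual Winner Predictor or actor interface.

```python
import math
import random
from dataclasses import dataclass, field

# Toy game, assumed for illustration only: each move removes 1 or 2 stones,
# and whoever takes the last stone wins.
def legal_actions(pile):
    return [a for a in (1, 2) if a <= pile]

def predict_win_probability(pile):
    """Hypothetical stand-in for the Winner Predictor: P(player to move wins)."""
    return 0.1 if pile % 3 == 0 else 0.9

@dataclass
class Node:
    pile: int
    parent: "Node | None" = None
    action: "int | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0  # from the view of the player who moved INTO this node

def ucb1(child, parent_visits, c=1.4):
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.value_sum / child.visits
    return exploit + c * math.sqrt(math.log(parent_visits) / child.visits)

def evaluate(node, use_model):
    """Value in [0, 1] for the player to move at `node`."""
    if node.pile == 0:
        return 0.0  # no stones left: the previous player took the last one and won
    if use_model:
        return predict_win_probability(node.pile)  # one inference, no playout
    # Classic random rollout: play random moves until the game ends.
    pile, mover_is_node_player = node.pile, True
    while True:
        pile -= random.choice(legal_actions(pile))
        if pile == 0:
            return 1.0 if mover_is_node_player else 0.0
        mover_is_node_player = not mover_is_node_player

def search(root_pile, iterations=500, use_model=True):
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB1 until a leaf is reached.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add children for every legal action (none if terminal).
        if node.pile > 0:
            node.children = [Node(node.pile - a, parent=node, action=a)
                             for a in legal_actions(node.pile)]
        # Evaluation: model estimate or random rollout.
        value = evaluate(node, use_model)
        # Backpropagation: flip the perspective at every level.
        while node is not None:
            node.visits += 1
            node.value_sum += 1.0 - value
            value = 1.0 - value
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action

if __name__ == "__main__":
    print("model estimate picks:", search(4, use_model=True))
    print("random rollout picks:", search(4, use_model=False))
```

With `use_model=True` each iteration costs a single prediction instead of a full playout, so the estimate does not depend on how many turns remain in the game. In a real integration the stand-in function would be replaced by a call to the trained model, and the perspective handling would have to match whatever the model's output represents.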

The goal

Integrate Winner Predictor model inference into MCTS as part of the rollout.

Time tracking

Time estimate: 2 hours 0 minutes
Time spent: 1 hour 30 minutes

Resources

...

Klazkin / player-zero