Monte Carlo Tree Search (MCTS) uses random rollouts: from a given state, we play random moves until the game is over. The final result is then used to estimate how good an action was. The problem is that if 25 turns pass between an action and the end of the game, the outcome says little about that particular action, so the score is neither accurate nor useful. Using the Winner Predictor model as the rollout estimate might therefore improve overall performance.
The goal
Integrate Winner Predictor model inference into MCTS as part of the rollout.
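One way this could look is sketched below. It is only an illustration under assumptions, not the project's actual code: the `GameState` interface, the `predict_win` callable (standing in for Winner Predictor inference) and the `max_random_turns` parameter are all hypothetical names. The sketch contrasts a plain random rollout with a rollout that stops early and asks the model for a win estimate instead.

```python
import random
from typing import Callable, Protocol, Sequence


class GameState(Protocol):
    """Hypothetical game interface; the real project's state class may differ."""

    def is_terminal(self) -> bool: ...
    def legal_actions(self) -> Sequence[int]: ...
    def apply(self, action: int) -> "GameState": ...
    def result(self) -> float: ...  # assumed: 1.0 = win, 0.0 = loss for the searching player


def random_rollout(state: GameState, rng: random.Random) -> float:
    """Classic MCTS rollout: play random moves until the game ends."""
    while not state.is_terminal():
        state = state.apply(rng.choice(state.legal_actions()))
    return state.result()


def predictor_rollout(
    state: GameState,
    predict_win: Callable[[GameState], float],  # assumed wrapper around model inference
    rng: random.Random,
    max_random_turns: int = 0,
) -> float:
    """Play at most `max_random_turns` random moves, then use the model's estimate."""
    for _ in range(max_random_turns):
        if state.is_terminal():
            return state.result()
        state = state.apply(rng.choice(state.legal_actions()))
    # Prefer the true outcome when the game is already over.
    if state.is_terminal():
        return state.result()
    return predict_win(state)
```

With `max_random_turns=0` the model fully replaces the random playout. A small positive value mixes a few random moves with the model estimate, which may be worth trying if the predictor is unreliable in early-game states.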
Time tracking
Time Estimate: 2 hours 0 minutes
Time spent: 1 hour 30 minutes
Resources
...