Amend MCTS to support a vector of payoffs during back-propagation, with one value per player in the game. This can then be used to implement Max^N MCTS, in which the player to move at each node chooses an action using only their own component of the payoff vector, i.e. their own anticipated reward. See the references below.
Sturtevant, Nathan. 2008. ‘An Analysis of UCT in Multi-Player Games’. ICGA Journal, 14.
Sturtevant, Nathan, and Michael Bowling. 2006. ‘Robust Game Play against Unknown Opponents’. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems - AAMAS ’06, 713. Hakodate, Japan: ACM Press. https://doi.org/10.1145/1160633.1160761.
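
A minimal sketch of the requested change, assuming a generic tree node rather than this project's actual classes. The names `Node`, `backpropagate` and `select_child` are hypothetical, and UCB1 with an exploration constant `c` is an assumed selection rule; the cited papers should be checked for the exact formulation. The point is only that each node accumulates one total per player, and selection reads the component belonging to the player to move.

```python
import math


class Node:
    """Hypothetical tree node holding one accumulated payoff per player."""

    def __init__(self, player, n_players, parent=None):
        self.player = player              # index of the player to move here
        self.parent = parent
        self.children = {}                # action -> Node
        self.visits = 0
        self.totals = [0.0] * n_players   # summed payoff vector

    def add_child(self, action, player):
        child = Node(player, len(self.totals), parent=self)
        self.children[action] = child
        return child


def backpropagate(leaf, payoffs):
    """Add the whole payoff vector to every node from leaf up to the root."""
    node = leaf
    while node is not None:
        node.visits += 1
        for p, value in enumerate(payoffs):
            node.totals[p] += value
        node = node.parent


def select_child(node, c=math.sqrt(2)):
    """Max^N-style selection: score children with the mover's own payoff."""

    def ucb(child):
        if child.visits == 0:
            return math.inf               # try unvisited children first
        mean = child.totals[node.player] / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)

    # Returns the (action, child) pair with the highest score.
    return max(node.children.items(), key=lambda item: ucb(item[1]))


# Usage: a three-player game where player 0 moves at the root.
root = Node(player=0, n_players=3)
a = root.add_child("a", player=1)
b = root.add_child("b", player=1)
backpropagate(a, [1.0, 0.0, 0.0])   # rollout through "a" favoured player 0
backpropagate(b, [0.0, 1.0, 0.0])   # rollout through "b" favoured player 1
action, _ = select_child(root, c=0.0)
```

With exploration switched off (`c=0.0`), the root picks the action with the best mean payoff for player 0 only. The payoffs of the other players are stored but ignored at that node, which is what distinguishes this from a single scalar reward.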