GAIGResearch / TabletopGames

MIT License

Max MCTS #283

Closed: hopshackle closed this 3 months ago

hopshackle commented 3 months ago

1) Added Max-backups for MCTS. This uses the new MCTSParameters setting maxBackupThreshold. If the number of visits to a node, N, exceeds this value (the default is 1 million, so effectively 'off'), the backup becomes a weighted average of the actual received reward (weight = threshold) and the expected value of the best action that could have been taken instead (weight = N - threshold). This only happens if the action taken was not the best expected action. This should help in games with a large number of poor actions, as exploring them will no longer make us pessimistic further up the tree. (The risk instead is careless optimism, so this is not universally helpful.)
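A minimal sketch of the weighting described in item 1, in Java. The class and method names are hypothetical and are not the TabletopGames API; `threshold` plays the role of maxBackupThreshold, and the expected value of the best action is assumed to be supplied by the caller.

```java
public final class MaxBackupSketch {

    /**
     * Hypothetical helper: the value to back up from a node with `visits` visits.
     * Mixes the received reward with the best action's expected value once the
     * visit count exceeds the threshold and a non-best action was taken.
     */
    static double backupValue(double reward, double bestActionValue,
                              int visits, int threshold, boolean tookBestAction) {
        // At or below the threshold, or if the best action was taken, back up the plain reward
        if (visits <= threshold || tookBestAction) {
            return reward;
        }
        // Weight `threshold` on the received reward, `visits - threshold` on the best action's value
        double weightReward = threshold;
        double weightBest = visits - threshold;
        return (weightReward * reward + weightBest * bestActionValue) / visits;
    }

    public static void main(String[] args) {
        // 40 visits, threshold 10: (10 * 0.0 + 30 * 1.0) / 40
        System.out.println(backupValue(0.0, 1.0, 40, 10, false)); // prints 0.75
    }
}
```

As N grows, the weight on the received reward stays fixed at the threshold, so the backed-up value increasingly tracks the best action rather than whatever was explored.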

2) The Regret Matching and EXP3 selection rules now use the correct rule when picking the final action after search. For RM this is the policy averaged over all MCTS iterations; for EXP3 it is the current selection policy with no exploration.
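The sketch below illustrates the two final-action policies from item 2, assuming textbook forms of each rule: regret matching plays in proportion to positive cumulative regret, and EXP3 mixes a softmax over estimated rewards (learning rate `eta`) with uniform exploration `gamma`, so `gamma = 0` gives the no-exploration policy. All names are hypothetical, and the actual MCTS code may parameterise these rules differently.

```java
import java.util.Arrays;

public final class FinalPolicySketch {

    /** Regret matching: proportional to positive regret, uniform if no regret is positive. */
    static double[] regretMatchingPolicy(double[] regrets) {
        double[] p = new double[regrets.length];
        double total = 0.0;
        for (int i = 0; i < regrets.length; i++) {
            p[i] = Math.max(regrets[i], 0.0);
            total += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] = total > 0 ? p[i] / total : 1.0 / p.length;
        }
        return p;
    }

    /** Normalises the sum of per-iteration policies into the average policy. */
    static double[] averagePolicy(double[] cumulativePolicy) {
        double total = Arrays.stream(cumulativePolicy).sum();
        return Arrays.stream(cumulativePolicy).map(x -> x / total).toArray();
    }

    /** EXP3 selection policy; gamma = 0 removes the uniform exploration term. Assumes eta > 0. */
    static double[] exp3Policy(double[] estimatedRewards, double eta, double gamma) {
        int k = estimatedRewards.length;
        double max = Arrays.stream(estimatedRewards).max().orElse(0.0);
        double[] p = new double[k];
        double total = 0.0;
        for (int i = 0; i < k; i++) {
            // Subtract the max before exponentiating to avoid overflow
            p[i] = Math.exp(eta * (estimatedRewards[i] - max));
            total += p[i];
        }
        for (int i = 0; i < k; i++) {
            p[i] = (1 - gamma) * p[i] / total + gamma / k;
        }
        return p;
    }

    public static void main(String[] args) {
        // Cumulative regrets as observed at two successive iterations
        double[][] regretsPerIteration = { {1.0, 0.0, -1.0}, {1.0, 3.0, 0.0} };
        double[] cumulative = new double[3];
        for (double[] regrets : regretsPerIteration) {
            double[] p = regretMatchingPolicy(regrets);
            for (int i = 0; i < p.length; i++) cumulative[i] += p[i];
        }
        System.out.println(Arrays.toString(averagePolicy(cumulative))); // [0.625, 0.375, 0.0]
        System.out.println(Arrays.toString(exp3Policy(new double[]{2.0, 1.0, 0.0}, 1.0, 0.0)));
    }
}
```

Averaging matters for regret matching because the current policy can oscillate between iterations; in the standard theory it is the time-averaged policy that converges (for example to an equilibrium in two-player zero-sum settings).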

3) The Hedge selection rule has been removed, as it was causing too many numerical-instability issues. (Use RegretMatching instead.)

4) Fixed MAST unnecessarily hogging CPU with small MCTS budgets and MASTGamma=0.

5) A new NTBEA mode, 'StableNTBEA', has been added in ParameterSearch. The default NTBEA uses one game for each parameter test. StableNTBEA is intended for games where the random seed has a large impact on the result (e.g. Poker, Seven Wonders, Sushi Go). Instead of a single game it runs P games (where P is the number of players); each uses the same random seed and differs only in the position the parameterised agent plays in. Because the luck of the seed is shared across every seat, this hugely reduces the variance of each parameter evaluation. (It will not help deterministic games, and will reduce the effective sample size for these by a factor of P.)
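A minimal sketch of the difference between the two evaluation modes in item 5. `GameRunner` is a hypothetical hook standing in for whatever plays one game and returns the tuned agent's score; it is not the ParameterSearch API. The toy game in `main` is pure luck: the seed alone decides which seat wins.

```java
public final class StableEvaluationSketch {

    /** Hypothetical hook: plays one game and returns the tuned agent's score. */
    @FunctionalInterface
    public interface GameRunner {
        double play(long seed, int tunedAgentSeat);
    }

    /** Default-style evaluation: one game, one seed, one seat. */
    public static double evaluateSingle(GameRunner runner, long seed, int seat) {
        return runner.play(seed, seat);
    }

    /** Stable-style evaluation: P games sharing one seed, the tuned agent in each seat once. */
    public static double evaluateStable(GameRunner runner, long seed, int nPlayers) {
        double total = 0.0;
        for (int seat = 0; seat < nPlayers; seat++) {
            total += runner.play(seed, seat);
        }
        return total / nPlayers;
    }

    public static void main(String[] args) {
        // Toy 3-player game: whoever sits in seat (seed % 3) wins, regardless of skill
        GameRunner toy = (seed, seat) -> seat == (int) (seed % 3) ? 1.0 : 0.0;
        for (long seed = 0; seed < 6; seed++) {
            System.out.printf("seed %d: single=%.3f stable=%.3f%n",
                    seed, evaluateSingle(toy, seed, 0), evaluateStable(toy, seed, 3));
        }
    }
}
```

With one game per evaluation the toy score swings between 0 and 1 depending on the seed, whereas the seat-rotated evaluation returns 1/3 for every seed, since each seed's luck is spread evenly across all seats.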