Sometimes not converging complete wins nearly as fast as expected

SteveDraper commented 9 years ago

See http://www.ggp.org/view/tiltyard/matches/0985e2b3b9f74b390dafc3b3ac2a1331a2dddf74/ - when we played ( place b 6 ) on move 22 it had over 2000 visits and yet was STILL not complete, despite every response leading to a 1 move win on our next move.

Turned out to be because it was not evaluating terminality at node expansion time. This in turn was because the meta-game assessment of the minimum length of a game was around 26 moves, so it was skipping terminality assessment on nodes shallower than that (until they themselves are expanded).

In large branching factor games where win paths are not dense, over-estimation of minimum length is likely and can be substantial.

For now I have adopted a simple mechanism to address this (low risk prior to IGGP2015) as follows:

1) When processing a playout result calculate the depth it terminated at. If this is lower than the lowest up-until-now believed possible record the new lower limit (on an MCTSTree member variable)

2) At the start of a node expansion, evaluate child terminality if the depth is >= the lowest currently seen termination depth

3) At node expansion evaluate THIS node's terminality (necessary if the parent expansion did NOT) if the parent's depth is lower than the INITIAL (from meta-gaming) estimate of minimum depth. This is necessary because the current minimum estimate could have been obtained in between the expansion of the parent and that of the child. It can lead to re-evaluation (redundantly) of the child terminality when the child is expanded, but this is anyway cheap as we're about to run that state through the state machine anyway to obtains the legals.

SteveDraper commented 9 years ago

Leaving open, because we can (I think) do MUCH better (though not for IGGP2015).

In outline:

Whether to assess terminality of child states when we expand a node is a trade-off between (significant) extra expense (doing this means processing the child states through the state machine to determine if they are terminal), and reduction of search (essentially by noticing wins a ply earlier).

The balance favors early termination checks, but we still want to avoid the overhead if we are fairly sure the children cannot be terminal. Currently the global shallowest-observed-terminal depth is used to make this determination, but it's a VERY blunt instrument. Instead I propose the following:

1) Add to TreeNode a minimumDescendentTerminalDepth member 2) As now, observe actual playout depths, and propagate the observed minimum up the tree 3) On selection carry the value from the last node selected through with more than some threshold visit count down and use THAT value as the threshold on the new expansion

Benefits: i) No dependency on meta-gaming analysis and its varying accuracy in different game types ii) In tree branches that go deep but without going near a terminal (e.g. - most branches in chess!) we don't take the hit of child terminality determination on expansion, except for nodes that are actually 'near' terminal states

arr28 commented 9 years ago

Proposed post-IGGPC15 approach seems fine in general. In (3), consider whether this threshold needs to be dependent on the branching factor.

SanchoGGP / ggp-base

Sometimes not converging complete wins nearly as fast as expected #335