SanchoGGP / ggp-base

The General Game Playing Base Package
8 stars 4 forks source link

Consider only expanding children after parent has multiple visits #340

Open arr28 opened 9 years ago

arr28 commented 9 years ago

Whilst reading the literature, I gather that there's some mileage in not doing the expand step until the parent has had at least N visits (with N=4 being recommended for Go).

Especially in games where we expand all the children simultaneously, this stands to give us a reasonable perf. boost for likely little impact on quality. Therefore, a reasonable chance of an overall positive impact.

SteveDraper commented 9 years ago

I am suspicious of how generally useful this is. Surely it is exactly equivalent to a sample size of N (that is N playouts per expand)? Empirically we have already demonstrated that an MCTS expansion is worth considerably more than an extra rollout.

However, there is another technique called progress widening, which plays into the same area. With this technique you create (and select between) a subset of the children, adding some each time through rather than expanding all at once.

On balance I think however, that both of these will become very marginal once we move expansion to the rollout threads anyway.

arr28 commented 9 years ago

Surely it is exactly equivalent to a sample size of N?

I don't think so. Especially when exploring fairly unpromising branches, there will be many cases that a single (or small number) of losses will be enough to stop the branch getting sampled again, at least for a long time.

Unlike a sample size of N, we don't suffer the cost of doing N rollouts. Furthermore, by suppressing creation of the first child, we avoid creation of all the siblings.

In essence, this approach is designed to stop us from expanding areas of the tree that are unlikely to be revisited.

SteveDraper commented 9 years ago

Good point. Worth experimenting with (should be fairly easy).

SteveDraper commented 9 years ago

One thing to worry about here is pipeline latency. The problem is that we have an inherently asynchronous architecture where rollouts occur after a pipeline delay, so we do not get the results from the initial rollouts before we have to decide on a selection path through the node potentially. If we impose a rule that says no expansion before the node has N updates (note - updates not visits) then we might actually end up with needing considerably more than N visits before expansion actually can take place. Conversely, if we allow expansion after N visits (visits this time not updates), then we may well be making the selection based on fewer than N rollouts (could even be none), so the putative benefit would be degraded (since we'd end up expanding even if the node turns out to get 4 rollouts all scoring 0 say). Statistically this may be rare enough that it doesn't matter, but it needs to be born in mind. This will be especially true given the selection-caching code that continues to select through the same node until it gets updated (that we still don't understand as to why it is beneficial, but so far have not found an alternative that works as well)

On Thu, Jul 16, 2015 at 8:45 AM, Steve Draper steve.p.draper@gmail.com wrote:

Good point. Worth experimenting with (should be fairly easy)

On Thu, Jul 16, 2015 at 8:06 AM, Andrew Rose notifications@github.com wrote:

Surely it is exactly equivalent to a sample size of N?

I don't think so. Especially when exploring fairly unpromising branches, there will be many cases that a single (or small number) of losses will be enough to stop the branch getting sampled again, at least for a long time.

Unlike a sample size of N, we don't suffer the cost of doing N rollouts. Furthermore, by suppressing creation of the first child, we avoid creation of all the siblings.

In essence, this approach is designed to stop us from expanding areas of the tree that are unlikely to be revisited.

— Reply to this email directly or view it on GitHub https://github.com/SanchoGGP/ggp-base/issues/340#issuecomment-121950631 .

SteveDraper commented 9 years ago

Experiments showed that this was quite harmful in Connect4, but subsequently it is looking very promising in Majorities.

I think there are two factors at play: 1) It gives the most benefit in pure performance terms in high branching factor games 2) It is harmful in games where terminality is common all over the state space, because delaying expansion delays child terminality detection, which it is highly beneficial to detect early as it allows significant pruning

Assuming the positive results hold up in Majorities over a longer soak test I think looking at how to conditionally enable (or tune N) is going to be highly worthwhile

SteveDraper commented 9 years ago

Repeating this information here as this thread is a better place fro it really:

Expansion deferral is proving most interesting:

1) With fast play times (8 seconds) deferring until 4 visits yields a 4:1 win ratio for the deferring player in Majorities 2) With slower play times (15 seconds) no obvious benefit is apparent in Majorities 3) In Hex a 4 visit deferral yields significant benefits (circa 2:1) at 15 second play times 4) The benefit to Majorities (just over 2:1 now) is recovered with 15 second play times if the deferral threshold is increased to 8

There is definitely something here, but tuning it is going to be 'interesting'!

SteveDraper commented 9 years ago

Added some fairly constrained usage and tuning for IGGP2015 that doesn't try to use it at all for anything with an average branching factor below 20, and which tunes the value according to the observed terminal density (proxied by greedy rollout effectiveness which is a decent enough measure for now), with tuning constants set to give a value of 8 for Majorities based on empirical testing. Also capped at 8 so that any new games that turn out to have extreme values can never set an extreme threshold.

The effect gives: Hex: 6 (tested as superior to not using it at all) Majorities: 8 (tested as superior) Breakthrough: 4 (side-effect really but tested as non-harmful) Amazons: 8 (unknown impact but not terribly relevant for IGGP2015)

Most other games not used