SanchoGGP / ggp-base

The General Game Playing Base Package
8 stars 4 forks source link

Experiment with propagation-free MCTS #168

Open SteveDraper opened 10 years ago

SteveDraper commented 10 years ago

MCTS statistics are far from perfect under two circumstances: 1) When transition occur, because the child node can have been sampled far more than the number of times expected relative to a particular parent 2) When a child completes (non-decisively) it will typically undergo a discontinuous jump in its score, which will not reflect into the parent for some time thereafter

So, instead of propagating results up a particular path in the tree/graph just mark the selection path as dirty and on querying average score or num visits, calculate (and cache, marking non-dirty) suitable approximations from first principles given the values of those properties in the immediate children. If this can be pulled off (it's far from obvious in terms of the details) then any node will reflect maximal information (given the distribution of samples amongst children) regardless of transitions, and adjusting immediately to non-continuous value changes due to child completions.

arr28 commented 9 years ago

So, following our conversations, I think there are two independent things there.

  1. Firstly, and this is really the essence of this experiment, the value (& uncertainty) of a node is based solely on the value & uncertainty of its immediate children. This allows step-changes in the relevant cases (transitions + completion).
  2. There's a choice as to whether the tree is updated during a back-propagation step or whether we mark a dirty flag and then do all the updates later. My guess is that doing back-propagation (albeit not an MCTS back-propagation) is marginally better. Otherwise, we'll just end up marking nodes as dirty all the way up to the root and the very next select will have to walk down the tree and resolve all the dirty flags. Having said that, if we used the dirty flag approach and batched up back-props, then we would avoid some wasted computation higher up the tree.

And thinking about this, we also need to consider how "virtual losses" will work in the new scheme (i.e. how we'll avoid all threads working in the same area of the tree).