Sancho doesn't play optimally in solved games

SteveDraper commented 10 years ago

If the search shows the game to be a fully explored draw (or any fully explored result in fact), then Sancho will essentially pick randomly from all complete nodes that lead to that score. This does not maximize the opponent's chance to make a mistake, and shows up dramatically when playing games like TicTacToe against Random, where it tends to draw far more often than it should given all the mistakes it is given by its opponent.

This issue covers...

Giving the opponent the maximum opportunity to go wrong in games that look like a forced loss or forced draw for us.
Taking the fastest win. This is helpful in all games (giving us less time to have network problems, etc.), but vital in factorized games.

arr28 commented 10 years ago

A proposal, for 2-player, alternating play, win/draw/lose games...

Don't discard completed sub-trees.
- For "our move" nodes, we can discard the moves we won't choose - see below.
- For "their move" nodes, we'll always keep them all.
Keep win/draw/loss(*) counts in each node.
- A terminal node has one of these values set to 1 and the others to 0.
- A parent has the sum of its children.
- (*) There's no need to store the loss count - it just made the explanation easier
For choosing the path...
- If a strong win for us, pick the shortest path to win.
- If a "forced draw", pick the node with a highest win count
- If a "forced loss", pick the node with the highest "win + (0.5 * draw)" value. (Other weightings of win vs draw also acceptable.)

This maximises the number of opportunities for the opponent to make a mistake. It should lead to "perfect" play in TTT.

SteveDraper commented 10 years ago

This sounds like a very plausible starting point which will either 'just work' or form the basis as a jump off point for further experimentation.

On Wed, Mar 19, 2014 at 5:11 AM, Andrew Rose notifications@github.comwrote:

A proposal, for 2-player, alternating play, win/draw/lose games...

Don't discard completed sub-trees.

For "our move" nodes, we can discard the moves we won't choose - see below.

For "their move" nodes, we'll always keep them all.

Keep win/draw/loss(*) counts in each node.

A terminal node has one of these values set to 1 and the others to 0.

A parent has the sum of its children.

(*) There's no need to store the loss count - it just made the explanation easier

For choosing the path...

If a strong win for us, pick any node that maintains the strong win.

If a "forced draw", pick the node with a highest win count

If a "forced loss", pick the node with the highest "win + (0.5 * draw)" value. (Other weightings of win vs draw also acceptable.)

This maximises the number of opportunities for the opponent to make a mistake. It should lead to "perfect" play in TTThttp://upload.wikimedia.org/wikipedia/commons/d/de/Tictactoe-X.svg .

Reply to this email directly or view it on GitHubhttps://github.com/SteveDraper/ggp-base/issues/7#issuecomment-38034506 .

arr28 commented 10 years ago

In a single-player game it's less important, but note this poor example of Chinese Checkers: http://www.ggp.org/view/tiltyard/matches/b7e90e1c8d0c31bdcc5742e77a8b931dbde3b768/

SteveDraper commented 10 years ago

Partially addressed this by implementing preferential selection of shortest win paths, but more to do before this can really be closed

arr28 commented 10 years ago

A good step in the right direction, but I agree with your assessment. I'm keen not to close this issue until (at least) we really do play perfectly in TTT (see link above). If you have other priorities, I'm very happy for this issue to lie untouched for a while and I'm equally happy to do the work (although I wouldn't jump on it straight away because there are other things on my list).

SteveDraper commented 10 years ago

Implemented part of this, which is to not remove children from complete nodes unconditionally, instead keeping the best choice (if it's our decision) or all (if someone else's). This prevents solution paths to wins being 'forgotten' as happens on occasion.

SanchoGGP / ggp-base

Sancho doesn't play optimally in solved games #7