Generate database of samples for Breakthrough

arr28 commented 8 years ago

Pre-requisite for experiments under #373.

Build a database of states.

Mapping from state to score for the 1st player. (The game is fixed-sum, so the 2nd player's score follows from it.)
Two types of node, either in different files or marked within a single file:
- Complete nodes (where the score is known for sure).
- Nodes where the current score is an MCTS estimate.
Write nodes at the point they're freed:
- We know most about them at that point.
- It's a single place (TreeNode.freeNode()) that's guaranteed to be reached for every node.
Nodes to include:
- All complete nodes.
- All other nodes with at least N samples (N to be decided).
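
A minimal sketch of the inclusion rule above. The `Node` class and the `MIN_VISITS` value are hypothetical stand-ins for illustration (N is still to be decided, and this is not Sancho's actual TreeNode):

```python
from dataclasses import dataclass

# Placeholder only: the thread leaves N "to be decided".
MIN_VISITS = 1000


@dataclass
class Node:
    """Hypothetical stand-in for the fields a freed node would expose."""
    complete: bool      # score is known for sure
    num_visits: int     # number of MCTS samples through this node
    avg_score: float    # current score (exact if complete, else an estimate)


def should_record(node, min_visits=MIN_VISITS):
    """Keep every complete node, plus any estimate backed by enough samples."""
    return node.complete or node.num_visits >= min_visits
```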

What to do about terminal states? My experience with TTT is that it's important to include them in the training data. But does Sancho create a TreeNode for terminal states? If not, we could take terminal states from the end of rollouts, but might that flood the database? Perhaps a 3rd file for terminal states?

Aiming for ~1,000,000 states to start with. May need more in time, but that much data is probably already more than sufficient to make the NN training grind to a halt.

arr28 commented 8 years ago

I have ~220K states for starters, but haven't examined them in any way yet. The data format is an array of records, each made up of...

Terminal? (1 byte boolean)
Complete? (1 byte boolean)
Num. visits (4 byte int)
Avg. score (8 byte double)
Number of longs, N, that make up the state (4 byte int)
- State bits (N * 8 byte longs)

I think that N=3 for all states in the database (but if the top bits were always missing from a state then it could theoretically be represented in fewer longs).
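
For anyone wanting to read the file outside Sancho, here is a rough sketch of the record layout above using Python's `struct` module. The byte order is an assumption (big-endian, which is what Java's DataOutputStream would produce); the thread doesn't say how the file is actually written. The function names are made up for illustration.

```python
import io
import struct

# Assumption: big-endian with no padding. Fields: terminal, complete,
# num. visits, avg. score, number of longs N.
HEADER = struct.Struct(">??idi")


def pack_record(terminal, complete, visits, avg_score, state_longs):
    """Serialise one record: fixed header followed by N signed 64-bit longs."""
    header = HEADER.pack(terminal, complete, visits, avg_score, len(state_longs))
    body = struct.pack(">%dq" % len(state_longs), *state_longs)
    return header + body


def read_records(stream):
    """Yield (terminal, complete, visits, avg_score, state_longs) tuples."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        terminal, complete, visits, avg_score, n = HEADER.unpack(header)
        body = stream.read(8 * n)
        if len(body) < 8 * n:
            raise ValueError("truncated state bits")
        yield terminal, complete, visits, avg_score, struct.unpack(">%dq" % n, body)


if __name__ == "__main__":
    raw = pack_record(True, True, 12, 100.0, (1, -1, 3))
    print(len(raw), list(read_records(io.BytesIO(raw))))
```

With N=3 the header (18 bytes) plus 3 * 8 bytes of state gives a 42-byte record.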

arr28 commented 8 years ago

Very skewed towards terminal samples. Also very skewed towards wins. I don't know how much of that's because Sancho was playing Random (and was therefore winning for most of the game) and how much is because of things like the anti-decisive loss code.

42 bytes per record.
Terminal: 139,642 of which 134,420 wins and 5,222 losses.
Complete: 78,378 of which 75,400 wins and 2,978 losses.
Estimate: 1,946

A new database, generated by running Sancho vs Sancho for a single game with 5-minute moves, produced...

42 bytes per record.
Terminal: 2,989,768 of which 2,218,270 wins and 771,498 losses.
Complete: 1,490,378 of which 1,094,785 wins and 395,593 losses.
Estimate: 8,899
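
To put a number on the skew, a quick calculation of the win fraction in each category, using only the counts quoted above (the labels are just for display):

```python
def win_fraction(wins, losses):
    """Fraction of decided samples that are wins."""
    return wins / (wins + losses)


# (wins, losses) copied from the two database summaries above.
counts = {
    "vs Random, terminal": (134_420, 5_222),
    "vs Random, complete": (75_400, 2_978),
    "vs Sancho, terminal": (2_218_270, 771_498),
    "vs Sancho, complete": (1_094_785, 395_593),
}

for label, (wins, losses) in counts.items():
    print(f"{label}: {win_fraction(wins, losses):.1%} wins")
```

Both databases lean towards wins, but the Sancho vs Sancho one is noticeably less lopsided than the one generated against Random.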

arr28 commented 8 years ago

Proposition ordering...

cellHolds 1 1 black
cellHolds 1 1 white
cellHolds 1 2 black
cellHolds 1 2 white
cellHolds 1 3 black
cellHolds 1 3 white
cellHolds 1 4 black
cellHolds 1 4 white
cellHolds 1 5 black
cellHolds 1 5 white
cellHolds 1 6 black
cellHolds 1 6 white
cellHolds 1 7 black
cellHolds 1 7 white
cellHolds 1 8 black
cellHolds 1 8 white
cellHolds 2 1 black
cellHolds 2 1 white
cellHolds 2 2 black
cellHolds 2 2 white
...
cellHolds 8 7 black
cellHolds 8 7 white
cellHolds 8 8 black
cellHolds 8 8 white
control black
control white

The first co-ordinate is the column co-ordinate (i.e. it varies across the board). The second co-ordinate is the row co-ordinate (i.e. it varies up and down the board). White starts on rows 1 & 2. Black starts on rows 7 & 8. White plays first.

(1,8)  (2,8)  (3,8)  (4,8)  (5,8)  (6,8)  (7,8)  (8,8)
(1,7)  (2,7)  (3,7)  (4,7)  (5,7)  (6,7)  (7,7)  (8,7)
(1,6)  (2,6)  (3,6)  (4,6)  (5,6)  (6,6)  (7,6)  (8,6)
(1,5)  (2,5)  (3,5)  (4,5)  (5,5)  (6,5)  (7,5)  (8,5)
(1,4)  (2,4)  (3,4)  (4,4)  (5,4)  (6,4)  (7,4)  (8,4)
(1,3)  (2,3)  (3,3)  (4,3)  (5,3)  (6,3)  (7,3)  (8,3)
(1,2)  (2,2)  (3,2)  (4,2)  (5,2)  (6,2)  (7,2)  (8,2)
(1,1)  (2,1)  (3,1)  (4,1)  (5,1)  (6,1)  (7,1)  (8,1)
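
A sketch of how a proposition's position in the ordering above could be computed and tested against the stored longs. Two assumptions here: the elided part of the list continues the same pattern (column-major, then row, black before white), and proposition i lives in bit (i % 64) of long (i // 64), least significant bit first. The second is a guess at the packing, not something confirmed in this thread.

```python
BOARD_SIZE = 8
COLOURS = ("black", "white")  # black precedes white for each cell in the list

# The two control propositions follow the 128 cellHolds propositions.
CONTROL_BLACK = 2 * BOARD_SIZE * BOARD_SIZE      # 128
CONTROL_WHITE = CONTROL_BLACK + 1                # 129


def cell_index(col, row, colour):
    """Index of 'cellHolds col row colour' (col and row are 1-based)."""
    return ((col - 1) * BOARD_SIZE + (row - 1)) * 2 + COLOURS.index(colour)


def is_set(state_longs, index):
    """Assumed packing: bit (index % 64) of long (index // 64), LSB first."""
    return (state_longs[index // 64] >> (index % 64)) & 1 == 1
```

Under that ordering there are 130 propositions in total, which is consistent with needing three 64-bit longs per state.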

arr28 commented 8 years ago

Weights for pieces in the positions shown above, learned from the sample DB by a NN with no hidden layer and a single sigmoid output neuron. (White playing up, Black playing down. In each pair of rows, the White weights are on top and the Black weights underneath.)


-0.11   -0.13    -0.15     7.27     7.06     7.09     7.69     7.24
 0.16    0.23     0.13    -0.14    -0.3     -0.27    -0.3     -0.18

-0.2    -0.14    -0.17     3.44     3.76     4.11     4.26     3.83
 0.13    0.18     0.11    -0.11    -0.11    -0.16    -0.19    -0.21

-0.28   -0.24    -0.15     1.27     0.69     0.44     0.97     0.46
 0.12    0.12     0.16     0.08    -0.12    -0.18    -0.25    -0.12

-0.42   -0.41    -0.3     -1.4      0.14     0.16     0.22     0.2
 0.13    0.15     0.1      0.1     -0.07    -0.18    -0.23    -0.18

-5.3    -4.81    -4.84    -5.07     0.12     0.19     0.17     0.14
 0.19    0.52     1.07     0.1     -0.08    -0.17    -0.19    -0.21

-7.81   -7.24    -6.44    -7.51     0.07     0.18     0.15     0.17
 0.15    7.69     7.32     7.42     1.32    -0.44    -1.44    -0.21

-0.21   -0.21    -0.43    -0.18     0.11     0.11     0.21     0.21
 0.08    4.04     3.88     3.43    -4.74    -4.29    -5.29    -5.1

 0.18   -0.11    -0.15    -0.16     0.08     0.34     0.32     0.3
-0.03    0.26     0.27     0.37    -7.22    -6.95    -6.51    -6.76

-0.11  (Control white)
-0.26  (Control black)

array([-0.57], dtype=float32)  (Bias?)
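
For reference, a network with no hidden layer and a single sigmoid output is just logistic regression over the proposition bits. A minimal sketch of evaluating it is below; the function name is made up, and how the output relates to the stored score (e.g. any scaling) is not stated here, so treat that part as open.

```python
import math


def predict(features, weights, bias):
    """Sigmoid of (bias + weights . features), where features are 0/1 bits."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

As a rough illustration, a single active feature with weight 7.27 and the bias of -0.57 gives sigmoid(6.7), which is already very close to 1.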

arr28 commented 8 years ago

The weights show nothing like the left-right symmetry that there ought to be. Perhaps worth extending the dataset with left-right symmetry (and, if both positions already appear in the set, pick the one with more samples and use that in both positions).
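
A sketch of that left-right extension. The state representation is hypothetical, chosen for readability rather than matching the on-disk format: a state is (frozenset of (col, row, colour) cells, player in control), and the database maps each state to (visits, score).

```python
BOARD_SIZE = 8


def mirror_lr(state):
    """Reflect a state left-to-right: column c becomes 9 - c on an 8x8 board."""
    cells, control = state
    mirrored = frozenset((BOARD_SIZE + 1 - c, r, colour) for c, r, colour in cells)
    return (mirrored, control)


def augment_lr(db):
    """Return a copy of db in which every state and its mirror share one sample.

    If both already appear, the sample with more visits is used for both.
    """
    out = {}
    for state, sample in db.items():
        twin = mirror_lr(state)
        other = db.get(twin, sample)
        best = max(sample, other, key=lambda s: s[0])  # s[0] is the visit count
        out[state] = best
        out[twin] = best
    return out
```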

SteveDraper commented 8 years ago

There should also be symmetry under top-bottom reflection with color reversal - can do similarly there.

arr28 commented 8 years ago

Yes, although it does produce lots of unreachable positions. Indeed, all positions where all 32 pawns are still on the board are obviously illegal when reversed like that (because black has had more moves than white - or they're equal but it's black's turn).

May not matter though.
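
A sketch of the top-bottom reflection with colour reversal, for reference. The representation is hypothetical (a set of (col, row, colour) cells plus the player in control), and the score flip assumes scores are on a 0-100 fixed-sum scale; pass a different `total` if that assumption is wrong.

```python
BOARD_SIZE = 8
SWAP = {"white": "black", "black": "white"}


def flip_colours(cells, control, score, total=100.0):
    """Reflect top-to-bottom, swap piece colours and the player in control.

    The score stays "score for the 1st player", so it becomes total - score.
    """
    flipped = frozenset(
        (c, BOARD_SIZE + 1 - r, SWAP[colour]) for c, r, colour in cells
    )
    return flipped, SWAP[control], total - score
```

Applying the transform twice gives back the original sample, which is a cheap sanity check to run over the database.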

arr28 commented 8 years ago

I suspect a large issue with my database is that it was produced from a single game (with max. length play clock). So the samples will be heavily concentrated in a fairly small area of the tree. I'm not sure if I've still got logs from the game, but I'm guessing that they'll show that white won towards the right.

Combining output from multiple runs has its own challenges though.

SanchoGGP / ggp-base

Generate database of samples for Breakthrough #383