Open arr28 opened 8 years ago
I have ~220K states for starters, but haven't examined them in any way yet. Data format is array of...
I think that N=3 for all states in the database (but if the top bits were always missing from a state then it could theoretically be represented in fewer longs).
Very skewed towards terminal samples. Also very skewed towards wins. I don't know how much of that's because Sancho was playing Random (and was therefore winning for most of the game) and how much is because of things like the anti-decisive loss code.
42 bytes per record.
Terminal: 139,642 of which 134,420 wins and 5,222 losses.
Complete: 78,378 of which 75,400 wins and 2,978 losses.
Estimate: 1,946
A new database generated by running Sancho vs Sancho for a single game with 5min moves produced...
42 bytes per record.
Terminal: 2,989,768 of which 2,218,270 wins and 771,498 losses.
Complete: 1,490,378 of which 1,094,785 wins and 395,593 losses.
Estimate: 8,899
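To put numbers on the skew, here is a quick sketch that turns the counts above into proportions. The counts are copied from the two summaries. The dictionary keys and the labels "Sancho vs Random" and "Sancho vs Sancho" are just my names for the two databases, not anything stored in the files.

```python
# Counts copied from the two database summaries above.
# The labels and the dictionary layout are illustrative assumptions.
DATABASES = {
    "Sancho vs Random": {
        "terminal": {"wins": 134_420, "losses": 5_222},
        "complete": {"wins": 75_400, "losses": 2_978},
        "estimate": 1_946,
    },
    "Sancho vs Sancho": {
        "terminal": {"wins": 2_218_270, "losses": 771_498},
        "complete": {"wins": 1_094_785, "losses": 395_593},
        "estimate": 8_899,
    },
}


def summarize(db):
    """Return the total record count and the share/win-rate of each class."""
    terminal = sum(db["terminal"].values())
    complete = sum(db["complete"].values())
    total = terminal + complete + db["estimate"]
    return {
        "total": total,
        "terminal_share": terminal / total,
        "terminal_win_rate": db["terminal"]["wins"] / terminal,
        "complete_win_rate": db["complete"]["wins"] / complete,
    }


if __name__ == "__main__":
    for name, db in DATABASES.items():
        s = summarize(db)
        print(
            f"{name}: {s['total']:,} records, "
            f"{s['terminal_share']:.1%} terminal, "
            f"terminal win rate {s['terminal_win_rate']:.1%}, "
            f"complete win rate {s['complete_win_rate']:.1%}"
        )
```

The same ratios could be used to weight or subsample classes before training, but that is a separate decision.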
Proposition ordering...
cellHolds 1 1 black
cellHolds 1 1 white
cellHolds 1 2 black
cellHolds 1 2 white
cellHolds 1 3 black
cellHolds 1 3 white
cellHolds 1 4 black
cellHolds 1 4 white
cellHolds 1 5 black
cellHolds 1 5 white
cellHolds 1 6 black
cellHolds 1 6 white
cellHolds 1 7 black
cellHolds 1 7 white
cellHolds 1 8 black
cellHolds 1 8 white
cellHolds 2 1 black
cellHolds 2 1 white
cellHolds 2 2 black
cellHolds 2 2 white
...
cellHolds 8 7 black
cellHolds 8 7 white
cellHolds 8 8 black
cellHolds 8 8 white
control black
control white
The first co-ordinate is the column co-ordinate (i.e. it varies across the board). The second co-ordinate is the row co-ordinate (i.e. it varies up and down the board). White starts on rows 1 & 2. Black starts on rows 7 & 8. White plays first.
(1,8) (2,8) (3,8) (4,8) (5,8) (6,8) (7,8) (8,8)
(1,7) (2,7) (3,7) (4,7) (5,7) (6,7) (7,7) (8,7)
(1,6) (2,6) (3,6) (4,6) (5,6) (6,6) (7,6) (8,6)
(1,5) (2,5) (3,5) (4,5) (5,5) (6,5) (7,5) (8,5)
(1,4) (2,4) (3,4) (4,4) (5,4) (6,4) (7,4) (8,4)
(1,3) (2,3) (3,3) (4,3) (5,3) (6,3) (7,3) (8,3)
(1,2) (2,2) (3,2) (4,2) (5,2) (6,2) (7,2) (8,2)
(1,1) (2,1) (3,1) (4,1) (5,1) (6,1) (7,1) (8,1)
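A minimal sketch of the proposition ordering and co-ordinate convention above, in case it helps when decoding states. The function names are hypothetical, and the code only covers the index of each proposition in the listed order. How those indices map onto bits within the stored longs is not specified here, so that part is left out.

```python
import math

BOARD_SIZE = 8
# Order in which each cell's two propositions appear in the list above.
COLORS = ("black", "white")
NUM_CELL_PROPS = BOARD_SIZE * BOARD_SIZE * len(COLORS)
NUM_PROPS = NUM_CELL_PROPS + len(COLORS)  # plus "control black", "control white"


def cell_index(col, row, color):
    """0-based position of 'cellHolds col row color' in the listed ordering."""
    if not (1 <= col <= BOARD_SIZE and 1 <= row <= BOARD_SIZE):
        raise ValueError("col and row must be in 1..8")
    return ((col - 1) * BOARD_SIZE + (row - 1)) * len(COLORS) + COLORS.index(color)


def control_index(color):
    """0-based position of 'control color'; these follow all cell propositions."""
    return NUM_CELL_PROPS + COLORS.index(color)


def longs_needed(num_props, bits_per_long=64):
    """Number of 64-bit longs required to hold one bit per proposition."""
    return math.ceil(num_props / bits_per_long)


def initial_state():
    """Indices of propositions true at the start: White on rows 1 & 2,
    Black on rows 7 & 8, White to move."""
    state = set()
    for col in range(1, BOARD_SIZE + 1):
        for row in (1, 2):
            state.add(cell_index(col, row, "white"))
        for row in (7, 8):
            state.add(cell_index(col, row, "black"))
    state.add(control_index("white"))
    return state
```

With 128 cell propositions plus 2 control propositions, `longs_needed(NUM_PROPS)` gives 3, which would be consistent with N=3 if N is the number of longs per state.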
Weights for pieces in the positions shown, learned from the sample DB by a NN with no hidden layer and a single sigmoid output neuron. (White playing up, Black playing down; weights are White on top, Black underneath.)
-0.11 -0.13 -0.15 7.27 7.06 7.09 7.69 7.24
0.16 0.23 0.13 -0.14 -0.3 -0.27 -0.3 -0.18
-0.2 -0.14 -0.17 3.44 3.76 4.11 4.26 3.83
0.13 0.18 0.11 -0.11 -0.11 -0.16 -0.19 -0.21
-0.28 -0.24 -0.15 1.27 0.69 0.44 0.97 0.46
0.12 0.12 0.16 0.08 -0.12 -0.18 -0.25 -0.12
-0.42 -0.41 -0.3 -1.4 0.14 0.16 0.22 0.2
0.13 0.15 0.1 0.1 -0.07 -0.18 -0.23 -0.18
-5.3 -4.81 -4.84 -5.07 0.12 0.19 0.17 0.14
0.19 0.52 1.07 0.1 -0.08 -0.17 -0.19 -0.21
-7.81 -7.24 -6.44 -7.51 0.07 0.18 0.15 0.17
0.15 7.69 7.32 7.42 1.32 -0.44 -1.44 -0.21
-0.21 -0.21 -0.43 -0.18 0.11 0.11 0.21 0.21
0.08 4.04 3.88 3.43 -4.74 -4.29 -5.29 -5.1
0.18 -0.11 -0.15 -0.16 0.08 0.34 0.32 0.3
-0.03 0.26 0.27 0.37 -7.22 -6.95 -6.51 -6.76
-0.11 (Control white)
-0.26 (Control black)
array([-0.57], dtype=float32) (Bias?)
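For reference, a network with no hidden layer and one sigmoid output is just logistic regression over the proposition bits. A rough sketch of how such a model scores a state follows, assuming the output is the sigmoid of the bias plus the weights of the propositions that are true. The function names are hypothetical, and the weights used in the example are toy values, not a faithful mapping of the table above onto proposition indices.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def evaluate(active, weights, bias):
    """Score a state given the indices of its true propositions.

    active  -- iterable of proposition indices that hold in the state
    weights -- one weight per proposition
    bias    -- scalar bias of the output neuron
    """
    return sigmoid(bias + sum(weights[i] for i in active))


if __name__ == "__main__":
    toy_weights = [0.0] * 130      # assumption: one weight per proposition
    toy_weights[5] = 7.27          # toy value placed at an arbitrary index
    bias = -0.57                   # the value printed above as "(Bias?)"
    print(evaluate([], toy_weights, bias))    # no propositions active
    print(evaluate([5], toy_weights, bias))   # one heavily weighted proposition
```

With weights of magnitude 7 or so, a single piece on the relevant square is enough to saturate the output, which fits the terminal-heavy training data.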
Nothing like the left-right symmetry that there ought to be. Perhaps worth extending the dataset with left-right symmetry (and, if both positions already appear in the set, pick the one with more samples and use that in both positions).
There should also be top-bottom symmetry with color reversal - can do similarly there.
Yes, although it does produce lots of unreachable positions. Indeed, all positions where all 32 pawns are still on the board are obviously illegal when reversed like that (because black has had more moves than white - or they're equal but it's black's turn).
May not matter though.
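A sketch of the two symmetries discussed above, including the "keep the one with more samples" rule. Everything about the representation is an assumption for illustration: a state is a frozenset of (col, row, color) pieces plus the color in control, and each state maps to a (value, sample_count) pair with value in [0, 1] from a fixed role's point of view, so the color-reversal transform also flips the value to 1 - value.

```python
BOARD_SIZE = 8
OTHER = {"black": "white", "white": "black"}


def mirror_left_right(pieces, control, value):
    """Reflect columns; colors, control and value are unchanged."""
    mirrored = frozenset((BOARD_SIZE + 1 - c, r, color) for c, r, color in pieces)
    return (mirrored, control), value


def flip_with_color_reversal(pieces, control, value):
    """Reflect rows, swap colors and control, and flip the value.

    Flipping the value assumes it is stored from a fixed role's perspective.
    """
    flipped = frozenset(
        (c, BOARD_SIZE + 1 - r, OTHER[color]) for c, r, color in pieces
    )
    return (flipped, OTHER[control]), 1.0 - value


def augment(samples, transform):
    """Add the transformed image of every sample.

    samples maps (pieces, control) -> (value, sample_count). If the image is
    already present, the entry backed by more samples wins and is used for
    both positions; ties leave both entries as they are.
    """
    out = dict(samples)
    for (pieces, control), (value, count) in samples.items():
        key, new_value = transform(pieces, control, value)
        if key not in out or out[key][1] < count:
            out[key] = (new_value, count)
    return out
```

Applying `augment` once with each transform covers the combined symmetry as well. As noted above, the color-reversed images include unreachable positions, which this sketch makes no attempt to filter.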
I suspect a large issue in my database is that it was produced from a single game (with max. length play clock). So the samples will be heavily biased in a fairly small area of the tree. I'm not sure if I've still got logs from the game, but I'm guessing that it'll show that white won towards the right.
Combining output from multiple runs has its own challenges though.
Pre-requisite for experiments under #373.
Build a database of states.
... (TreeNode.freeNode()) that's guaranteed to happen to all nodes.
What to do about terminal states? My experience with TTT is that it's important to include them in the training data. But does Sancho create a TreeNode for terminal states? If not, we could take terminal states from the end of rollouts, but might that flood the database? Perhaps a 3rd file for terminal states?
Aiming for ~1,000,000 states to start with. May need more in time, but that much data is probably already more than sufficient to make the NN training grind to a halt.