glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0
760 stars, 299 forks

Analyze blunders #558

Open mooskagh opened 6 years ago

mooskagh commented 6 years ago

Important!

When reporting positions to analyze, please use the following form. It makes it easier to see what's problematic with the position:

(old text below)

There are many reports on forums asking about blunders, and the answers so far have been something along the lines of "it's fine, it will learn eventually, we don't know exactly why it happens".

I think at this point it makes sense to actually look into them to confirm that there are no blind spots in training. For that we need to:

Eventually all of this would be nice to have as a single command, but we can start manually.

For lc0, that can be done by running it with these flags: `--verbose-move-stats -t 1 --minibatch-size=1 --no-smart-pruning` (unless you specifically want to debug with other settings).

Then, in the UCI interface, send the command:

position startpos moves e2e4 ....

(PGN moves can be converted to UCI notation with pgn-extract -Wuci.)

Then do:

go nodes 10

Check the results, then add more nodes by running:

go nodes 20
go nodes 100
go nodes 800
go nodes 5000
go nodes 10000
and so on

And watch how the counters change.
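The manual steps above can be scripted. Below is a minimal Python sketch, assuming an lc0 binary at a path you supply that accepts the flags quoted in this post. The helper names `probe`, `position_command` and `go_commands` are hypothetical. The script only sends the same UCI commands (`uci`, `isready`, `position ...`, then one `go nodes N` per step) and collects every line lc0 prints, including the verbose move stats. Whether each `go nodes` step reuses the previous tree is up to lc0, and the sketch does not rely on it.

```python
import subprocess
from typing import Iterable, List, Sequence

# Flags quoted from the post above; other lc0 versions may name them differently.
LC0_FLAGS = ["--verbose-move-stats", "-t", "1", "--minibatch-size=1", "--no-smart-pruning"]

# Node budgets from the post above; extend the tuple to keep going.
NODE_SCHEDULE = (10, 20, 100, 800, 5000, 10000)


def position_command(moves: Sequence[str]) -> str:
    """UCI 'position' command for a game given as UCI moves, e.g. ['e2e4', 'e7e5']."""
    if not moves:
        return "position startpos"
    return "position startpos moves " + " ".join(moves)


def go_commands(schedule: Iterable[int] = NODE_SCHEDULE) -> List[str]:
    """One 'go nodes N' command per entry of the schedule, in order."""
    return [f"go nodes {n}" for n in schedule]


def probe(lc0_path: str, moves: Sequence[str],
          schedule: Iterable[int] = NODE_SCHEDULE) -> List[str]:
    """Start lc0, set up the position, run each 'go nodes' step, return all output lines."""
    proc = subprocess.Popen([lc0_path, *LC0_FLAGS], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    lines: List[str] = []

    def send(cmd: str) -> None:
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    def read_until(prefix: str) -> None:
        # Keep every output line (move stats included) until the expected reply arrives.
        while True:
            raw = proc.stdout.readline()
            if not raw:
                raise RuntimeError(f"lc0 exited before sending '{prefix}'")
            line = raw.rstrip("\n")
            lines.append(line)
            if line.startswith(prefix):
                return

    send("uci")
    read_until("uciok")
    send("isready")
    read_until("readyok")
    send(position_command(moves))
    for cmd in go_commands(schedule):
        send(cmd)
        read_until("bestmove")  # each search ends with a 'bestmove' line
    send("quit")
    proc.wait()
    return lines
```

For example, `probe("./lc0", ["e2e4", "e7e5"])` would return the raw output, which you can then filter for the per-move stat lines.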

Counters:

e2e4 N: 329 (+ 4) (V: -12.34%) (P:38.12%) (Q: -0.2325) (U: 0.2394) (Q+U: 0.0069)
 ^      ^    ^      ^           ^          ^            ^           ^
 |      |    |      |           |          |            |           Q+U, see below
 |      |    |      |           |          |           U from PUCT formula,
 |      |    |      |           |          |           see below.
 |      |    |      |           |         Average value of V in a subtree
 |      |    |      |          Probability of this move, from NN, but if Dirichlet
 |      |    |      |          noise is on, it's also added here, 0%..100%
 |      |    |     Expected outcome for this position, directly from NN, -100%..100%
 |      |   How many visits are processed by other threads when this is printed.
 |     Number of visits. The move with maximum visits is chosen for play.
Move
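When collecting many of these lines, it helps to parse them. The following is a minimal Python sketch that reads the example counter line shown above into named fields and checks that the printed Q+U equals Q plus U, up to 4-decimal rounding. The regex is written against that single example line. The optional `info string ` prefix and the exact spacing are assumptions, and real lc0 output may differ between versions. `parse_stats`, `MoveStats` and `is_consistent` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Written against the example line above; prefix and spacing are assumptions.
STATS_RE = re.compile(
    r"^(?:info string\s+)?(?P<move>\S+)\s+N:\s*(?P<n>\d+)\s+\(\+\s*(?P<inflight>\d+)\)\s+"
    r"\(V:\s*(?P<v>-?[\d.]+)%\)\s+\(P:\s*(?P<p>[\d.]+)%\)\s+"
    r"\(Q:\s*(?P<q>-?[\d.]+)\)\s+\(U:\s*(?P<u>-?[\d.]+)\)\s+\(Q\+U:\s*(?P<qu>-?[\d.]+)\)"
)


@dataclass
class MoveStats:
    move: str
    n: int           # visits
    in_flight: int   # visits still being processed by other threads
    v: float         # NN value, converted from percent to [-1, 1]
    p: float         # prior, converted from percent to [0, 1]
    q: float         # average V in the subtree
    u: float         # PUCT exploration term
    q_plus_u: float  # selection score


def parse_stats(line: str) -> Optional[MoveStats]:
    """Return the parsed counters, or None if the line is not a move-stats line."""
    m = STATS_RE.match(line.strip())
    if m is None:
        return None
    return MoveStats(
        move=m["move"], n=int(m["n"]), in_flight=int(m["inflight"]),
        v=float(m["v"]) / 100, p=float(m["p"]) / 100,
        q=float(m["q"]), u=float(m["u"]), q_plus_u=float(m["qu"]),
    )


def is_consistent(s: MoveStats, tol: float = 2e-4) -> bool:
    """Q+U should equal Q plus U, up to the 4-decimal rounding of the printout."""
    return abs((s.q + s.u) - s.q_plus_u) <= tol
```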

* U = P * Cpuct * sqrt(sum of N of all moves) / (N + 1)
  Cpuct is a search parameter that can be changed with a command-line flag.
* The move with the largest Q+U is visited next.
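Here is a minimal Python sketch of the two rules above: U as written, and picking the move with the largest Q+U. It assumes the caller supplies Q for unvisited moves, for example from a first-play-urgency (FPU) rule, because this sketch does not model how lc0 fills that value in. `puct_u` and `select_next` are hypothetical names.

```python
import math
from typing import Sequence


def puct_u(p: float, n: int, total_n: int, cpuct: float) -> float:
    """U = P * Cpuct * sqrt(sum of N of all moves) / (N + 1), as written above."""
    return p * cpuct * math.sqrt(total_n) / (n + 1)


def select_next(priors: Sequence[float], visits: Sequence[int],
                q_values: Sequence[float], cpuct: float) -> int:
    """Index of the move with the largest Q+U, i.e. the move visited next.

    q_values for unvisited moves must come from the caller (e.g. an FPU rule).
    """
    total = sum(visits)
    scores = [q + puct_u(p, n, total, cpuct)
              for p, n, q in zip(priors, visits, q_values)]
    return max(range(len(scores)), key=scores.__getitem__)
```

As an illustration with an arbitrary Cpuct of 1.2, take a popular move with P = 0.90, N = 100, Q = 0.10 and an unvisited refutation with P = 0.02 and an assumed first-play Q of -0.20. The scores are about 0.207 and 0.04, so the refutation is not visited. If its assumed Q were 0.10 instead, it would be visited next. This is the "tiny prior" effect discussed later in this thread.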

Help wanted:

ASilver commented 6 years ago

As reported in the Discord channel, in this game, on move 138, NN303 hangs a rook out of the blue, throwing away a drawn game almost instantly. By the next move, the eval had swung by +9.42:

[Event "DESKTOP-RV5DCNB, Blitz 1m+1s"] [Site "Rio de Janeiro, Brazil"] [Date "2018.05.18"] [Round "18"] [White "Spike 1.4"] [Black "lczero v0.10"] [Result "1-0"] [ECO "D94"] [Annotator "0.09;0.01"] [PlyCount "283"] [TimeControl "60+1"]

{Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz 3293 MHz W=22.2 plies; 4,701kN/s B=17.1 plies; 1kN/s; 45 TBAs} 1. c4 c5 2. Nf3 Nf6 3. Nc3 e6 4. e3 Nc6 5. d4 { Both last book move} d5 {0.01/18 2} 6. a3 {0.09/16 3 (cxd5)} cxd4 {0.06/18 2 (a6)} 7. exd4 {0.17/16 2} g6 {-0.02/19 2 (Be7)} 8. Bd3 {0.59/16 2 (cxd5)} Bg7 { -0.13/19 2 (dxc4)} 9. O-O {0.59/16 2} dxc4 {-0.12/19 2} 10. Bxc4 {0.42/18 12} O-O {-0.16/19 1} 11. Bg5 {0.47/18 3 (Re1)} h6 {-0.34/18 1} 12. Be3 {0.47/17 3 (Bh4)} b6 {-0.34/18 2 (Nd5)} 13. Qe2 {0.44/14 2} Bb7 {-0.27/19 2} 14. Rfd1 { 0.36/14 2 (Rad1)} Ne7 {-0.29/18 2} 15. Ba6 {0.34/14 2 (Bd3)} Bxa6 {-0.31/18 2}

16. Qxa6 {0.14/15 3} Nfd5 {-0.32/19 2 (Qc8)} 17. Rac1 {0.09/16 3 (Bd2)} Nxe3 { -0.43/18 2 (Nf5)} 18. fxe3 {0.45/15 4} Nf5 {-0.42/18 1 (g5)} 19. Qd3 {0.40/15 2 (Qe2)} Rc8 {-0.49/19 2} 20. Rc2 {0.28/15 2 (e4)} Rc7 {-0.60/19 2 (g5)} 21. Kh1 {0.43/13 2 (Re2)} h5 {-0.71/19 2 (Nd6)} 22. Qe2 {0.22/13 2 (Rcd2)} Nh4 { -0.62/19 3 (Qb8)} 23. Rdc1 {0.15/15 2 (Nxh4)} Nf5 {-0.79/19 3} 24. Qe1 { 0.09/15 2 (Rd1)} Bh6 {-0.90/19 3} 25. Re2 {0.08/16 3} g5 {-0.66/19 2 (Bg7)} 26. Rd1 {0.00/15 2} g4 {-0.69/18 1} 27. Ne5 {0.13/17 4} Bg7 {-0.69/19 3 (Qg5)} 28. Nb5 {0.02/15 2 (Ne4)} Bxe5 {-1.11/18 2} 29. dxe5 {0.00/19 2} Rd7 {-1.06/18 1}
30. Nd6 {0.00/20 1 (Rxd7)} Nxd6 {-0.79/19 3 (Qg5)} 31. Red2 {0.02/18 2} Qg5 { -0.61/18 1} 32. exd6 {0.00/19 2} Rfd8 {-0.65/20 3 (Qe5)} 33. Rd3 {0.08/19 2 (h3)} Qe5 {-0.87/18 2 (Qc5)} 34. Qb4 {0.02/17 2 (Qd2)} f6 {-0.83/19 3} 35. Qc4 {0.06/17 2 (Kg1)} f5 {-0.84/19 3 (b5)} 36. Qf4 {0.00/18 1} Qxf4 {-0.85/17 1}
37. exf4 {0.00/23 1} Kf7 {-0.91/19 2} 38. Kg1 {0.00/24 1} Kf6 {-0.92/19 3} 39. g3 {0.00/25 2 (Kf2)} e5 {-0.82/17 2} 40. fxe5+ {0.00/25 2} Kxe5 {-0.82/18 1}
41. Re3+ {0.00/24 1} Kf6 {-0.75/17 0} 42. Red3 {0.00/27 2} Ke6 {-0.66/19 3 (Ke5)} 43. Re1+ {0.00/29 2 (Re3+)} Kf6 {-0.64/17 2} 44. Red1 {0.00/10 0} Rc8 { -0.63/19 3 (Ke5)} 45. R1d2 {0.00/22 1 (Rf1)} Ke6 {-0.57/18 3 (Rc4)} 46. Re3+ { 0.00/23 1 (Kf2)} Kf6 {-0.64/17 2} 47. Kf2 {0.00/25 1} Rcd8 {-0.76/18 1 (Rc6)}
48. Red3 {0.00/26 1} Rh8 {-0.85/18 1 (Ke6)} 49. a4 {0.00/22 1 (Rd5)} h4 { -1.30/17 2 (Rhd8)} 50. Kg2 {0.00/20 1} h3+ {-1.28/17 2 (hxg3)} 51. Kf2 { 0.00/26 1} Rhd8 {-1.34/18 2} 52. b3 {0.00/29 1 (b4)} Rc8 {-1.45/18 3 (Ke5)} 53. Ke2 {0.00/28 1 (Ke3)} Rc5 {-1.54/18 3 (Re8+)} 54. Kf2 {0.00/28 1} Rc1 {-1.53/17 2 (Rc8)} 55. Ke3 {0.00/26 1} Rf1 {-1.54/18 2 (Re1+)} 56. Ke2 {0.00/26 1} Rh1 {-1.50/17 1} 57. Ke3 {0.00/28 2} Rb1 {-1.39/18 2 (Rf1)} 58. Ke2 {0.00/27 1 (Kf2)} Rh1 {-1.28/17 2} 59. Ke3 {0.00/10 0} a6 {-1.25/18 2 (Re1+)} 60. Rc3 { 0.00/22 1 (Rc2)} Rb1 {-1.38/17 1 (Re1+)} 61. Kf2 {0.00/24 1 (Rcd3)} Rh1 { -1.27/18 2} 62. Ke3 {0.00/10 0} Re1+ {-1.08/18 2} 63. Kf2 {0.00/30 1} Rb1 { -1.14/17 1 (Rh1)} 64. Rcd3 {0.00/25 1 (Ke2)} Rh1 {-1.31/17 2 (Ke6)} 65. Ke3 { 0.00/27 1} Rb1 {-1.26/18 2 (Re1+)} 66. Ke2 {0.00/26 1 (Kf2)} Rh1 {-1.30/17 2}
67. Ke3 {0.00/10 0} Rg1 {-1.08/17 1 (Re1+)} 68. Rd5 {0.00/23 1} Rb1 {-1.26/16 1 (Re1+)} 69. R5d3 {0.00/25 1} a5 {-1.32/17 2 (Rg1)} 70. Ke2 {0.00/25 1 (Rc3)} Rh1 {-1.39/16 1} 71. Ke3 {0.00/27 1} Re1+ {-1.22/16 1 (Rb1)} 72. Kf2 {0.00/32 1 } Rh1 {-0.98/16 1} 73. Ke3 {0.00/10 0} Rg1 {-0.87/17 1 (Re1+)} 74. Rd5 { 0.00/24 1} Re1+ {-0.82/16 1 (Rb1)} 75. Kf2 {0.00/26 1} Re5 {-1.03/16 1 (Rh1)}
76. R5d4 {0.00/26 1} Rc5 {-0.94/17 1 (Ke6)} 77. R4d3 {0.00/28 1} Rc8 {-1.02/17 2 (Rc1)} 78. Ke2 {0.00/29 1 (Ke3)} Re8+ {-0.96/16 1 (Rc5)} 79. Kf2 {0.00/33 1} Rc8 {-0.90/16 1} 80. Re3 {0.00/28 1 (Ke3)} Rc6 {-1.04/15 1} 81. Red3 {0.00/31 1 } Ke6 {-0.91/16 1 (Rc8)} 82. Re3+ {0.00/31 1} Kf6 {-0.74/16 1} 83. Red3 { 0.00/13 0} Rc5 {-0.74/16 1 (Rc8)} 84. Re3 {0.00/29 1 (Re2)} Rc8 {-0.71/16 1 (Rc6)} 85. Ke2 {0.00/27 1 (Red3)} Rc6 {-1.16/15 1 (Rcd8)} 86. Red3 {0.00/28 1} Ke6 {-0.95/17 2 (Rc8)} 87. Re3+ {0.00/31 1} Kf6 {-0.82/17 1} 88. Red3 {0.00/13 0} Rc8 {-0.66/17 1} 89. Re3 {0.00/12 0 (Ke3)} Rcd8 {-0.74/16 1} 90. Red3 { 0.00/30 1} Ke5 {-0.58/17 1 (Re8+)} 91. Re3+ {0.00/10 0} Kf6 {-0.47/14 0} 92. Red3 {0.00/10 0} Re8+ {-0.38/17 2} 93. Kd1 {0.00/31 1 (Kf2)} Re5 {-0.85/17 1 (Red8)} 94. Rc2 {0.00/25 1 (Kc2)} Ke6 {-1.32/16 1} 95. Rcd2 {0.00/27 1} b5 { -1.37/16 1 (Kf6)} 96. axb5 {0.00/25 1} Rxb5 {-1.30/16 1} 97. Ke1 {0.00/26 1 (Kc2)} Re5+ {-1.20/17 1 (Rb6)} 98. Kf2 {0.00/28 1} Rb5 {-1.00/16 1} 99. Re3+ { 0.00/29 1 (Re2+)} Kf6 {-1.03/16 1 (Re5)} 100. Rd1 {0.00/24 1 (Red3)} Rc5 { -1.23/16 1} 101. Rd2 {0.00/28 1} Rc1 {-1.13/17 1 (Rb5)} 102. Red3 {0.00/27 1} Rh1 {-1.09/17 1 (Rc5)} 103. Ke3 {0.00/28 1} Rf1 {-0.94/17 1 (Re1+)} 104. Ke2 { 0.00/26 1} Rh1 {-0.80/16 1} 105. Ke3 {0.00/10 0} Re1+ {-0.70/17 2} 106. Kf2 { 0.00/34 1} Ra1 {-0.72/16 1 (Rh1)} 107. Rd5 {0.00/23 1 (Ke2)} Ke6 {-0.87/16 1}
108. R5d3 {0.00/26 1 (Ke3)} Rh1 {-1.09/16 1 (Kf6)} 109. Ke3 {0.00/27 1} Re1+ { -1.05/17 1} 110. Kf2 {0.00/28 1} Rh1 {-1.01/17 1 (Re5)} 111. Ke3 {0.00/31 1} Re1+ {-0.88/17 1} 112. Kf2 {0.00/10 0} Re5 {-0.92/17 1} 113. Rd4 {0.00/28 1 (Kg1)} Rb5 {-1.03/16 1 (Rc5)} 114. Re2+ {0.00/28 1 (R4d3)} Kf6 {-1.16/16 1 (Re5)} 115. Rd3 {0.00/27 1} Rb6 {-1.08/16 1 (Re5)} 116. Red2 {0.00/28 1} Rb5 { -1.01/17 1 (Ke6)} 117. Rd1 {0.00/26 1 (Rd5)} Rb8 {-1.00/16 1 (Rc5)} 118. Re3 { 0.00/26 1 (Ra1)} Rb6 {-1.10/17 2} 119. Red3 {0.00/29 1} Rc6 {-1.07/17 1 (Rb8)}
120. R1d2 {0.00/29 1 (Ke3)} Rb6 {-0.97/16 1 (Rc5)} 121. Kg1 {0.00/29 1 (Rd5)} Ke6 {-1.17/16 1} 122. Re3+ {0.00/30 2} Kf6 {-0.96/16 1} 123. Red3 {0.00/10 0} Rb8 {-0.82/17 1 (Ke6)} 124. Kf2 {0.00/31 1} Rb5 {-0.76/16 1 (Rb6)} 125. Rc3 { 0.00/29 1 (Kg1)} Rb6 {-0.73/16 1 (Ke6)} 126. Rcd3 {0.00/10 0} Rc6 {-0.56/16 1 (Ke6)} 127. Ke3 {0.00/26 2 (Rd5)} Rc5 {-0.54/16 1 (Rb6)} 128. Kd4 {0.00/25 1 (Kf4)} Rc6 {-0.44/16 1} 129. Ke3 {0.00/10 0} Rc1 {-0.56/17 1 (Rb6)} 130. Ke2 { 0.00/27 1 (Kf2)} Rg1 {-0.72/16 1 (Rh1)} 131. Ke3 {0.00/28 1} Rf1 {-0.65/17 2 (Re1+)} 132. Ke2 {0.00/26 1} Rg1 {-0.53/16 1} 133. Ke3 {0.00/10 0} Rh1 { -0.60/17 1 (Re1+)} 134. Rd5 {0.00/25 1} Rb1 {-0.50/16 1} 135. R5d3 {0.00/27 2 (Rb5)} Kg5 {-0.61/16 1 (Re1+)} 136. Rc2 {0.00/21 2 (Ra2)} Re1+ {-0.51/16 1 (Kf6)} 137. Kd2 {0.00/22 1 (Kf2)} Rh1 {-0.26/16 1 (Re6)} 138. Kc3 {1.20/16 2} Rc1 {-0.12/16 1 (Rg1)} 139. Rxc1 {9.30/17 1 (Rd5)} Kf6 {12.90/17 2} 140. Ra1 { 11.08/18 2 (Re1)} Ke6 {12.96/16 1 (Kg5)} 141. Rxa5 {13.27/18 1 (Re1+)} Rd8 { 13.73/16 1 (Kf6)} 142. d7 {19.37/18 2 (Re3+)} 1-0

dstark1993 commented 6 years ago

Self-play game no. 14200622: http://lczero.org/game/14200622, or with game analysis: https://lichess.org/study/Smk8bomB/5TpAhSFw

1. bxc6 Qe8?? Blunder. The best move was Bf5. Evaluation went from +2.5 to +9.5.


Network: 1831f4884d6da86fe369d4a51fbfe6a433703fbf97b0e7122898170c1eede5e0 ID329

It's just a game from self-play generated using Google Colab (Leela_K80). It looks like the queen moved to get out of a discovered attack....

dstark1993 commented 6 years ago

Self-play game no. 14215228: http://lczero.org/game/14215228, or with game analysis: https://lichess.org/study/Smk8bomB/noxTVxVv

1. Ra7 d2?! Not the best checkmating sequence. The best move was Qg2#; Leela missed a mate in 1 for some reason.


Network: 1831f4884d6da86fe369d4a51fbfe6a433703fbf97b0e7122898170c1eede5e0 ID329

It's just a game from self-play generated using Google Colab (Leela_K80). Mate in 1 is missed.

dstark1993 commented 6 years ago

Self-play game no. 14217918: http://lczero.org/game/14217918, or with game analysis: https://lichess.org/study/Smk8bomB/RS645It6

1. Rxf7 Qe7?? Checkmate is now unavoidable. The best move was Rc7. Gave up the queen for no reason? Evaluation went from -1 to mate in 6.


Network: 1831f4884d6da86fe369d4a51fbfe6a433703fbf97b0e7122898170c1eede5e0 ID329

It's just a game from self-play generated using Google Colab (Leela_K80). Gave up the queen for no reason.

so-much-meta commented 6 years ago

@dstark1993 -- Self-play games (for training data generation) have randomized moves. Although it's interesting to see some examples of this randomization, it's not really what this issue is about.

dstark1993 commented 6 years ago

How can you tell whether a move was randomized or made by Leela "on purpose"? Is that something that can be checked?

so-much-meta commented 6 years ago

@dstark1993 - yeah, you could check this by running the engine outside of training, being sure to provide some history, and seeing what the engine does.... However, that's a little tedious... I'm going to be trying to figure out some good ways to analyze the training data in a more automated way.

dstark1993 commented 6 years ago

Good, because I don't understand much about programming; I'm generating games with Google Colab. I don't think I'll be able to use lc0 on my laptop (because of its power and/or my skill at figuring out how to).

chara1ampos commented 6 years ago

Id: Loss of Queen
Game: https://lichess.org/fzuBrqhf#72
Bad move: 36. Re5 Qxc2 (Stockfish eval goes from -0.2 to +7.3)
Correct move: Qc6
Configuration: lc0 cudnn - May 19 (default parameters), Windows 10 x64, Nvidia Titan V, Intel i5-7400T quad core, 32 GB RAM
Network ID: kb1-256x20-2100000.txt.bz2
Time control: 60" + 2"
Comments: Leela was ahead but blundered, losing a queen, and then resigned.

steve3140 commented 6 years ago

That last one, "Loss of Queen", is worth monitoring a bit, I think. It's another multi-move blunder, again with the theme of "removal of the defender". I'm hoping to see Leela gradually work these oversights out of her system.

haleysa commented 6 years ago

The "Loss of Queen" removal of the defender with check isn't actually too bad: at 10k nodes, ID 329 Leela won't play the blunder Qxc2, and while the policy for Qxc2 is N 65.89%, the listed Best Move is the 3rd-highest policy at 3.93%. Interestingly, Leela actually prefers Rb8 (N 2.21%) at 10k nodes, which isn't a blunder, and Stockfish says the position is even after that.
After Qxc2, the policy for the refutation Re8+ is 2.17% and is preferred at 5k nodes search. Rxg5 is also good for white, but not as good (QN v RR for white), and is N 3.78%. So there's lots of ways for Leela to get out of this.

chara1ampos commented 6 years ago

Id: Loss of Knight
Game: https://lichess.org/pJh8VCTA#89
Bad move: 45. Rxb5 (Stockfish eval goes from -1 to -5.1)
Correct move: f3
Configuration: lc0.exe cudnn - May 22 (default parameters), Windows 10 x64, Nvidia Titan V, Intel i5-7400T quad core, 32 GB RAM
Network ID: 330
Time control: 60" + 2"
Comments: Leela blundered, lost a piece, and then resigned.

chara1ampos commented 6 years ago

Id: Leela blunders, steps into checkmate in #14
Game: https://lichess.org/qNgs7Cwg#95
Bad move: 48. Qc6 Bg3 (Stockfish eval goes from +4 to +13)
Correct move: Rde8
Configuration: lc0.exe cudnn - May 22 (default parameters), Windows 10 x64, Nvidia Titan V, Intel i5-7400T quad core, 32 GB RAM
Network ID: 330
Time control: 60" + 2"
Comments: Leela blundered, faced checkmate in #14, and resigned.

so-much-meta commented 6 years ago

I've been posting on the forums about some work I'm doing to try to automate blunder detection. Please see here: https://groups.google.com/forum/#!topic/lczero/8lK5ldgZUHA

So far, this does seem to work to find things like buggy (old) engines playing match games, as well as blind spots (missed mate in ones, etc)... However, I haven't fully figured out exactly how I'd like to measure things to get a reliable signal. Once I improve this, I hope to apply this to new match games as they come out to look for bugs and blunders.

haleysa commented 6 years ago

I've actually spent a little time today doing something similar with the PGNs from the CCLS gauntlets. It's easier in a lot of ways because the opposing engine will spot the blunder, so I've started just scanning for situations where Leela's eval drops by a certain threshold (right now -200 centipawns) - this means she made a move and didn't see the refutation that was coming from the other engine. I'm also filtering it for where Leela's eval didn't start worse than -2.00, because if she's already losing and blunders more it's not so interesting in my opinion. A lot of that helped avoid "losing faster" endgame moves from cluttering it. It's still in its infancy and won't be worked on this weekend, but hopefully it can be converted into generating some automatic tracking positions for future IDs as another way to check progress.
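Here is a minimal Python sketch of that filter. It flags a move when the engine's own eval drops by at least 2.00 pawns (200 centipawns) by its next move, and it skips positions that were already worse than -2.00. The sketch makes several assumptions. The engine under test plays White, and its evals appear in PGN comments in the `{score/depth time}` style used by the gauntlet PGNs posted in this thread, from the engine's own point of view. Mate scores are not handled. `white_evals` and `find_blunders` are hypothetical names.

```python
import re
from typing import List, Tuple

# A White move with an engine comment, e.g. "28. Qxe7 {1.54/1 1.7s}".
WHITE_EVAL_RE = re.compile(r"(\d+)\.\s*([^\s{}]+)\s*\{\s*(-?\d+\.\d+)/")


def white_evals(movetext: str) -> List[Tuple[int, str, float]]:
    """(move number, SAN, eval in pawns) for every annotated White move."""
    return [(int(num), san, float(score))
            for num, san, score in WHITE_EVAL_RE.findall(movetext)]


def find_blunders(evals: List[float], drop: float = 2.0,
                  floor: float = -2.0) -> List[int]:
    """Indices i where the engine's eval falls by at least `drop` pawns from
    its move i to its next move, ignoring positions already below `floor`."""
    flagged = []
    for i in range(len(evals) - 1):
        before, after = evals[i], evals[i + 1]
        if before >= floor and after - before <= -drop:
            flagged.append(i)
    return flagged
```

On the movetext `28. Qxe7 {1.54/1 1.7s} Bh3+ {9.96/23 1.7s} 29. Kxh3 { -10.24/2 1.9s}`, White's eval goes from 1.54 to -10.24, so 28. Qxe7 is flagged.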

so-much-meta commented 6 years ago

Using that method, here's the worst I could find from the last 500 match games: 311088-311587

White has a mate in one on move 32, but misses it and gets check-mated a couple moves later: http://www.lczero.org/match_game/311146

White doesn't protect against black's mate in one on move 46 (according to Lichess, white had a small advantage), but then black misses the opportunity and ends up with a draw: http://www.lczero.org/match_game/311121

White gives black a mate in two opportunity on both move 94 and 96, but black misses it both times: http://www.lczero.org/match_game/311554

Black had a forced mate in 5, but ends up with a draw: http://www.lczero.org/match_game/311240 https://lichess.org/analysis/8/8/pRpB4/2P2k2/P3p2K/4P3/5n2/6r1%20b%20-%20-%204%2047#93

White missed a simple tactic to capture black's queen on move 42, would have won but goes on to lose: http://www.lczero.org/match_game/311127 https://lichess.org/analysis/6k1/p7/7p/1p1pN3/8/8/1qB2R1K/8%20w%20-%20-%200%2042#82

Black misses trading its rook for a queen on move 38: http://www.lczero.org/match_game/311379

chara1ampos commented 6 years ago

Id: Leela gets checkmated!
Game: https://lichess.org/KAxNVVzc#56
Bad move: 29. Qa3 (Stockfish eval goes from -0.2 to #-8)
Correct move: f3
Configuration: lc0 cudnn - May 26 (default parameters), Windows 10 x64, Nvidia Titan V, Intel i5-7400T quad core, 32 GB RAM
Network ID: 346
Time control: 60" + 2"
Comments: Leela was ahead but blundered and was checkmated.

steve3140 commented 6 years ago

I just re-tested NN350 against the MultiMove #1 position above.

lczero.exe -w weights.txt
position fen 6k1/4bpp1/2q1p2p/2p1P3/2P1N2P/2Bn1QP1/5P1K/8 b - - 0 35
go movetime 1000

info string Qd7 -> 15 (V: 33.78%) (N: 39.14%) PV: Qd7 Nd6 Nxe5 Bxe5 Bxd6 Qd3 Qc6 Qxd6
info string Nb4 -> 68 (V: 47.26%) (N: 31.72%) PV: Nb4 Kg2 Qa6 Nd2 Nc6 Qe4 Bf8
info string stm Black winrate 43.84%

That looks okay at that point, but NN350 still recommends the losing 35...Nb4 and doesn't notice the problems with it until 58kN.

She recommends the saving 35...Qd7 only after 129kN (NN316 found it in 19kN, NN311 in 29kN).

So tactically, at least in this case, things seem to have gone backwards again? Bit disappointing.

ASilver commented 6 years ago

Ok, this was from a CLOP test, and it's really bad. In the position below, Bh3+ wins the queen on the spot. Even on my quad with a GTX1060, LCZero v0.10 (default settings) with id358 takes 3m30s to see this!

[Screenshot: 358-bh3]

Here is full PGN. Key move is move 28.

[Event "?"] [Site "?"] [Date "2018.05.28"] [Round "1"] [White "lc0-may22"] [Black "ice3"] [Result "0-1"] [ECO "A05"] [PlyCount "65"] [EventDate "2018.??.??"] [TimeControl "60+1"]

1. Nf3 {book} Nf6 {book} 2. g3 {book} g6 {book} 3. Bg2 {book} Bg7 {book} 4. O-O {book} O-O {book} 5. d3 {book} c5 {book} 6. c4 {0.02/2 3.3s} Nc6 {-0.09/19 4.6s } 7. Nc3 {0.08/2 2.3s} d6 {0.00/99 0.015s} 8. Rb1 {0.08/2 4.1s} a6 {-0.20/18 2.1s} 9. a3 {0.09/2 2.5s} Rb8 {-0.11/20 2.1s} 10. b4 {0.08/2 1.8s} cxb4 {-0.12/20 2.1s} 11. axb4 {0.06/1 1.2s} b5 {-0.14/20 2.0s} 12. cxb5 {0.08/2 1.0s} axb5 {-0.14/18 2.0s} 13. Be3 {0.12/2 1.8s} Bd7 {-0.14/16 1.9s} 14. d4 {0.26/1 2.7s} Rc8 {0.09/18 2.0s} 15. Qd2 {0.17/2 3.1s} Bf5 {-0.09/17 1.9s} 16. Rb3 {0.09/2 2.0s} d5 {0.14/17 1.6s} 17. Rc1 {0.19/2 3.0s} Ne4 {0.02/20 1.9s} 18. Qd1 { 0.26/1 2.1s} Nxb4 {0.13/20 1.6s} 19. Nxe4 {0.26/2 1.6s} dxe4 {0.12/22 1.9s} 20. Rxc8 {0.22/1 1.3s} Bxc8 {0.03/22 1.9s} 21. Rxb4 {0.33/2 2.0s} exf3 {-0.06/22 1.6s} 22. Bxf3 {0.33/1 1.2s} Bd7 {-0.05/21 2.0s} 23. Kg2 {0.36/2 2.7s} Qb6 { 0.47/18 1.8s} 24. Qd3 {0.39/1 3.1s} Rc8 {0.43/19 1.9s} 25. Rb3 {0.38/2 3.0s} h5 {0.67/18 1.9s} 26. h4 {0.29/1 3.0s} Qd6 {0.43/18 1.7s} 27. Qe4 {0.38/1 1.9s} Qc7 {0.04/19 3.1s} 28. Qxe7 {1.54/1 1.7s} Bh3+ {9.96/23 1.7s} 29. Kxh3 { -10.24/2 1.9s} Qxe7 {9.92/23 1.7s} 30. Kg2 {-9.08/2 1.8s} b4 {10.57/22 1.7s}
31. Bd2 {-8.79/2 1.8s} Bxd4 {10.99/22 1.6s} 32. Rxb4 {-7.64/2 2.6s} Bxf2 { 11.62/24 2.9s} 33. Kxf2 {adjudication -9.21/2 1.9s, Black wins by adjudication} 0-1

so-much-meta commented 6 years ago

@ASilver... Very similar to one of the match game blunders I referenced above (although this was a little bit older)... """ White missed a simple tactic to capture black's queen on move 42, would have won but goes on to lose: http://www.lczero.org/match_game/311127 """

All of these types of issues seem to be due to Leela having very small priors when it appears that a strong piece is left under attack and undefended. In my opinion, simple 1- and 2-move tactics should set the bar for how flat the training target policies are. If training hasn't learned that, in many situations, it can be good to put the opponent's king in check in order to capture a queen, then IMO that's a clear signal to flatten out the policies.... For example, making the PUCT/FPU changes you offered should help. Also, it's worth noting that chess has a much more jagged landscape than Go (and Leela Zero seems to be the basis for a lot of thinking about PUCT/FPU in Leela-chess): chess is full of terrain with very sharp ups and downs. It's very hard for any value head to smooth that out completely. IMO, the policy entropy needs to be increased just to give the value head a chance.

Anyway, I'm hoping that eventually the devs will agree that the training target policy entropy is too small, as is the entropy of the policy head (which I plan to analyze), and that corrections will be made to flatten it out and make it less sharp.
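To make "policy entropy" and "flattening" concrete, here is a minimal Python sketch. It measures the Shannon entropy of a move distribution and applies one generic way to flatten it: rescale each probability as p^(1/T) and renormalise. This is an illustration chosen for this sketch, not a description of what lc0 training does or of the exact change proposed here. `entropy` and `flatten` are hypothetical names.

```python
import math
from typing import List, Sequence


def entropy(policy: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a move-probability distribution."""
    return -sum(p * math.log(p) for p in policy if p > 0)


def flatten(policy: Sequence[float], temperature: float) -> List[float]:
    """Rescale probabilities as p ** (1 / temperature) and renormalise.

    temperature > 1 flattens the distribution (raises entropy),
    temperature < 1 sharpens it, temperature == 1 leaves it unchanged.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in policy]
    total = sum(scaled)
    return [s / total for s in scaled]
```

For a sharp policy such as [0.90, 0.06, 0.02, 0.02], temperature 2 raises each 0.02 prior (think of a quiet refutation) to about 0.096. The entropy increases accordingly, so such a move gets a larger U term in the PUCT formula from the first post.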

Ideally (if this doesn't already exist) a system would be made such that training game generation allows for metaparameters to be pushed/pulled from the server. This would allow for small changes that can easily be reverted if necessary.

tooweaktooslow commented 6 years ago

ID367 doesn't see a mate in two (I left her running for a bit, and she finally saw it at over 1 million nodes, which is a "bit" too many for a mate in two.) Default lc0 settings.

It takes ID 367 over 30k nodes to see that giving away a knight for no reason isn't good (all the while giving herself a +5 eval). Default lc0 settings.

3r4/1p3pkp/1qp5/6p1/Pp2Pp1P/1P3Pn1/2Q3PK/2BR4 b - - 0 29 ID 367 doesn't see the Qf2 tactic, even with 8 million nodes. Default lc0 settings.

r3k2r/pR2pp1p/6p1/8/b2bP3/8/q2BBPPP/3Q1RK1 w kq - 2 16 ID 367 doesn't see the Rxe7 tactic, even with 2 million nodes. Default lc0 settings.

mooskagh commented 6 years ago

Please use forms from the first message of the issue. Just posting screenshots with comments "it doesn't see anything" is not that useful.

Also, if you do some preparation work (like importing the game into lichess and pointing out the correct move and the wrong move played instead), you'll save time for the (hopefully) multiple people who will look into it, and they won't each have to repeat that work.
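Part of that preparation can be automated. Below is a minimal Python sketch that builds a lichess analysis link from a FEN, following the same URL pattern as the lichess analysis links posted earlier in this thread (FEN in the path, spaces percent-encoded). It also lays out a report in the field order reporters in this thread have used. The field list is inferred from those posts and is not the official form from the first message. `lichess_analysis_url`, `format_report` and `REPORT_FIELDS` are hypothetical names.

```python
from typing import Dict
from urllib.parse import quote

# Field names as used by reporters in this thread; not an official template.
REPORT_FIELDS = ("Id", "Game", "Bad move", "Correct move", "Configuration",
                 "Network ID", "Time control", "Comments")


def lichess_analysis_url(fen: str) -> str:
    """Analysis-board link with the FEN in the path and spaces percent-encoded."""
    return "https://lichess.org/analysis/" + quote(fen, safe="/")


def format_report(values: Dict[str, str]) -> str:
    """One 'Field: value' line per provided field, in REPORT_FIELDS order."""
    unknown = set(values) - set(REPORT_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return "\n".join(f"{name}: {values[name]}"
                     for name in REPORT_FIELDS if name in values)
```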

tooweaktooslow commented 6 years ago

I provided FENs for the two tactical positions, so a lichess link isn't necessary. Correct moves can be seen in the screenshot, which should be looked at as it's the whole point of the post. Configuration and id are in the post.

edit: Sorry if I seem annoyed, but everything useful is in the screenshot. The position, sf8, sf9, leela evals, the continuation, nodecount and id (although it's in the post as well). It's much more intuitive for people that are familiar with chess GUIs to look at a screenshot with all the information, than to read through walls of text.

dubslow commented 6 years ago

Having the screenshot for extra details is fine, but for people trying to collate the entire contents of the thread, having one standard format makes that job much easier than trying to parse the unique format of your post, for N different "you" people making posts.

tooweaktooslow commented 6 years ago


Game: https://lichess.org/study/yyIYpmle (game 4)

Bad move: 81. b7??

Correct move: any bishop move that stays on the b1-h7 diagonal

Configuration: default win-lc0 on a gtx1070

Network ID: 367

Time control: 40/40; she took 68s for the move (> 500k nodes)

Comments: She realized she had blundered immediately (eval -40 on the next move), so she either overlooked Rxe4 completely (very unlikely given she used 500k nodes) or thought she could promote one of the pawns after b7. However, the rook does a very good job of threatening mate and indirectly covering the promotions (if h8=Q, Ra4#; if b8=Q, Ra4+ followed by Rb4+, taking the queen). That's altogether a 3-move tactic, which she definitely should have seen in 500k nodes. I tested her with PUCT=3.0 and she still didn't see Rxe4.