Closed mooskagh closed 4 years ago
Here's a game from the original thread that still applies today, as far as I understand it.
ID: QueenTrap
Game: https://lichess.org/efi0R82j#40
Bad move: 21. Qxg7 (g3g7) - SF9 eval on Lichess goes from +0.3 to -5.7 - Leela just lost a queen for a rook and 2 pawns.
Refutation: 21. .. Rg6 (e6g6) - SF9 eval -5.7; second best move is e4 (e5e4) eval +0.9.
Correct move: Re3 and h4 are both good moves leaving the position at +0.3.
Configuration: Original game: lczero v0.9 ID 271. Currently tested against lc0.exe lc0-win-20180711-cuda92-cudnn714 with test ID 10067 and ID 485.
Comments: Leela has a few tactics that are harder than others, and a few technical aspects of positions can make UCT need more nodes to find the right move in certain cases (lots of potential moves, "refutation" is an "only move" where the eval changes greatly but only if you find the right refutation, etc.). This position combines a few of these and makes it very nasty for Leela to avoid the blunder.
Specifically, there are two potential ways to try to get the queen out after Rg6 that Leela at first thinks can work. There's 22. Qxh7, but that falls to 22. .. Rxg2+ and a discovered attack on the queen - a Leela weakness. There's also 22. Rxe5, which fails to 22. .. Rxg7 and white cannot take the black queen on f5 because that undefends black's mate threat (Rxe1#) -- setting up a recapture that doesn't work because of undefending a mate threat is another Leela tactical weakness.
ID485: Given the position after Qxg7, it has a very hard time finding the refutation. At go nodes 100000, it stops here:
info depth 4 seldepth 22 time 25573 nodes 54758 score cp -138 hashfull 248 nps 2141 pv e5e4 g7d7 h7h5 h2h4 b7b6 d3e4 d5e4 d7d4 f5g4 g2g3 b8b7 e2e3 g4f5 e1e2
info string e8h8 (103 ) N: 8 (+ 0) (P: 0.63%) (Q: -0.90013) (U: 0.55864) (Q+U: -0.34149) (V: -0.8641)
info string f5d3 (833 ) N: 8 (+ 0) (P: 0.71%) (Q: -0.91401) (U: 0.62326) (Q+U: -0.29075) (V: -0.9066)
info string f5f2 (839 ) N: 8 (+ 0) (P: 0.72%) (Q: -0.92058) (U: 0.63945) (Q+U: -0.28113) (V: -0.9237)
info string f5e4 (829 ) N: 10 (+ 0) (P: 0.83%) (Q: -0.91912) (U: 0.60155) (Q+U: -0.31757) (V: -0.9314)
info string e8g8 (102 ) N: 10 (+ 0) (P: 0.77%) (Q: -0.85824) (U: 0.55375) (Q+U: -0.30449) (V: -0.8424)
info string f5f7 (813 ) N: 10 (+ 0) (P: 0.87%) (Q: -0.93293) (U: 0.63007) (Q+U: -0.30286) (V: -0.9505)
info string e6h6 (548 ) N: 10 (+ 0) (P: 0.74%) (Q: -0.83100) (U: 0.53577) (Q+U: -0.29523) (V: -0.8802)
info string f5h3 (837 ) N: 10 (+ 0) (P: 0.84%) (Q: -0.88443) (U: 0.60417) (Q+U: -0.28025) (V: -0.9197)
info string f5f3 (835 ) N: 10 (+ 0) (P: 0.84%) (Q: -0.88704) (U: 0.60731) (Q+U: -0.27973) (V: -0.9309)
info string f5g4 (831 ) N: 11 (+ 0) (P: 0.93%) (Q: -0.93908) (U: 0.61512) (Q+U: -0.32396) (V: -0.9242)
info string f5g5 (826 ) N: 12 (+ 0) (P: 1.06%) (Q: -0.93925) (U: 0.64834) (Q+U: -0.29091) (V: -0.9350)
info string f5f8 (810 ) N: 18 (+ 0) (P: 0.83%) (Q: -0.64462) (U: 0.34707) (Q+U: -0.29754) (V: -0.3538)
info string e6d6 (545 ) N: 32 (+ 0) (P: 1.16%) (Q: -0.56372) (U: 0.28015) (Q+U: -0.28357) (V: -0.5490)
info string e8d8 (100 ) N: 34 (+ 0) (P: 1.34%) (Q: -0.59195) (U: 0.30425) (Q+U: -0.28770) (V: -0.5328)
info string e8c8 (99 ) N: 34 (+ 0) (P: 1.26%) (Q: -0.57447) (U: 0.28691) (Q+U: -0.28756) (V: -0.4978)
info string b8a8 (23 ) N: 37 (+ 0) (P: 0.97%) (Q: -0.48835) (U: 0.20258) (Q+U: -0.28577) (V: -0.3970)
info string f5f6 (818 ) N: 41 (+ 0) (P: 1.69%) (Q: -0.60927) (U: 0.32103) (Q+U: -0.28824) (V: -0.3617)
info string f5g6 (819 ) N: 44 (+ 0) (P: 1.61%) (Q: -0.56770) (U: 0.28439) (Q+U: -0.28331) (V: -0.2992)
info string b8a7 (30 ) N: 52 (+ 0) (P: 1.49%) (Q: -0.51040) (U: 0.22410) (Q+U: -0.28629) (V: -0.3719)
info string e6b6 (543 ) N: 66 (+ 0) (P: 1.80%) (Q: -0.50015) (U: 0.21384) (Q+U: -0.28631) (V: -0.5593)
info string e6c6 (544 ) N: 70 (+ 0) (P: 1.16%) (Q: -0.41705) (U: 0.13041) (Q+U: -0.28664) (V: -0.5058)
info string e6f6 (546 ) N: 76 (+ 0) (P: 1.11%) (Q: -0.39958) (U: 0.11489) (Q+U: -0.28468) (V: -0.3731)
info string e8f8 (101 ) N: 102 (+ 0) (P: 1.01%) (Q: -0.35908) (U: 0.07777) (Q+U: -0.28131) (V: -0.3605)
info string b7b5 (234 ) N: 111 (+ 0) (P: 1.97%) (Q: -0.42411) (U: 0.13982) (Q+U: -0.28428) (V: -0.3466)
info string d5d4 (761 ) N: 124 (+ 0) (P: 2.04%) (Q: -0.41484) (U: 0.12968) (Q+U: -0.28517) (V: -0.3185)
info string c7c5 (264 ) N: 136 (+ 0) (P: 2.57%) (Q: -0.43478) (U: 0.14947) (Q+U: -0.28531) (V: -0.2763)
info string e6g6 (547 ) N: 192 (+ 0) (P: 6.15%) (Q: -0.53725) (U: 0.25345) (Q+U: -0.28380) (V: -0.3435)
info string c7c6 (259 ) N: 193 (+ 0) (P: 2.87%) (Q: -0.40256) (U: 0.11765) (Q+U: -0.28490) (V: -0.3068)
info string a6a5 (425 ) N: 212 (+ 0) (P: 2.78%) (Q: -0.38876) (U: 0.10374) (Q+U: -0.28503) (V: -0.3065)
info string f5f4 (830 ) N: 214 (+ 0) (P: 1.69%) (Q: -0.34816) (U: 0.06250) (Q+U: -0.28567) (V: -0.3993)
info string b7b6 (230 ) N: 223 (+ 0) (P: 2.96%) (Q: -0.39055) (U: 0.10523) (Q+U: -0.28532) (V: -0.3035)
info string b8c8 (24 ) N: 367 (+ 0) (P: 4.36%) (Q: -0.37933) (U: 0.09426) (Q+U: -0.28507) (V: -0.2628)
info string h7h6 (400 ) N: 519 (+ 0) (P: 6.08%) (Q: -0.37837) (U: 0.09306) (Q+U: -0.28530) (V: -0.3446)
info string h7h5 (403 ) N: 557 (+ 0) (P: 10.51%) (Q: -0.43458) (U: 0.14987) (Q+U: -0.28472) (V: -0.2830)
info string f5h5 (827 ) N: 772 (+ 0) (P: 4.90%) (Q: -0.33454) (U: 0.05041) (Q+U: -0.28412) (V: -0.2358)
info string e8e7 (106 ) N: 1129 (+ 0) (P: 5.14%) (Q: -0.32378) (U: 0.03618) (Q+U: -0.28760) (V: -0.2906)
info string e6e7 (539 ) N: 1698 (+ 0) (P: 9.42%) (Q: -0.33019) (U: 0.04412) (Q+U: -0.28608) (V: -0.3310)
info string e5e4 (796 ) N: 47597 (+238) (P: 12.19%) (Q: -0.28812) (U: 0.00203) (Q+U: -0.28609) (V: -0.2730)
bestmove e5e4
It still has e6g6 buried pretty deep, and thus it's never avoiding this trap.
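To see how deep e6g6 is buried, the `--verbose-move-stats` lines can be ranked by visit count. This is a minimal Python sketch that assumes the line layout shown in the log above; the helper names (`parse_stats`, `rank_of`) are illustrative, and only five of the log lines are embedded here.

```python
import re

# Pattern for one verbose-move-stats line, as printed in the log above.
STAT = re.compile(
    r"info string (\S+)\s+\(\s*\d+\s*\)\s+N:\s*(\d+)\s+\(\+\s*(\d+)\)\s+"
    r"\(P:\s*([\d.]+)%\)\s+\(Q:\s*(-?[\d.]+)\)\s+\(U:\s*(-?[\d.]+)\)"
)

# Five lines copied from the ID485 output (the full list has more moves).
SAMPLE = """\
info string e6g6 (547 ) N: 192 (+ 0) (P: 6.15%) (Q: -0.53725) (U: 0.25345) (Q+U: -0.28380) (V: -0.3435)
info string c7c6 (259 ) N: 193 (+ 0) (P: 2.87%) (Q: -0.40256) (U: 0.11765) (Q+U: -0.28490) (V: -0.3068)
info string h7h5 (403 ) N: 557 (+ 0) (P: 10.51%) (Q: -0.43458) (U: 0.14987) (Q+U: -0.28472) (V: -0.2830)
info string e6e7 (539 ) N: 1698 (+ 0) (P: 9.42%) (Q: -0.33019) (U: 0.04412) (Q+U: -0.28608) (V: -0.3310)
info string e5e4 (796 ) N: 47597 (+238) (P: 12.19%) (Q: -0.28812) (U: 0.00203) (Q+U: -0.28609) (V: -0.2730)
"""

def parse_stats(text):
    """Turn verbose-move-stats lines into a list of dicts; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = STAT.match(line.strip())
        if m:
            move, n, _in_flight, p, q, u = m.groups()
            rows.append({"move": move, "N": int(n), "P": float(p) / 100,
                         "Q": float(q), "U": float(u)})
    return rows

def rank_of(rows, move):
    """1-based rank of `move` when the rows are sorted by visit count, highest first."""
    ordered = sorted(rows, key=lambda r: r["N"], reverse=True)
    return [r["move"] for r in ordered].index(move) + 1

rows = parse_stats(SAMPLE)
print(rank_of(rows, "e6g6"), "of", len(rows))
```

Run over the full ID485 list above, the same ranking puts e6g6 at the 12th-highest visit count, with 192 visits against 47597 for e5e4.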
ID 10067: Slightly better. At ~96000 nodes, it sees that e6g6 is winning:
info depth 4 seldepth 22 time 25573 nodes 54758 score cp -138 hashfull 248 nps 2141 pv e5e4 g7d7 h7h5 h2h4 b7b6 d3e4 d5e4 d7d4 f5g4 g2g3 b8b7 e2e3 g4f5 e1e2
info string e8h8 (103 ) N: 8 (+ 0) (P: 0.63%) (Q: -0.90013) (U: 0.55864) (Q+U: -0.34149) (V: -0.8641)
info string f5d3 (833 ) N: 8 (+ 0) (P: 0.71%) (Q: -0.91401) (U: 0.62326) (Q+U: -0.29075) (V: -0.9066)
info string f5f2 (839 ) N: 8 (+ 0) (P: 0.72%) (Q: -0.92058) (U: 0.63945) (Q+U: -0.28113) (V: -0.9237)
info string f5e4 (829 ) N: 10 (+ 0) (P: 0.83%) (Q: -0.91912) (U: 0.60155) (Q+U: -0.31757) (V: -0.9314)
info string e8g8 (102 ) N: 10 (+ 0) (P: 0.77%) (Q: -0.85824) (U: 0.55375) (Q+U: -0.30449) (V: -0.8424)
info string f5f7 (813 ) N: 10 (+ 0) (P: 0.87%) (Q: -0.93293) (U: 0.63007) (Q+U: -0.30286) (V: -0.9505)
info string e6h6 (548 ) N: 10 (+ 0) (P: 0.74%) (Q: -0.83100) (U: 0.53577) (Q+U: -0.29523) (V: -0.8802)
info string f5h3 (837 ) N: 10 (+ 0) (P: 0.84%) (Q: -0.88443) (U: 0.60417) (Q+U: -0.28025) (V: -0.9197)
info string f5f3 (835 ) N: 10 (+ 0) (P: 0.84%) (Q: -0.88704) (U: 0.60731) (Q+U: -0.27973) (V: -0.9309)
info string f5g4 (831 ) N: 11 (+ 0) (P: 0.93%) (Q: -0.93908) (U: 0.61512) (Q+U: -0.32396) (V: -0.9242)
info string f5g5 (826 ) N: 12 (+ 0) (P: 1.06%) (Q: -0.93925) (U: 0.64834) (Q+U: -0.29091) (V: -0.9350)
info string f5f8 (810 ) N: 18 (+ 0) (P: 0.83%) (Q: -0.64462) (U: 0.34707) (Q+U: -0.29754) (V: -0.3538)
info string e6d6 (545 ) N: 32 (+ 0) (P: 1.16%) (Q: -0.56372) (U: 0.28015) (Q+U: -0.28357) (V: -0.5490)
info string e8d8 (100 ) N: 34 (+ 0) (P: 1.34%) (Q: -0.59195) (U: 0.30425) (Q+U: -0.28770) (V: -0.5328)
info string e8c8 (99 ) N: 34 (+ 0) (P: 1.26%) (Q: -0.57447) (U: 0.28691) (Q+U: -0.28756) (V: -0.4978)
info string b8a8 (23 ) N: 37 (+ 0) (P: 0.97%) (Q: -0.48835) (U: 0.20258) (Q+U: -0.28577) (V: -0.3970)
info string f5f6 (818 ) N: 41 (+ 0) (P: 1.69%) (Q: -0.60927) (U: 0.32103) (Q+U: -0.28824) (V: -0.3617)
info string f5g6 (819 ) N: 44 (+ 0) (P: 1.61%) (Q: -0.56770) (U: 0.28439) (Q+U: -0.28331) (V: -0.2992)
info string b8a7 (30 ) N: 52 (+ 0) (P: 1.49%) (Q: -0.51040) (U: 0.22410) (Q+U: -0.28629) (V: -0.3719)
info string e6b6 (543 ) N: 66 (+ 0) (P: 1.80%) (Q: -0.50015) (U: 0.21384) (Q+U: -0.28631) (V: -0.5593)
info string e6c6 (544 ) N: 70 (+ 0) (P: 1.16%) (Q: -0.41705) (U: 0.13041) (Q+U: -0.28664) (V: -0.5058)
info string e6f6 (546 ) N: 76 (+ 0) (P: 1.11%) (Q: -0.39958) (U: 0.11489) (Q+U: -0.28468) (V: -0.3731)
info string e8f8 (101 ) N: 102 (+ 0) (P: 1.01%) (Q: -0.35908) (U: 0.07777) (Q+U: -0.28131) (V: -0.3605)
info string b7b5 (234 ) N: 111 (+ 0) (P: 1.97%) (Q: -0.42411) (U: 0.13982) (Q+U: -0.28428) (V: -0.3466)
info string d5d4 (761 ) N: 124 (+ 0) (P: 2.04%) (Q: -0.41484) (U: 0.12968) (Q+U: -0.28517) (V: -0.3185)
info string c7c5 (264 ) N: 136 (+ 0) (P: 2.57%) (Q: -0.43478) (U: 0.14947) (Q+U: -0.28531) (V: -0.2763)
info string e6g6 (547 ) N: 192 (+ 0) (P: 6.15%) (Q: -0.53725) (U: 0.25345) (Q+U: -0.28380) (V: -0.3435)
info string c7c6 (259 ) N: 193 (+ 0) (P: 2.87%) (Q: -0.40256) (U: 0.11765) (Q+U: -0.28490) (V: -0.3068)
info string a6a5 (425 ) N: 212 (+ 0) (P: 2.78%) (Q: -0.38876) (U: 0.10374) (Q+U: -0.28503) (V: -0.3065)
info string f5f4 (830 ) N: 214 (+ 0) (P: 1.69%) (Q: -0.34816) (U: 0.06250) (Q+U: -0.28567) (V: -0.3993)
info string b7b6 (230 ) N: 223 (+ 0) (P: 2.96%) (Q: -0.39055) (U: 0.10523) (Q+U: -0.28532) (V: -0.3035)
info string b8c8 (24 ) N: 367 (+ 0) (P: 4.36%) (Q: -0.37933) (U: 0.09426) (Q+U: -0.28507) (V: -0.2628)
info string h7h6 (400 ) N: 519 (+ 0) (P: 6.08%) (Q: -0.37837) (U: 0.09306) (Q+U: -0.28530) (V: -0.3446)
info string h7h5 (403 ) N: 557 (+ 0) (P: 10.51%) (Q: -0.43458) (U: 0.14987) (Q+U: -0.28472) (V: -0.2830)
info string f5h5 (827 ) N: 772 (+ 0) (P: 4.90%) (Q: -0.33454) (U: 0.05041) (Q+U: -0.28412) (V: -0.2358)
info string e8e7 (106 ) N: 1129 (+ 0) (P: 5.14%) (Q: -0.32378) (U: 0.03618) (Q+U: -0.28760) (V: -0.2906)
info string e6e7 (539 ) N: 1698 (+ 0) (P: 9.42%) (Q: -0.33019) (U: 0.04412) (Q+U: -0.28608) (V: -0.3310)
info string e5e4 (796 ) N: 47597 (+238) (P: 12.19%) (Q: -0.28812) (U: 0.00203) (Q+U: -0.28609) (V: -0.2730)
bestmove e5e4
It still takes 192K nodes from the blunder position to avoid g3g7:
info depth 4 seldepth 24 time 133477 nodes 182710 score cp 33 hashfull 726 nps 1368 pv g3g7 e5e4 d3e4 d5e4 e2e3 e6g6 g7d4 g6d6 d4b4 d6e6 b4c4 h7h5 c4e2 h5h4 h2h3 f5f4
info depth 4 seldepth 24 time 138569 nodes 191202 score cp 33 hashfull 752 nps 1379 pv g3g7 e5e4 d3e4 d5e4 e2e3 e6g6 g7d4 g6d6 d4b4 d6e6 b4c4 h7h5 c4e2 h5h4 h2h3 f5f4
info depth 4 seldepth 24 time 139703 nodes 192860 score cp 34 hashfull 758 nps 1380 pv h2h3 g7g5 f2f3 e6e7 e2e3 f5f6 c2c3 h7h5 d3d4 e5e4 f3e4 d5e4 e1f1 f6g6
I find it interesting that the test nets have similar tactical weaknesses to the original net - which suggests they are just "hard" for the NN to learn? Perhaps starting at a larger net will help with that, perhaps certain positions just require a lot of nodes to overcome tactical weaknesses. I don't think there's a "bug" here to fix, but it's an instructive position.
I'm hoping that upping the training cpuct from 1.2 will help prevent this. These searches used the default lc0 search parameters, right?
I read that it is reproducible on all networks, including main nets which are trained with "high" cpuct, so there is no evidence that a different cpuct would help. I don't object to changing cpuct for training runs, but don't expect it to change things much.
> including main nets which are trained with "high" cpuct

There are no such nets that are as strong as recent main or test nets.
This was all done with default search parameters. And yes, it's been a problem for hundreds of networks, including several of the test nets (I haven't tested them all).

There are a few simpler positions from the original thread that showcase individual tactical weaknesses. They have shown improvement since about ID 450. Instead of the right move, or the refutation move, having a policy of 0.2% or worse, they're up over 1%, and thus UCT can find them in 1-2k nodes, which is reasonable. But in a complex situation like this one, the search has to overcome a bad policy down several lines, and the difference in eval is large, so it takes a lot of nodes to move the eval enough even after hitting on the right moves.

I'm happy to rerun some results using a non-default cpuct if it's thought to be useful information, but I think overall this position exposes tactics that Leela's current structure is weak at and that UCT is slow to find. Maybe different training parameters would result in a net that handles these tactics better, maybe a larger network could encapsulate the required tactics better, or maybe these are just tactics that don't get learned until later in the learning process.
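The way a low policy starves a move of visits can be checked against the ID485 log above. This is a rough Python sketch assuming the search uses the AlphaZero-style exploration term U = c * P * sqrt(N_parent) / (1 + N_child). That formula and the helper name `implied_cpuct` are assumptions, not something stated in this thread; the numbers are copied from the log.

```python
import math

def implied_cpuct(u, prior, child_visits, parent_visits):
    """Solve U = c * P * sqrt(N_parent) / (1 + N_child) for c (assumed formula)."""
    return u * (1 + child_visits) / (prior * math.sqrt(parent_visits))

PARENT_VISITS = 54758  # "nodes 54758" from the final info line of the ID485 run

# move: (U, P, N) copied from the verbose move stats above
ROWS = {
    "e6g6": (0.25345, 0.0615, 192),
    "h7h5": (0.14987, 0.1051, 557),
    "e6e7": (0.04412, 0.0942, 1698),
}

IMPLIED = {move: implied_cpuct(u, p, n, PARENT_VISITS)
           for move, (u, p, n) in ROWS.items()}

for move, c in IMPLIED.items():
    print(f"{move}: implied c = {c:.3f}")
```

Under that assumption all three rows imply roughly the same constant (about 3.4), so the printed U values behave as the formula predicts: a move's exploration bonus scales linearly with its policy and shrinks with every visit it receives, which is why a refutation at a low P needs many parent visits before Q+U lets it catch up.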
Got a blunder in a test game today. Time control was 30 moves in 60 minutes: lc0_0712_ispc.exe vs Stockfish 8 (on a rather slow and old PC). Main network 483. The search used `--cpuct=1.2 --fpu-reduction=.2 --policy-softmax-temp=1` to match the lczero parameters, and `--max-prefetch=0 --minibatch-size=4` for the blas backend.
https://lichess.org/MoD5vHSJ
Test10 ID 10104: in the following game it played 48...h5??, missing the tactic Rf8+ Kh7 Qc2+ Be4 Qxe4 Rxe4 (removing the pin, one of the two classic themes of Leela's tactical blunders) gxh3.
[Event "Tour12 10104"]
[Site "Terminator PC"]
[Date "2018.07.21"]
[Round "20"]
[White "Wasp 3.0"]
[Black "Lc0 Test10 10104"]
[Result "1-0"]
[ECO "C05"]
[WhiteElo "2200"]
[BlackElo "2200"]
[WhiteType "program"]
[BlackType "program"]
[Opening "French"]
[Variation "Tarrasch, Closed, Nunn-Korchnoi Gambit, 4.e5 Nfd7 5.Bd3 c5 6.c3 Nc6 7.Ngf3 Qb6 8.O-O"]
[Time "11:03:20"]
[TimeControl "40/120:40/120:40/120"]
[Termination "normal"]
[PlyCount "108"]
1. e4 e6 2. d4 d5 3. Nd2 Nf6 4. e5 Nfd7 5. c3 c5 6. Bd3 Nc6 7. Ngf3 {+0.07/16 3}
7... Qb6 {+0.31/2 5} 8. O-O {-0.10/18 3} 8... cxd4 {+0.22/2 3} 9. cxd4 {0.00/20
3} 9... Nxd4 {+0.16/2 4} 10. Nxd4 {0.00/20 3} 10... Qxd4 {+0.15/2 1} 11. Nf3
{0.00/19 3} 11... Qb6 {+0.13/2 1} 12. Qa4 {0.00/18 3} 12... Qb4 {-0.02/2 5} 13.
Qc2 {-0.03/18 3} 13... Qc5 {-0.06/2 7} 14. Qb1 {+0.02/18 4} 14... Qc7 {+0.10/2
4} 15. Bf4 {+0.12/17 3} 15... h6 {+0.15/2 2} 16. Rc1 {-0.14/16 3} 16... Qd8
{+0.16/2 2} 17. Be3 {-0.02/15 3} 17... Be7 {+0.29/2 3} 18. Qc2 {-0.22/15 3}
18... O-O {+0.30/2 2} 19. Qa4 {0.00/17 3} 19... f6 {+0.47/2 3} 20. exf6
{-0.54/16 3} 20... Nxf6 {+0.40/2 3} 21. Ne5 {0.00/16 3} 21... Bd6 {+0.39/2 2}
22. f4 {0.00/16 3} 22... Bd7 {+0.93/2 3} 23. Qd1 {-0.33/17 3} 23... Be8 {+1.12/2
3} 24. Qe1 {-0.44/17 3} 24... Bxe5 {+1.44/2 3} 25. fxe5 {-0.43/17 1} 25... Ne4
{+1.52/2 2} 26. Qb4 {-0.34/18 3} 26... Bc6 {+1.93/2 3} 27. Rf1 {-0.42/18 3}
27... Qh4 {+1.89/2 5} 28. Rxf8+ {-0.24/19 4} 28... Rxf8 {+1.63/2 3} 29. Bxa7
{-0.25/18 3} 29... Ra8 {+1.45/2 6} 30. Be3 {0.00/17 3} 30... Ra4 {+1.23/2 3} 31.
Qb3 {+0.14/18 3} 31... Qh5 {+1.34/2 3} 32. Rc1 {0.00/18 3} 32... Ra5 {+1.16/2 1}
33. Bxe4 {0.00/18 3} 33... Rb5 {+1.01/2 2} 34. Bh7+ {+0.11/19 3} 34... Kxh7
{+0.89/2 1} 35. Qc3 {+0.14/19 3} 35... d4 {+0.83/2 3} 36. Bxd4 {+0.52/18 3}
36... Kg8 {+0.72/2 5} 37. Qc2 {+0.58/17 3} 37... Ra5 {+1.22/2 3} 38. a3
{+0.59/19 3} 38... Ra4 {+1.00/2 3} 39. Rd1 {+0.69/18 3} 39... Rc4 {+0.84/2 2}
40. Bc3 {+0.37/17 3} 40... Rg4 {+0.79/2 0} 41. Rd8+ {+0.53/17 2} 41... Kf7
{+0.73/2 2} 42. Rd2 {+0.53/17 2} 42... Kg8 {+0.76/2 7} 43. Rf2 {+0.60/17 2}
43... Be4 {+0.63/2 7} 44. Qe2 {+0.72/18 2} 44... Bc6 {+0.58/2 7} 45. Qd3
{+0.76/18 2} 45... Be4 {+0.61/2 4} 46. Qd2 {+0.68/18 3} 46... Qh3 {+0.68/2 4}
47. Qe2 {+0.84/17 2} 47... Bd5 {+0.67/2 5} 48. Bb4 {+0.81/19 2} 48... h5
{+0.76/2 5} 49. Rf8+ {+4.27/22 2} 49... Kh7 {+1.11/2 0} 50. Qc2+ {+4.40/23 2}
50... Be4 {-12.35/2 5} 51. Qxe4+ {+4.40/21 1} 51... Rxe4 {-16.21/2 3} 52. gxh3
{+4.51/21 1} 52... Rxe5 {-18.24/3 4} 53. Bc3 {+4.60/21 2} 53... Rg5+ {-19.99/2
5} 54. Kh1 {+4.69/22 2} 54... b5 {-18.33/2 5 Black resigns} 1-0
Hardware for Leela was a GTX 1070 Ti; Lc0 (1 July build) was used with test10 ID 10104 at a time control of 40/2 repeating.
Leela was just unlucky, as the following analysis of the position shows (this is analysis of the FEN, not the PGN, but with PGN analysis it also avoids h5 in exactly 8 seconds).
In the game it moved after 5 seconds, while on this hardware it needs 8 seconds to avoid h5. Yet a tactic this EXTREMELY simple for 2018 engines should be solvable in less than 0.1 seconds.
Lc0 Test10 10104:
1/10 00:00 2,156 2,949 -1,01 h6-h5 Bb4-c3 Rg4-g6 Qe2-c2 Rg6-g4 Qc2-e2 Rg4-g6
1/11 00:01 5,138 3,450 -0,90 h6-h5 Bb4-d2 Bd5-c6 Bd2-c3 Bc6-d5 Qe2-d2 b7-b5 Qd2-e2 Rg4-g6
1/12 00:01 6,288 3,554 -0,84 h6-h5 Bb4-d2 b7-b5 Bd2-b4 Rg4-g6 Qe2-c2 Kg8-h7 Rf2-e2 Qh3-g4
1/13 00:03 11,873 3,731 -0,83 h6-h5 Bb4-d2 Bd5-c6 Bd2-c3 Bc6-d5 Qe2-d2 b7-b5 Qd2-e2 Rg4-g6 Qe2xb5
1/17 00:05 22,468 3,820 -0,65 h6-h5 Bb4-d2 Bd5-c6 Bd2-c3 Bc6-d5 Qe2-d2 b7-b5 Qd2-e2 Rg4-g6 Qe2xb5
2/17 00:08 35,046 3,981 -0,65 h6-h5 Bb4-d2 Bd5-c6 Bd2-c3 Bc6-d5 Qe2-d2 b7-b5 Qd2-e2 Rg4-g6 Qe2xb5
2/17 00:09 38,145 4,008 -0,72 Kg8-h8 Bb4-d2 Kh8-g8 Bd2-b4 Rg4-g6 Qe2-c2 Rg6-g4 Rf2-e2 Bd5-e4
2/18 00:10 43,686 4,125 -0,69 Kg8-h8 Bb4-c3 Kh8-h7 Qe2-c2+ Kh7-g8 Qc2-e2 Rg4-g5 a3-a4 b7-b6 Bc3-d2
2/19 00:13 55,553 4,131 -0,49 Kg8-h8 Bb4-c3 Kh8-h7 Qe2-c2+ Kh7-g8 Qc2-e2 Rg4-g5 a3-a4 b7-b6 Bc3-d2
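The point at which the search abandons h5 can be read off such a log mechanically. This is a small Python sketch assuming the GUI columns are depth, time, nodes, nps, score and then the PV; the column meaning and the `first_switch` helper are assumptions, and the PVs below are truncated copies of the lines above.

```python
# Four lines copied from the analysis above, PVs shortened to three moves.
LOG = """\
1/17 00:05 22,468 3,820 -0,65 h6-h5 Bb4-d2 Bd5-c6
2/17 00:08 35,046 3,981 -0,65 h6-h5 Bb4-d2 Bd5-c6
2/17 00:09 38,145 4,008 -0,72 Kg8-h8 Bb4-d2 Kh8-g8
2/18 00:10 43,686 4,125 -0,69 Kg8-h8 Bb4-c3 Kh8-h7
"""

def first_switch(log, bad_move):
    """Return (time, nodes) of the first line whose PV no longer starts with bad_move."""
    for line in log.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # not an analysis line
        time, nodes, first_move = fields[1], int(fields[2].replace(",", "")), fields[5]
        if first_move != bad_move:
            return time, nodes
    return None  # the bad move was never abandoned

print(first_switch(LOG, "h6-h5"))
```

On these lines the switch away from h6-h5 shows up at the 00:09 entry with 38,145 nodes, a few seconds after the 5 seconds the engine spent in the game.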
Net-520-20180727
https://lichess.org/ol6H1KUy#52
Leela didn't find Bf2 (Stockfish at depth 33).
go nodes 10 -> d1d2 (Stockfish ~ -5)
go nodes 10000 -> g1h2 (Stockfish ~ -1)
g1f2 is already in the list of possible moves and is calculated with a Q value similar to that of g1h2.
Blunder? Assumption: Once more learning is done lc0 will learn this better.
ID TCEC-13.23.2.1 Network ID 10161 Lc0 16 Nodes: 539702 Move time: 41s
During TCEC. Leela versus Senpai.
It's not a blunder but a spike of evaluation from 0.81 to 4.71 then back to 0.33.
The Eval spiked at move 27. Qf5.
https://pasteboard.co/HydTAQf.png
You can try to reproduce with:
./lc0 --nncache=2000000 --verbose-move-stats
position fen 4rrk1/pp3pp1/3R1n1p/2q1nB2/5B2/2P2P2/P1Q2PKP/3R4 b - - 10 22 moves f6h5 f4g3 h5f6 d1d4 b7b5 f5d3 a7a5 d3e2 b5b4
go nodes 540000
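For repeated runs, the same two UCI commands can be generated and fed to the engine from a script. This is a hedged Python sketch: `uci_script` and `run_engine` are hypothetical helper names, and the engine command line is simply the one quoted above. `run_engine` is defined but not called here, since it needs a local lc0 binary and network.

```python
import subprocess

def uci_script(fen, moves, nodes):
    """Build the 'position' and 'go' lines for a fixed-node search."""
    position = f"position fen {fen}"
    if moves:
        position += " moves " + " ".join(moves)
    return [position, f"go nodes {nodes}"]

def run_engine(engine_cmd, script):
    """Send the script to a UCI engine and collect its output up to 'bestmove'."""
    proc = subprocess.Popen(engine_cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    proc.stdin.write("\n".join(script) + "\n")
    proc.stdin.flush()
    output = []
    for line in proc.stdout:
        output.append(line.rstrip())
        if line.startswith("bestmove"):
            break
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return output

FEN = "4rrk1/pp3pp1/3R1n1p/2q1nB2/5B2/2P2P2/P1Q2PKP/3R4 b - - 10 22"
MOVES = "f6h5 f4g3 h5f6 d1d4 b7b5 f5d3 a7a5 d3e2 b5b4".split()
SCRIPT = uci_script(FEN, MOVES, 540000)
print("\n".join(SCRIPT))
# Example call (requires lc0 locally):
# run_engine(["./lc0", "--nncache=2000000", "--verbose-move-stats"], SCRIPT)
```

Keeping the command builder separate from the process handling makes it easy to rerun the same position at several node counts and compare where the evaluation spike appears.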
ID TCEC-13.23.2.2 Network ID 10161 Lc0 16 Rxc3 {d=6, sd=28, mt=24878, tl=144133, s=59684, n=1484841, pv=Rxc3 Kxh6 Kg3 Kg5 Rc5 Kf6 Kf4 Kg6 Rc8 Kg7 Rc6 Kf7 Rh6 Kg7 Rh5 Kg6 Rh4 Kf6 Rh8 Kg6 Rf8 h2 Rh8 Kf6 Rxh2 Ke6, tb=0, h=100.0, ph=0.0, wv=6.34, R50=50, Rd=-11, Rr=-9, mb=-3+0+1+1+0,}
Bad Move: 79. Rxc3 Correct move: anything that does not exchange something. Moving the bishop somewhere out of reach of both K and N.
[Event "TCEC Season 13 - Division 4"] [Site "http://tcec.chessdom.com"] [Date "2018.08.08"] [Round "23.2"] [White "LCZero 16.10161"] [Black "Senpai 2.0"] [Result "1/2-1/2"] [BlackElo "3062"] [ECO "E10"] [GameDuration "01:22:41"] [GameEndTime "2018-08-08T14:23:19.344 W. Europe Standard Time"] [GameStartTime "2018-08-08T13:00:37.643 W. Europe Standard Time"] [Opening "Queen's pawn game"] [PlyCount "158"] [Termination "adjudication"] [TerminationDetails "SyzygyTB"] [TimeControl "1800+10"] [WhiteElo "3219"]
(Tablebase draw)
ID: TCEC - Season 13 - Division 3 - Game 19.1
Black: LC0 0.16.1 (TCEC version) NET 10520
Move 29 ..Bg4 is wrong
NET 10776 with release 16.0 finds Bc8 after 3.000.000 moves. Why did version 16.1 with Net 10520 not find this solution?
In the final position of the following PGN, Leela as Black has just lost a knight, and the position is dead lost for Black. So she should give a big positive score (test10 nets do that; for example, 11089 gives +5.00). Yet she doesn't seem to care and gives d5 with only a tiny plus score (+0.60) for White, as if it were OK.
Leela = Lc0v17 cuda default settings, 20230 net, with GTX 1070 Ti in infinite analysis mode.
Lc0v17 20230:
5/13 00:01 5,730 3,804 +0,29 d6-d5 Qe4-e5 0-0 Nb1-c3 c7-c6 Qe5-g3 Be7-d6 Qg3-h4 Qd8xh4
6/13 00:02 9,023 4,028 +0,30 d6-d5 Qe4-f4 0-0 Nb1-c3 Nb8-c6 d2-d4 Nc6-b4 Bf1-d3 Nb4xd3+
6/14 00:02 11,712 4,132 +0,32 d6-d5 Qe4-e5 0-0 Nb1-c3 Nb8-c6 Qe5xd5 Qd8-e8 Bf1-c4 Nc6-b4
..............
..............
12/28 01:10 333,978 4,750 +0,58 d6-d5 Qe4-e5 Nb8-c6 Qe5xg7 Be7-f6 Qg7-h6 Qd8-e7+
12/29 01:30 425,028 4,721 +0,61 d6-d5 Qe4-e5 Nb8-c6 Qe5xg7 Be7-f6 Qg7-h6 Qd8-e7+
[Event "?"] [Site "?"] [Date "????.??.??"] [Round "?"] [White "?"] [Black "?"] [Result "*"] [ECO "C42"] [WhiteElo "2400"] [BlackElo "2000"] [PlyCount "11"] [TimeControl "900+5"]
Yet perhaps this is not a real issue: test20 may simply have a very different mapping of evaluation scores, so that this tiny-looking +0.60 corresponds to +5.0 for test10 nets. Before the capture she evaluated the position as +0.13, so after forcing 5...Be7??, which allows the knight to be captured for free, she does feel her position got a lot worse. Perhaps this will improve with time and these tiny scores in a LOST position will get a lot larger. It will be an issue if that doesn't happen.
Furthermore, the 20230 net doesn't want to play 5...Be7??, the move that gives up the knight, even for a moment. Three nets earlier, the 20227 net (fischerandom got this game by playing against it at 2000 nodes per move) wanted to play the abysmal Be7 at 2000 nodes, but after 20000 nodes it stabilizes on the correct 5...Qe7, not giving up the knight! But 20227 was just a net recovering from the "big spike", so I guess that's normal. So I guess it's not a real issue; only the very tiny eval in a dead lost position is.
Some default settings are not good for high node counts. cPUCT in particular should be higher than default. Have you tried using the CCCC settings (but with table) and seeing if that fixes it?
About the game Leela-Fizbo at CCCC:
[Event "CCCC 1: Rapid Rumble (15|5) Stage 1"]
[Site "Chess.com"]
[Date "2018.09.09"]
[Round "?"]
[White "Lc0 17.11089"]
[Black "Fizbo 1.9"]
[Result "1-0"]
[ECO "D31"]
[WhiteElo "2400"]
[BlackElo "2400"]
[PlyCount "254"]
1. d4 d5 2. c4 e6 3. Nc3 c6 4. e4 dxe4 5. Nxe4 Bb4+ 6. Nc3 Nf6 7. a3 Bxc3+ 8.
bxc3 Nbd7 9. Nf3 O-O 10. a4 c5 11. a5 b6 12. Bd3 bxa5 13. O-O Bb7 14. Re1 Rc8
15. Ne5 cxd4 16. cxd4 Nxe5 17. Rxe5 Ba6 18. Ra4 Nd7 19. Bg5 Qb6 20. Re1 Qb7 21.
Be4 Qc7 22. c5 Rfe8 23. Ra3 Nf8 24. Qa4 Bb7 25. Bf4 Qxf4 26. Bxb7 Rcd8 27. Rd1
Re7 28. Bf3 Ng6 29. g3 Qc7 30. Rb3 Rc8 31. h4 Rd7 32. Rb7 Qxb7 33. Bxb7 Rxb7
34. Qxa5 Ne7 35. Qa6 Rcc7 36. Qa2 Nd5 37. Rb1 h5 38. Rxb7 Rxb7 39. Qa6 Rc7 40.
f3 Kh7 41. g4 hxg4 42. fxg4 g6 43. Kf2 Kg7 44. Qb5 Rc8 45. Qb7 Rc7 46. Qb8 Rd7
47. Qb5 Re7 48. Kg3 Rc7 49. Kf3 Kg8 50. Kf2 Kg7 51. Kg3 Rc8 52. Qb7 Rc7 53. Qb5
Rc8 54. Qb7 Rc7 55. Qb8 Rd7 56. Qc8 Nf6 57. g5 Ne4+ 58. Kh3 Rxd4 59. c6 Rd3+
60. Kg2 Nc3 61. c7 Rd2+ 62. Kh1 Rd1+ 63. Kh2 Rd2+ 64. Kg1 Rd1+ 65. Kh2 Rd2+ 66.
Kg3 Rd3+ 67. Kg2 Rd2+ 68. Kf3 Rd3+ 69. Kf2 Rd2+ 70. Ke1 Rd1+ 71. Kf2 Rd2+ 72.
Ke3 Rh2 73. Qf8+ Kxf8 74. c8=Q+ Kg7 75. Qxc3+ Kg8 76. Qd4 a5 77. Kf3 a4 78.
Qxa4 Rh1 79. Kg2 Rb1 80. Qd7 Rb2+ 81. Kf3 Rb3+ 82. Kf2 Rb2+ 83. Ke3 Rb3+ 84.
Ke2 Rb2+ 85. Kf1 Rb1+ 86. Ke2 Rh1 87. Qd4 e5 88. Qa4 Rh3 89. Kf2 Kg7 90. Qe4
Kg8 91. Kg2 Ra3 92. Qc6 Ra2+ 93. Kg3 e4 94. Qe8+ Kg7 95. Qe5+ Kg8 96. Qxe4 Ra3+
97. Kg4 Ra6 98. Qe8+ Kg7 99. Kf3 Re6 100. Qc8 Re7 101. Qc3+ Kg8 102. Qc6 Re6
103. Qa8+ Kg7 104. Kf2 Rd6 105. Qa1+ Kg8 106. Ke3 Re6+ 107. Kf4 Kh7 108. Qd4
Kg8 109. Kg4 Kh7 110. Qd8 Kg7 111. Qa8 Rb6 112. Qa1+ Kg8 113. Qa8+ Kg7 114.
Qa1+ Kg8 115. Qa5 Rd6 116. Qa8+ Kg7 117. Qa1+ Kg8 118. Qa8+ Kg7 119. Qa1+ Kg8
120. Kf4 Re6 121.Qb2 Kh7 122.Qb8 Kg7 123.Qc8 Rd6 124.Qc4+ K8 125.Qc8+ Kg7
126.Qc3+ Kg8 127.Qc7 Re6 128.Qc8+ Kg7 129.Qb8 Ra6 130.Qe5+ Kg8 131.Qb8+ Kg7
132.Qe5+ Kg8 133. Qc7 Re6 134. Qc8+ Kg7 135. Qa8 Re1 136. Qf3 Re6 137. Qb7
Kg8 138. Qc7 Kg7 139. Qd8 Ra6 140. Qd4+ Kg8
Leela = Lc0v17 cuda default settings, 11089 net, with GTX 1070 Ti, WITH 3,4,5,6 syzygy TBs, in infinite analysis mode.
Leela with TBs would play 96.h5 and would win easily. However, while Leela with TBs avoids the drawing 120.h5?? that Leela played at CCCC and prefers 120.Kf4, after following Leela's recommendations for both players (120...Re6 121.Qb2 Kh7 122.Qb8 Kg7 123.Qc8 Rd6 124.Qc4+ K8 125.Qc8+ Kg7 126.Qc3+ Kg8 127.Qc7 Re6 128.Qc8+ Kg7 129.Qb8 Ra6 130.Qe5+ Kg8 131.Qb8+ Kg7 132.Qe5+ Kg8) Leela wants to play 133.h5?? again, which just draws. This is with 3,4,5,6 TBs!!
It keeps 133.h5 up to 1.300.000 nodes with around 60.000 TB hits and then goes to 133.Qb5 and then to 133.Qc7 with around 230.000 TB hits.
Lc0v17 11089:
8/17 00:05 54,696 9,500 +7,49 h4-h5 g6xh5 Qe5-b8+ Kg8-g7 Qb8-b2+ Kg7-g8 Qb2-b8+ Kg8-g7 Qb8-e5+ Kg7-g8
8/17 00:10 109,596 10,175 +7,32 h4-h5 g6xh5 Qe5-b8+ Kg8-g7 Qb8-e5+ Kg7-g8 Qe5-b8+ Kg8-g7 Qb8-b2+
8/18 00:12 124,278 10,273 +7,38 h4-h5 g6xh5 Qe5-b8+ Kg8-g7 Qb8-b2+ Kg7-g8 Qb2-b5 Ra6-e6 Qb5-b8+
.............................................................
10/23 01:55 1,312,493 11,375 +6,86 h4-h5 g6xh5 Qe5-b8+ Kg8-g7 Qb8-e5+ Kg7-g8 Qe5-b8+ Kg8-g7 Qb8-b2+
10/23 01:59 1,349,621 11,322 +7,12 Qe5-c7 Ra6-e6 Qc7-c8+ Kg8-g7 Kf4-f3 Re6-d6 Qc8-c3+ Kg7-g8 h4-h5
10/23 02:04 1,406,903 11,327 +7,07 Qe5-c7 Ra6-e6 Qc7-c8+ Kg8-g7 Kf4-f3 Re6-d6 Qc8-c3+ Kg7-g8 h4-h5
10/24 02:05 1,416,822 11,323 +7,07 Qe5-c7 Ra6-e6 Qc7-c8+ Kg8-g7 Kf4-f3 Re6-d6 Qc8-c3+ Kg7-g8 h4-h5
.............................................................
12/28 04:14 2,842,807 11,149 +6,93 Qe5-c7 Ra6-e6 Qc7-c8+ Kg8-g7 Qc8-a8 Re6-e1 h4-h5 g6xh5 Qa8-a6
12/28 06:43 4,385,894 10,870 +6,93 Qe5-c7 Ra6-e6 Qc7-c8+ Kg8-g7 Qc8-a8 Re6-e1 h4-h5 g6xh5 Qa8-a6
After 133.Qc7, again following Leela's recommendations for both players (133...Re6 134.Qc8+ Kg7 135.Qa8 Re1), it wants to play 136.h5 again, even after 3.500.000 nodes and 190.000 TB hits, but avoids it after 3.600.000 nodes and wants to play 136.Qf3. Then, after 136...Re6 137.Qb7 Kg8 138.Qc7 Kg7 139.Qd8 Ra6 140.Qd4+ Kg8, only two moves still win for White, 141.Ke4 or 141.Ke5, since the 146th move and with it the 50-move-rule draw is approaching.
Yet after 4.200.000 nodes and 250.000 TB hits Leela finds neither and wants to play Qd8+, which just draws! Is there any chance the TB implementation is broken? Maybe not, since Stockfish also has trouble recognizing this draw quickly and initially thought 141.Qd8+ wins as well.
Lc0v17 11089:
12/27 05:18 4,121,467 12,931 +6,64 Qd4-d8+ Kg8-g7 Qd8-e8 Ra6-e6 Qe8-a8 Re6-e1 h4-h5 g6xh5 Qa8-a5 Re1-e6 Qa5-c3+ Kg7-g8 Qc3-c7 Re6-g6 Kf4-f5 Kg8-g7 Qc7-e5+ Kg7-h7 Qe5-e8 Kh7-g7 Qe8-e2
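The "146th move" deadline mentioned above follows from simple ply arithmetic. This is a minimal Python sketch, assuming 96.Qxe4 is the last capture or pawn move in the line being analysed; `clock_expiry` is an illustrative helper, not part of any engine.

```python
def clock_expiry(move_number, white_moved, plies=100):
    """Return (move_number, side) of the ply on which the halfmove clock reaches `plies`.

    `move_number` and `white_moved` identify the last capture or pawn move,
    after which the clock restarts from zero.
    """
    zero_ply = 2 * (move_number - 1) + (1 if white_moved else 2)
    expiry_ply = zero_ply + plies
    side = "white" if expiry_ply % 2 == 1 else "black"
    return (expiry_ply + 1) // 2, side

print(clock_expiry(96, True))
```

With the last capture on White's 96th move, the clock reaches 100 plies on White's 146th move, so White has to capture or push a pawn by then at the latest to avoid the 50-move draw.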
No blunders anymore.
(part 1 was here)
This is to gather fresh examples of blunders (`./lc0`, on nets trained in July 2018 or later).

Important!
When reporting positions to analyze, please use the following form. It makes it easier to see what's problematic with the position:
`lc0`/`lczero` version, operating system, and non-default parameters (number of threads, batch size, fpu reduction, etc).

(old text below)
There are many reports on forums asking about blunders, and the answers so far have been something along the lines of "it's fine, it will learn eventually, we don't know exactly why it happens".
I think at this point it makes sense to actually look into them to confirm that there are no blind spots in training. For that we need to:
`--temperature=1.0 --noise`)" to see what the training data would look like for this position.

Eventually all of this would be nice to have as a single command, but we can start manually.
For `lc0`, that can be done this way:
`--verbose-move-stats -t 1 --minibatch-size=1 --no-smart-pruning` (unless you want to debug specifically with other settings).

Then run the UCI interface and issue the command:
(PGN moves can be converted to UCI notation using `pgn-extract -Wuci`)

Then do:
see results, add some more nodes by running:
And look how counters change.
Counters:
Help wanted: