LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Pawn promotion issues in T40 #784

Closed killerducky closed 5 years ago

killerducky commented 5 years ago

T40 tends to put too much Policy on capture+promote (axb1=Q) and not enough Policy on the simple push to promote (a1=Q, ignoring the possibility of capturing). Usually capturing is correct, but in some cases not capturing is correct. Other net series do not have this problem.

Here is an example position and the Policy of the wrong move for several T40 nets. It quickly moves to near 100%, even before the first LR drop.

https://lichess.org/7I0bVIJO#95

e2e4 e7e5 d2d4 e5d4 c2c3 d4c3 b1c3 f8c5 g1f3 d7d6 f1c4 g8f6 e1g1 e8g8 c1g5 b8d7 c3d5 h7h6 g5h4 g7g5 f3g5 f6d5 d1h5 h6g5 h4g5 d7f6 h5h4 d8d7 h2h3 f6h7 c4d5 c5d4 e4e5 d4e5 a1e1 f8e8 f2f4 e5d4 g1h1 c7c6 d5b3 e8e1 f1e1 d7f5 e1e8 h7f8 h4e1 f5g6 e1d2 g6b1 h1h2 b1b2 d2d3 d6d5 g5e7 b2f2 e8f8 g8g7 d3g3 f2g3 h2g3 d4f6 f8e8 f6e7 e8e7 a7a5 b3c2 d5d4 e7e8 b7b5 c2e4 c8b7 e8e7 a8b8 f4f5 b5b4 e7d7 a5a4 g3f4 b4b3 a2b3 a4a3 e4b1 c6c5 g2g4 c5c4 d7d4 c4b3 d4b4 a3a2 f5f6 g7f6 b4b6 f6g7 b6b3
looking for a2b1q (BAD MOVE!)

40100 (P: 43.05%) (Q:  0.77505) 
40200 (P: 21.60%) (Q:  0.82475) 
40220 (P: 38.10%) (Q:  0.57632) 
40240 (P: 24.23%) (Q:  0.59827) 
40260 (P: 46.59%) (Q:  0.70481) 
40280 (P: 65.92%) (Q:  0.68271) 
40300 (P: 77.25%) (Q:  0.63856) 
40320 (P: 94.75%) (Q:  0.45576) 
40340 (P: 97.63%) (Q:  0.45292) 
40360 (P: 89.38%) (Q: -0.29414) 
40380 (P: 93.77%) (Q:  0.39861) 
40400 (P: 99.93%) (Q:  0.28442) 
40500 (P: 100.00%) (Q:  0.75715) 
40600 (P: 100.00%) (Q:  0.43234) 
40700 (P: 99.95%) (Q:  0.27772) 
40800 (P: 100.00%) (Q:  0.40856) 
40900 (P: 100.00%) (Q:  0.37676) 
41000 (P: 100.00%) (Q:  0.15748) 
41100 (P: 100.00%) (Q:  0.24967) 
41200 (P: 99.95%) (Q:  0.18152) 
41300 (P: 99.95%) (Q:  0.33477) 
41400 (P: 99.88%) (Q:  0.35625)
killerducky commented 5 years ago

Below, I parsed some sample PGNs from training for T40, T30, and T50. When T40 has a choice of L+N or N+R, it picks N ~3.0-3.6% of the time. This is much lower than T30 and T50, which pick N ~17-31% of the time.

Is 3.0-3.6% too low? Or is it not too low, but there are not enough examples in training for the net to care? Or does the Policy head architecture make it too hard to learn this?

================================================================================
pgns-run1-20190306-1254.tar.bz2 (41369)
================================================================================
Games     :  209418
Plies     :  24424424
Promotions:  54641

L = promote left (queenside) legal
N = promote normal legal
R = promote right (kingside) legal
G = move made in game

Trivial:
    L     N     R
False False False cases:     0
False False  True cases:   539
False  True False cases: 50269
 True False False cases:   742

    L     N     R G     %
False  True  True L   0.0 (    0/ 1474)
False  True  True N   3.6 (   53/ 1474)
False  True  True R  96.4 ( 1421/ 1474)

 True False  True L  12.5 (    7/   56)
 True False  True N   0.0 (    0/   56)
 True False  True R  87.5 (   49/   56)

 True  True False L  97.0 ( 1249/ 1287)
 True  True False N   3.0 (   38/ 1287)
 True  True False R   0.0 (    0/ 1287)

 True  True  True L  16.1 (   44/  274)
 True  True  True N   0.4 (    1/  274)
 True  True  True R  83.6 (  229/  274)

================================================================================
pgns-run2-20190128-0054.tar.bz2 (32930)
================================================================================
Games     :  125167
Plies     :  15575689
Promotions:  39250

L = promote left (queenside) legal
N = promote normal legal
R = promote right (kingside) legal
G = move made in game

Trivial:
    L     N     R
False False False cases:     0
False False  True cases:   270
False  True False cases: 36506
 True False False cases:   462

    L     N     R G     %
False  True  True L   0.0 (    0/  924)
False  True  True N  17.2 (  159/  924)
False  True  True R  82.8 (  765/  924)

 True False  True L  60.9 (   28/   46)
 True False  True N   0.0 (    0/   46)
 True False  True R  39.1 (   18/   46)

 True  True False L  82.8 (  722/  872)
 True  True False N  17.2 (  150/  872)
 True  True False R   0.0 (    0/  872)

 True  True  True L  63.5 (  108/  170)
 True  True  True N   0.6 (    1/  170)
 True  True  True R  35.9 (   61/  170)

================================================================================
pgns-run2-20190307-1254 (50381)
================================================================================
Games     :  400510
Plies     :  41370749
Promotions:  123508

L = promote left (queenside) legal
N = promote normal legal
R = promote right (kingside) legal
G = move made in game

Trivial:
    L     N     R
False False False cases:     0
False False  True cases:  1442
False  True False cases: 112949
 True False False cases:  1742

    L     N     R G     %
False  True  True L   0.0 (    0/ 2935)
False  True  True N  27.2 (  798/ 2935)
False  True  True R  72.8 ( 2137/ 2935)

 True False  True L  40.6 (   89/  219)
 True False  True N   0.0 (    0/  219)
 True False  True R  59.4 (  130/  219)

 True  True False L  68.6 ( 2253/ 3283)
 True  True False N  31.4 ( 1030/ 3283)
 True  True False R   0.0 (    0/ 3283)

 True  True  True L  35.5 (  333/  938)
 True  True  True N   9.0 (   84/  938)
 True  True  True R  55.5 (  521/  938)
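
A minimal sketch of how the L/N/R labels in the tables above could be derived from UCI move strings. This is not the code from lc0_analyzer; the names `classify_promotion` and `pick_rate` are hypothetical, and "left" is assumed to mean toward the a-file (queenside), as in the legend.

```python
def classify_promotion(uci):
    """Return 'L', 'N' or 'R' for a promotion move in UCI notation, else None."""
    # A UCI promotion has 5 characters, e.g. "a2b1q"; the last is the new piece.
    if len(uci) != 5 or uci[4] not in "qrbn":
        return None
    src_file, dst_file = ord(uci[0]), ord(uci[2])
    if dst_file < src_file:
        return "L"   # capture toward the a-file (assumed meaning of "left")
    if dst_file > src_file:
        return "R"   # capture toward the h-file
    return "N"       # straight push


def pick_rate(picked, total):
    """Percentage of cases in which a given option was played, to one decimal."""
    return round(100.0 * picked / total, 1)


print(classify_promotion("a2b1q"), classify_promotion("a2a1q"))
print(pick_rate(53, 1474), pick_rate(38, 1287))
```

The last line recomputes the two T40 "N" rates from the raw counts in the first table.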
Mardak commented 5 years ago

A different type of position, but probably the same underlying problem, is in #750: promoting to a queen is not the best move (in particular, queen = stalemate, while the other promotions win):

position fen 5n2/4P2k/8/8/8/2K3Q1/8/8 w - - 0 1
looking for bad move e7f8q

40100 (P:  5.11%) 
40200 (P:  0.90%) 
40220 (P:  3.86%) 
40240 (P:  1.28%) 
40260 (P:  1.69%) 
40280 (P: 18.42%) 
40300 (P: 10.96%) 
40320 (P: 57.96%) 
40340 (P: 95.09%) 
40360 (P: 91.24%) 
40380 (P: 99.88%) 
40400 (P: 99.98%) 
40500 (P: 100.00%) 
40600 (P: 99.98%) 
40700 (P: 99.80%) 
40800 (P: 100.00%) 
40900 (P: 99.95%) 
41000 (P: 99.90%) 
41100 (P: 99.95%) 
41200 (P: 99.98%) 
41300 (P: 99.93%) 
41400 (P: 99.78%) 

stalemate

GeorgeMJ23 commented 5 years ago

Another example: Leela (Lc0v0.21-RC2 with net 41356, running on a GTX 1070 Ti) threw away a draw by playing 33.Qa8+?? in the following game. She played this move because she didn't even consider that the obvious queen promotion 35...e1=Q wins in the following position: 3Q1bk1/5p1p/6p1/p4P2/8/7P/P2NpP2/2r2BK1 b - - 0 35

She should have considered the queen promotion instantly, as any other promotion with capture loses the queen. She does find e1=Q after about 216,000 nodes, but she should have found the move after 1000 nodes or so. Remember that she played 33.Qa8+ in the first place because she thought 35...e1=Q by Black does not win.

Lc0v0.21-RC2 41356 analysis from PGN of the game:

 7/15    00:02     12,562 5,645    +63.81    e2xf1Q+ Nd2xf1 Rc1xf1+ Kg1xf1 a5-a4 f5xg6 h7xg6 Kf1-g2 Kg8-g7
 7/16    00:02     17,472 5,865    +64.73    e2xf1Q+ Nd2xf1 Rc1xf1+ Kg1xf1 a5-a4 f5xg6 h7xg6 h3-h4 Kg8-g7 h4-h5
 7/19    00:04     25,283 6,118    +65.45    e2xf1Q+ Nd2xf1 Rc1xf1+ Kg1xf1 a5-a4 f5xg6 h7xg6 h3-h4 Kg8-g7 h4-h5
 8/19    00:06     39,428 6,317    +61.75    e2xf1Q+ Nd2xf1 Rc1xf1+ Kg1xf1 a5-a4 f5xg6 h7xg6 Kf1-e2 Kg8-g7 Ke2-d3
 8/20    00:09     55,827 6,097    +60.75    e2xf1Q+ Nd2xf1 a5-a4 Kg1-g2 a4-a3 Nf1-e3 Rc1-a1 Ne3-g4 Ra1xa2 Ng4-f6+ 
 8/21    00:09     59,153 6,093    +60.75    e2xf1Q+ Nd2xf1 a5-a4 Kg1-g2 a4-a3 Nf1-e3 Rc1-a1 Ne3-g4 Ra1xa2 Ng4-f6+
 9/21    00:10     65,710 6,107    +60.75    e2xf1Q+ Nd2xf1 a5-a4 Kg1-g2 a4-a3 Nf1-e3 Rc1-a1 Ne3-g4 Ra1xa2 Ng4-f6+
 9/22    00:13     81,831 6,239    +60.75    e2xf1Q+ Nd2xf1 a5-a4 Kg1-g2 a4-a3 Nf1-e3 Rc1-a1 Ne3-g4 Ra1xa2 Ng4-f6+ 
 9/22    00:13     82,485 6,248    -0.35    Rc1xf1+ Nd2xf1 e2-e1Q Qd8-d5 a5-a4 Kg1-g2 Qe1-b4 Nf1-e3 Qb4-f4 f5xg6 
 9/23    00:14     89,034 6,302    -0.35    Rc1xf1+ Nd2xf1 e2-e1Q Qd8-d5 a5-a4 Kg1-g2 Qe1-b4 Nf1-e3 Qb4-f4 f5xg6 
 10/23    00:14     92,369 6,332    -0.35    Rc1xf1+ Nd2xf1 e2-e1Q Qd8-d5 a5-a4 Kg1-g2 Qe1-b4 Nf1-e3 Qb4-f4 f5xg6 
 10/23    00:19     129,769 6,623    -0.35    Rc1xf1+ Nd2xf1 e2-e1Q Qd8-d5 a5-a4 Kg1-g2 Qe1-b4 Nf1-e3 Qb4-f4 f5xg6
 10/23    00:24     156,331 6,354    -0.35    Rc1xf1+ Nd2xf1 e2-e1Q Qd8-d5 a5-a4 Kg1-g2 Qe1-b4 Nf1-e3 Qb4-f4 f5xg6 
 9/23    00:29     184,528 6,324    -0.35    Rc1xf1+ Nd2xf1 e2-e1Q Qd8-d5 a5-a4 Kg1-g2 Qe1-b4 Nf1-e3 Qb4-f4 f5xg6 
 9/23    00:34     216,918 6,361    -41.65    e2-e1Q Kg1-g2 Rc1-c2 f5xg6 h7xg6 Qd8xa5 Qe1xd2 Qa5xd2 Rc2xd2 Kg2-f3 
 9/23    00:39     250,954 6,417    -41.68    e2-e1Q f5xg6 h7xg6 Nd2-e4 Qe1xe4 Qd8xf8+ Kg8xf8 Kg1-h2 a5-a4 a2-a3 
 9/23    00:44     286,128 6,482    -41.99    e2-e1Q f5xg6 h7xg6 Nd2-e4 Qe1xe4 Qd8xf8+ Kg8xf8 Kg1-h2 a5-a4 a2-a3 
 9/23    00:49     322,057 6,547    -42.34    e2-e1Q f5xg6 h7xg6 Nd2-e4 Qe1xe4 Qd8xf8+ Kg8xf8 a2-a3 a5-a4 Kg1-h2 
 9/23    00:54     358,403 6,612    -42.38    e2-e1Q f5xg6 h7xg6 Qd8xa5 Rc1-d1 Qa5-d5 Rd1xd2 Qd5-f3 Bf8-c5 Qf3-a8+
 9/23    01:09     469,787 6,764    -42.16    e2-e1Q f5xg6 h7xg6 Qd8xa5 Rc1-d1 Qa5-d5 Rd1xd2 Qd5-f3 Bf8-c5 Qf3-a8+

PGN of the game:

[Event "Test v20.1 41356"]
[Site "Terminator"]
[Date "2019.03.08"]
[Round "23"]
[White "Lc0v0.21-RC2 41356"]
[Black "Texel 1.07"]
[Result "0-1"]
[BlackElo "2200"]
[ECO "D76"]
[Opening "Neo-Grünfeld, 6.cxd5 Nxd5 7.O-O Nb6 8.Nc3 Nc6 9.d5 Na5 10.e4 c6"]
[TimeControl "40/120:40/120:40/120"]

1. d4 Nf6 2. c4 g6 3. g3 Bg7 4. Bg2 O-O 5. Nf3 d5 6. cxd5 Nxd5 7. O-O Nb6
{-0.46/16 3} 8. Nc3 {+0.70/9 1} Nc6 {-0.49/17 3} 9. d5 {+0.65/10 2} Na5
{-0.43/16 3} 10. e4 {+0.66/10 2} c6 {-0.35/17 3} 11. Re1 {+0.70/9 2} Re8
{-0.44/16 5} 12. Bf4 {+0.66/10 6} cxd5 {-0.63/16 3} 13. exd5 {+0.55/11 2}
Nac4 {-0.44/17 4} 14. Qb3 {+0.52/10 2} Bf5 {-0.10/16 3} 15. h3 {+0.42/11 5}
Rc8 {+0.04/16 3} 16. Rac1 {+0.28/12 6} Na5 {0.00/15 6} 17. Qb5 {+0.22/11 2}
Nac4 {0.00/18 3} 18. Qb3 {+0.16/11 2} Na5 {0.00/18 3} 19. Qb5 {+0.09/9 2}
Nac4 {0.00/20 3} 20. Nd2 {+0.21/10 5} Nxb2 {+0.26/17 4} 21. Qxb2 {+0.15/14
0} Na4 {+0.25/17 3} 22. Qxb7 {+0.17/12 1} Nxc3 {+0.03/17 3} 23. Kh2
{+0.25/13 2} a5 {+0.13/16 3} 24. g4 {+0.36/12 4} e5 {+0.26/16 3} 25. gxf5
{+0.51/13 3} exf4 {+0.13/16 3} 26. Rxe8+ {+0.86/14 1} Qxe8 {0.00/18 3} 27.
Re1 {+1.66/15 4} Qf8 {+0.40/17 3} 28. d6 {+1.95/13 3} f3 {+0.96/16 3} 29.
d7 {+4.91/13 4} Qd6+ {+0.91/18 3} 30. Kg1 {+5.44/13 2} Ne2+ {+0.92/18 3}
31. Rxe2 {+16.91/11 4} Rc1+ {+0.79/18 3} 32. Bf1 {+19.66/11 7} fxe2
{+0.82/18 3} 33. Qa8+ {+22.10/11 1} Bf8 {+5.08/18 2} 34. d8=Q {+29.68/9 2}
Qxd8 {+5.10/19 3} 35. Qxd8 {+56.92/8 4} e1=Q {+5.10/20 3} 36. Kg2 {-38.84/6
8} Rc2 {+11.26/15 3} 37. fxg6 {-33.46/6 2} hxg6 {+14.43/16 3} 38. Qxa5
{-47.82/6 4} Bc5 {+18.92/16 3} 0-1
killerducky commented 5 years ago

BTW here is my code, instructions not included. :) https://github.com/killerducky/lc0_analyzer/tree/extras pgn_analyzer.py stats.py

killerducky commented 5 years ago

Is 3.0-3.6% too low? Or is it not too low, but there are not enough examples in training for the net to care? Or does the Policy head architecture make it too hard to learn this?

Plies: 24424424
Interesting cases: 1474+1287+274 (ignoring the 56 cases where N is not possible)
Rate: 1 interesting case per 8047 plies

I think 1 in 8047 is not so rare that the net can't learn it? But probably many of these cases are still "not interesting" because, e.g., the game is already decided and the choice doesn't matter. Finding the cases where the decision really matters requires more parsing of the position, and maybe feeding it to Stockfish. Another option is to take these examples and send them to the current NN to see whether it finds those specific examples but doesn't generalize to unseen ones.
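
For reference, the rate quoted above follows directly from the T40 counts. A small check, with variable names chosen only for illustration:

```python
# Counts taken from the T40 sample (pgns-run1-20190306-1254).
plies = 24424424
interesting_cases = 1474 + 1287 + 274   # positions where N competes with L and/or R

# Integer division gives the "1 interesting case per N plies" figure.
plies_per_case = plies // interesting_cases
print(interesting_cases, plies_per_case)
```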

mattblachess commented 5 years ago

5 examples of the promotion bug can be found in the attached PGN.

T40_promotion_blunders.zip

For example, in game 5:

4r3/1kp4p/1n1prPp1/p5P1/2PRN3/P2Q4/K2R4/6q1 b - -

Here ...Qh1??? is played by vanilla Leela T40, which cannot see the f7 idea. This is even clearer further along:

4r3/1kp2P1p/1n1p2p1/p5P1/2P1q3/P2Q4/K2R4/8 w - - 0 4 (diagram position below)

In this position Leela expects f7xe8, after which Qxe8 gives a fairly even position. But SF plays the simple QxQ RxQ f8=Q and boom! Instant game over. Leela didn't see it coming until after it was played.

[image]

In the 4th game of the PGN, Leela goes wrong as early as move 13. After 13.h5 Leela plays 13...b6??? because she doesn't even see the promotion idea. SF's eval immediately jumps and it wins soon after. Leela doesn't realize the blunder at all until the queen is actually made! E.g., in the diagram below, SF plays 15. Qxg8+ and is at +5, but Leela thinks -17 after Nxg8! The game continues Nxg8 h7 Nce7 h8=Q and only now, after the queen is made, does she jump to +14. She doesn't even consider the promotion until after the queen is made.

[image: PromotionBlunder]

FEN of the position after 16. h7:
r3k1n1/p1q2p1P/bpn1p3/4P3/3p1P2/P1p5/2P1N1P1/R1B1KB1R b KQq - 0 16

killerducky commented 5 years ago

I've modified 41452 by reducing Policy channel 33. The nets are called hack2, hack4, hack8, for channel 33 being reduced 2X, 4X, 8X. I tested the position in the OP:

                     good_move        bad_move
41452             a2a1q (P:  0.01%) a2b1q (P: 93.63%)
41452_hack2.pb.gz a2a1q (P:  1.63%) a2b1q (P: 77.44%)
41452_hack4.pb.gz a2a1q (P:  9.74%) a2b1q (P: 34.81%)
41452_hack8.pb.gz a2a1q (P: 14.92%) a2b1q (P: 14.53%)

By the time you get to 8X, the capture-promote move is down to 14.5%. I didn't list other moves, but they go up as well. So 8X might cause a too-wide search. Maybe 4X is best.

As the dump below shows, channel 33 has an abnormally large output.

aolsen after policy batchnorm
channel:33
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
 359.22580  0.00000 25.37879  0.00000  0.00000  0.00000  0.00000  0.00000

The Policy FC layer uses the huge 359 number at a1 to increase Policy of a2b1q and reduce Policy of a2a1q:

i:2168 o:1792 idx:9177208 in[i]:359.226 w[idx]:-0.00490484 mult:-1.76194
i:2168 o:1795 idx:9192568 in[i]:359.226 w[idx]:0.0530997 mult:19.0748
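
The numbers in the two log lines above are consistent with a flat, row-major weight layout, which can be checked with a short sketch. The layout (80 policy channels of 64 squares each, weights indexed as `output * inputs + input`) is an inference from the printed indices, not something stated in the thread, and the helper names are hypothetical.

```python
SQUARES = 64
POLICY_CHANNELS = 80                      # assumption inferred from the indices
FC_INPUTS = POLICY_CHANNELS * SQUARES     # 5120 inputs to the policy FC layer


def fc_input_index(channel, square):
    """Flattened input index for (channel, square), channel-major."""
    return channel * SQUARES + square


def fc_weight_index(output, fc_input):
    """Flattened weight index, assuming weights are stored [output][input]."""
    return output * FC_INPUTS + fc_input


# The 359.22580 entry sits in the last printed row, first column: square 56.
i = fc_input_index(33, 56)
activation = 359.226

# (output id, weight) pairs copied from the log lines above.
for output, weight in [(1792, -0.00490484), (1795, 0.0530997)]:
    idx = fc_weight_index(output, i)
    print(f"i:{i} o:{output} idx:{idx} mult:{activation * weight:.5f}")
```

Under these assumptions the sketch reproduces both `idx` values and both `mult` values from the log.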

Final result:

aolsen policy raw:
a2a1q id:1792 13.0735
a2b1q id:1795 29.8254
Policy after softmax=1:
1792 5.29877e-08
1795 0.998665
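
As a sanity check on the figures above, the ratio between the two softmax probabilities depends only on the difference of the raw values. The sketch below applies a plain softmax (temperature 1) to just these two moves. The real policy covers every legal move, so the absolute probabilities differ, but the ratio between the two moves is the same.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw values."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Raw policy values copied from the dump above.
raw = {"a2a1q": 13.0735, "a2b1q": 29.8254}

p_push, p_capture = softmax([raw["a2a1q"], raw["a2b1q"]])
ratio = p_push / p_capture   # equals exp(13.0735 - 29.8254)
print(f"push/capture probability ratio: {ratio:.3e}")
```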

Hacked nets here: http://data.lczero.org/files/41452_hack2.pb.gz http://data.lczero.org/files/41452_hack4.pb.gz http://data.lczero.org/files/41452_hack8.pb.gz

Technical implementation note: instead of dividing gamma for channel 33, I divided all weights in the FC layer that take channel 33 as an input, simply because I had code to modify the FC layer but not the gamma layer.
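
A toy illustration of why scaling the FC weights works: a fully connected output is linear in its inputs, so dividing the weights that read a channel gives the same result as dividing that channel's output. The values and names below are made up for the example. Whether this also matches dividing gamma exactly depends on the batchnorm offset and the activation that follows, which the note above does not go into.

```python
import math


def fc_output(weights, inputs):
    """One fully connected output: dot product of weights and inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def scale_selected(values, indices, factor):
    """Divide the entries at the given indices by factor, leave the rest alone."""
    chosen = set(indices)
    return [v / factor if k in chosen else v for k, v in enumerate(values)]


# Hypothetical tiny layer: input 1 plays the role of the oversized channel.
inputs = [0.5, 359.226, 2.0]
weights = [0.3, 0.0530997, -0.1]
hot_inputs = [1]

via_weights = fc_output(scale_selected(weights, hot_inputs, 8.0), inputs)
via_channel = fc_output(weights, scale_selected(inputs, hot_inputs, 8.0))
print(via_weights, via_channel)
```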

oscardssmith commented 5 years ago

what do the elo tests say?

mattblachess commented 5 years ago

I tested all 5 of the promo bug losses in my PGN against the "hack8" version - leela saw all the right moves and didn't make the same mistakes. I.e., the fix worked.

I didn't test hack2 or hack4 but jhorthos tested all 3 against base and hack8 scored the highest elo.

I will now test 41452-hack8 in my standard 100 game rapid match and compare against the bugged 41452 result. The bugged version lost at least 4 games to the promo bug, so if it's fixed without a negative impact elsewhere, hack8 should score better. Will report back.

Alexandros2 commented 5 years ago

Finished the comparison of 41452_hack8 to regular 41452, both against SF_10:

RTX 2080 & 6 core i7 6800k
TC: 15s + 0.25s

Score of lc0_41452_h8 vs SF_10: +35 -15 =50 [0.600] 100
Elo difference: 70.44 +/- 48.39

Score of lc0_41452 vs SF_10: +28 -20 =52 [0.540] 100
Elo difference: 27.85 +/- 46.71

Some 43 Elo points improvement from hack8. Error margins are still large, but the result is much outside one standard deviation difference.
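
The Elo differences quoted above can be recomputed from the win/loss/draw counts with the standard logistic Elo formula. A minimal sketch with illustrative function names (it does not reproduce the +/- error margins):

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + losses + draws)


def elo_from_score(score):
    """Elo difference implied by an expected score under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


hack8 = elo_from_score(score_fraction(35, 15, 50))
base = elo_from_score(score_fraction(28, 20, 52))
print(f"hack8: {hack8:.2f}  base: {base:.2f}  gap: {hack8 - base:.1f}")
```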

mattblachess commented 5 years ago

Final score of the non-hack version: Lc0.21.0-rc2.41452 - Stockfish_10_x64_bmi2 : 47.0/100 12-18-70 (==00====10==1====01===10===0========1===1==0======0=1====1===0=====010===00=========01===00===0101==) 47% -> ordo score: 3445

Of these, EIGHT losses were from the promo bug. See the attached PGN for all 8 losses.

Now re-running the match with the hack8 version.

T40_41452_promotion_blunders.zip

killerducky commented 5 years ago

Quoting jio aka ttl from Discord:

I tried training a T40 net on T40 data with batch renormalization, without virtual_batch_size, gamma regularization and some other small changes, and it looks like it doesn't give super high policy for capture-promote anymore. I initialized from 41498 and trained a few steps on a few days old T40 data. Anyway, here is the net: http://hforsten.com/leelaz/t40-renorm-swa-32000.pb.gz I wouldn't be too surprised if it wasn't stronger due to the training data being older. It's about even with 41498 at 800 nodes so far in my quick tests.

mattblachess commented 5 years ago

Test complete - final score:

Lc0.21.0-rc2.pr784.hack8.41452 - Stockfish_10_x64_bmi2 : 48.5/100 19-22-59 (=====0=110=0=====0=0==0=001=====01==101=0==0==0===111===1111=0==1=1======0=0100=100==0======10====1=) 49% -> 3455 ordo score

An improvement of 1.5 points per 100 games. Not overly significant, BUT there were no promo blunders and the score is better, so at the least I don't think the change hurts the net. Others show similar results.

Ttl commented 5 years ago

I did some supervised tests starting from 41498 network and training from the latest T40 data to check how this issue could be solved.

The base 41498 network had 99.8% policy for capture on the position in the first post with --policy-softmax-temp=1

Training 500 steps with the default training parameters resulted in a net with the same 99.8% policy. Training with default parameters, but with gamma regularization from the same net had the same 99.8% policy in the test position at 500 steps.

Removing the virtual_batch_size and enabling batch renormalization with 'rmin': 0.25, 'rmax': 4.0, 'dmax': 5.0 and renorm_momentum=0.99 results in 96.6% after 250 steps and 94.8% policy in the test position after 750 steps.
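
For readers unfamiliar with the technique, here is a scalar sketch of the batch renormalization correction from the paper's definition, using the clipping values quoted above as defaults. It is only an illustration, not the lczero training code. The function name is hypothetical, and in real training the factors `r` and `d` are treated as constants during backpropagation.

```python
def clip(x, low, high):
    return max(low, min(high, x))


def batch_renorm(x, batch_mean, batch_std, moving_mean, moving_std,
                 rmin=0.25, rmax=4.0, dmax=5.0):
    """Normalize x with batch statistics, then correct toward moving statistics."""
    r = clip(batch_std / moving_std, rmin, rmax)
    d = clip((batch_mean - moving_mean) / moving_std, -dmax, dmax)
    return (x - batch_mean) / batch_std * r + d


# When nothing is clipped, the result equals normalizing with moving statistics.
unclipped = batch_renorm(2.0, 1.0, 0.5, 0.8, 0.4)

# With a tiny moving std, r would be 100 but is clipped to rmax.
clipped = batch_renorm(1.0, 0.0, 1.0, 0.0, 0.01)
print(unclipped, clipped)
```

The clipping is what keeps training-time outputs from drifting arbitrarily far from what the moving averages would produce at play time.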

The reason I suspect an issue with batch norm statistics more than an issue with batch norm gammas is that SE-units come after batch normalization and have the ability to zero any of the output channels. In training, batch norm uses statistics from the current batch to normalize it, while in testing/play it uses moving averages of the training statistics. Currently the training code calculates the batch norm statistics using ghost batch norm of 64 positions. If one channel is often zeroed out, it will be normalized to the batch statistics during training, but in testing/play the moving averages are used instead. If the channel is rarely activated, the moving variance will be very low since the output is often zeroed, and the channel will be multiplied by a very large value in testing/play to get it to unit variance, which causes the observed issue.
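
A toy model of the mechanism described above. It assumes a channel that outputs 0 almost everywhere and a fixed value on rare positions; the firing rate and value are hypothetical. Population statistics over many positions stand in for the moving averages, which is a simplification of how batch norm actually accumulates them.

```python
import math

GHOST_BATCH = 64      # ghost batch size mentioned in the comment
ACTIVE_VALUE = 1.0    # hypothetical channel output when it fires
RARE_EVERY = 10000    # hypothetical: the channel fires once per 10000 positions
EPS = 1e-5            # usual small constant added to the variance


def mean_and_var(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def normalize(x, mean, var):
    return (x - mean) / math.sqrt(var + EPS)


# Training: the firing position is normalized with its own ghost batch.
batch = [ACTIVE_VALUE] + [0.0] * (GHOST_BATCH - 1)
train_out = normalize(ACTIVE_VALUE, *mean_and_var(batch))

# Play: the same position is normalized with long-run statistics instead.
population = [ACTIVE_VALUE] + [0.0] * (RARE_EVERY - 1)
play_out = normalize(ACTIVE_VALUE, *mean_and_var(population))

print(f"training-time output: {train_out:.2f}, play-time output: {play_out:.2f}")
```

In this toy setting the play-time value comes out more than ten times larger than anything the later layers saw during training, which is the kind of blow-up the channel 33 dump shows.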

killerducky commented 5 years ago

For reference, the Batch Renormalization paper: https://arxiv.org/abs/1702.03275

T40 is using the parameters Ttl posted as of net 41546. The very first net had a significant change in Policy for the test positions.

Mardak commented 5 years ago

Looks like something good got in the training window (or something got pushed out?)… starting from 41581 now has the top prior move as the winning move and keeps increasing for 41590 but slows down for 41600… and keeps going!

[image: Screen Shot 2019-03-22 at 1 05 02 PM]

killerducky commented 5 years ago

Things are going in the right direction.