LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

scores at root (and possibly further down the tree) are corrupted during search (0.31 rc1) #2001

Closed Videodr0me closed 5 months ago

Videodr0me commented 6 months ago

I noticed this in MultiPV mode, but it might also happen in normal (single-PV) mode.

This search starts out normally:

       _
|   _ | |
|_ |_ |_| v0.31.0-rc1 built Mar 25 2024
setoption name MultiPV value 40
position fen r4k1r/pp1nqp1b/3p1n1p/2pP2p1/2P1p2P/2P1P1B1/P1QNBPP1/2KR3R w - - 0 1
go infinite
Found pb network file: C:\My_Programs\chess\lc0-v0.31dag/BT4-1024x15x32h-swa-6147500.pb.gz
Weights file has multihead format, updating format flag
Creating backend [cuda-auto]...
Switching to [cuda-fp16]...
CUDA Runtime version: 11.1.0
Latest version of CUDA supported by the driver: 12.4.0
GPU: NVIDIA GeForce RTX 4090
GPU memory: 23.9877 Gb
GPU clock frequency: 2565 MHz
GPU compute capability: 8.9
L2 cache capacity: 75497472
info depth 1 seldepth 2 time 3274 nodes 2 score cp -18 nps 285 tbhits 0 multipv 1 pv h4g5 h6g5
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 2 pv c2b3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 3 pv c1b2
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 4 pv c2b2
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 5 pv d1g1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 6 pv c2b1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 7 pv h1h2
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 8 pv d1f1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 9 pv d1e1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 10 pv c1b1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 11 pv d2f1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 12 pv c2a4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 13 pv a2a4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 14 pv g3h2
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 15 pv d2b3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 16 pv h1h3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 17 pv d2b1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 18 pv a2a3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 19 pv h1g1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 20 pv h1f1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 21 pv h1e1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 22 pv f2f4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 23 pv h4h5
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 24 pv e2f1
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 25 pv f2f3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 26 pv e2g4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 27 pv e2h5
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 28 pv c2d3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 29 pv e2d3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 30 pv c2e4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 31 pv g3d6
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 32 pv g3f4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 33 pv g3e5
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 34 pv d2e4
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 35 pv e2f3
info depth 1 seldepth 2 time 3274 nodes 2 score cp -24 nps 285 tbhits 0 multipv 36 pv d2f3

But later the score for the top move gets corrupted (here after about 5-6 minutes). Below is the last correct iteration:

info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -11 nps 18570 tbhits 0 multipv 1 pv d1e1 a8b8 c1b2 f8g7 b2a1 h7g6 c2b3 d7e5 f2f4 e5d3 e1f1 g5f4 g3f4 d3f4 f1f4 h6h5 b3c2 f6d7 g2g4 h5g4 e2g4 d7e5 g4f5 g6f5 f4f5 h8h4 h1f1 b8h8 d2e4 h8h6 e4g3 h4h3 f5e5 e7e5 g3f5 g7f8 f5h6 h3h6 a1b2 h6h2 f1f2 h2f2 c2f2 a7a6 a2a4 a6a5 f2h4 f8e8 e3e4 e5g7 h4h3 e8d8
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -25 nps 18570 tbhits 0 multipv 2 pv c1b2 f8g7 d1e1 h7g6 c2d1 a7a6 h4h5 g6h7 f2f4 h8b8 e1f1 g7h8 f1f2 f6g8 e2g4 b7b5 h1f1 f7f5 f4g5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -41 nps 18570 tbhits 0 multipv 3 pv h4g5 h6g5 h1h2 f8g7 d1h1 h7g6 h2h8 a8h8 h1h8 g7h8 c2a4 a7a6 a4b3 d7e5 b3b6 f6e8 g3e5 e7e5 b6b7 h8g7 c1d1 e5c3 g2g4 e8f6 d2b1 c3a1 a2a3 f6d7 d1c1 d7e5 b7b2 a1b2 c1b2 f7f5 g4f5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -36 nps 18570 tbhits 0 multipv 4 pv c2b3 f8g7 d1e1 d7e5 f2f3 h7g6 b3d1 e4f3 g2f3 e5g4 e3e4 g4e3 d1b3 f6h5 g3h2 e3g2 e1g1 g2f4 e2f1 f7f6 b3c2 a7a6 d2b3 a8f8 c1b2 f6f5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -30 nps 18570 tbhits 0 multipv 5 pv d1g1 f8g7 c1b2 a8e8 c2d1 h7g6 h1h2 d7e5 f2f4 e5d3 e2d3 e4d3 f4g5 h6g5 h4g5 f6e4 h2h8 g7h8 d2e4 e7e4 d1f3 e4f3 g2f3 e8e3 g1g2 e3f3 g3d6
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -38 nps 18570 tbhits 0 multipv 6 pv c2b2 b7b6 f2f3 f6h5 g3f2 e4f3 g2f3 d7e5 d1g1 h7d3 e2d1 d3c4 d1c2 c4d5 h4g5 h6g5 c3c4 d5c6 h1h3 f7f6 g1h1 e7f7 f2g3 a8e8 g3e5 e8e5 b2c3 h8h6 c2d1 e5e7 d2e4
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -37 nps 18570 tbhits 0 multipv 7 pv d1f1 f8g7 c1b2 h7g6 b2a1 a8b8 c2c1 b8e8 f1e1 d7e5 f2f3 f6h5 g3h2 f7f5 f3e4 f5e4 e2h5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -40 nps 18570 tbhits 0 multipv 8 pv c2b1 f8g7 c1b2 h7g6 d1e1 a8e8 b1d1 a7a6 b2a1 e8b8 d1c1 f6h5 e2h5 g6h5 f2f3 h5g6 h4g5 h6g5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -41 nps 18570 tbhits 0 multipv 9 pv h1h2 f8g7 h4g5 h6g5 d1h1 d7e5 c2a4 b7b6 a4d1 h7g6 d1g1 e5g4 h2h8 a8h8 h1h8 g7h8 c1b2 h8g7 b2b3 g4h6 f2f4 e4f3 g2f3 f6h5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -33 nps 18570 tbhits 0 multipv 10 pv c1b1 f8g7 b1a1 h7g6 d1e1 a8e8 c2a4 g5g4 e1b1 d7e5 a4a7 f6h5 b1b7 e7f6 g3e5 e8e5 e2g4 e5e7
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -44 nps 18570 tbhits 0 multipv 11 pv c2a4 f8g7 c1b2 d7e5 b2a1 a7a6 a4c2 h7g6 d1b1 a8b8 b1b3 b7b6 c2b1 f6d7 b3b2 f7f5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -64 nps 18570 tbhits 0 multipv 12 pv d2f1 f8g7 c1b2 d7e5 b2a1 h7g6 c2d2 a7a6 f2f3 e4f3 g2f3 g5g4 f3g4 f6e4 d2e1 b7b5 g3f4 b5c4
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -51 nps 18570 tbhits 0 multipv 13 pv a2a4 f8g7 c1b2 h7g6 d1a1 d7e5 c2d1 a7a6 f2f4 e4f3 g2f3 b7b5 c4b5 a6b5 h4g5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -62 nps 18570 tbhits 0 multipv 14 pv d2b3 f8g7 b3a5 a8b8 c1b2 d7e5 b2a1 g5g4 d1b1 e7c7 a5b3 h8e8 h4h5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -69 nps 18570 tbhits 0 multipv 15 pv g3h2 f8g7 c1b2 d7e5 c2c1 g5g4 h4h5 a7a6 h2g3 b7b5 b2a1 h8b8 g3h4
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -71 nps 18570 tbhits 0 multipv 16 pv h1h3 g5g4 h3h1 f6h5 g3h2 h8g8 d2f1 d7e5 c1b2 a8e8 c2d2 e7f6 f1g3 h5g7 d1f1 h6h5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -60 nps 18570 tbhits 0 multipv 17 pv a2a3 f8g7 c1b2 d7e5 d1b1 h7g6 c2d1 b7b6 a3a4 a7a6 h4g5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -73 nps 18570 tbhits 0 multipv 18 pv d2b1 f8g7 c1b2 d7e5 b1d2 h7g6 d1b1 g5g4 b2a1 b7b6 h4h5 f6h5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -67 nps 18570 tbhits 0 multipv 19 pv h1g1 f8g7 c1b2 h7g6 g1h1 d7e5 d1b1 g5g4 h4h5 f6h5 g3e5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -68 nps 18570 tbhits 0 multipv 20 pv h1f1 f8g7 c1b2 d7e5 f1h1 h7g6 d1b1 g5g4 h4h5 f6h5 g3e5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -73 nps 18570 tbhits 0 multipv 21 pv h1e1 f8g7 c1b2 d7e5 e1h1 h7g6 d1b1 g5g4 h4h5 f6h5 g3e5
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -78 nps 18570 tbhits 0 multipv 22 pv e2f1 f6h5 g3h2 g5g4 f1e2 h8g8 d2f1 d7e5 c1b2 e5d3 d1d3 e4d3 e2d3 h7d3 c2d3 e7f6
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -142 nps 18570 tbhits 0 multipv 23 pv h4h5 f8g7 c1b2 d7e5 b2a1 h8e8 d1b1 b7b6 b1b2 f6g4 c2d1
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -272 nps 18570 tbhits 0 multipv 24 pv f2f4 e4f3 e2d3 f3g2 h1g1 g5h4 g3h4 h7d3 c2d3 h8g8 e3e4 e7e5 d3f3 g8g4 h4f6 d7f6 g1g2 g4g2 f3g2 e5c3 c1b1 f8e7 g2f2 c3d3
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -250 nps 18570 tbhits 0 multipv 25 pv f2f3 e4f3 e2d3 f3g2 h1g1 g5h4 g3h4 h7d3 c2d3 h8g8 e3e4 e7e5 d3f3 g8g4 h4f6 d7f6 g1g2 g4g2 f3g2 e5c3
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -2905 nps 18570 tbhits 0 multipv 26 pv e2g4 f6g4 h4g5 d7e5 d2e4 e5c4 h1h4 h7e4 c2e2 g4e3 f2e3
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -5276 nps 18570 tbhits 0 multipv 27 pv e2h5 f6h5 g3h2 g5g4 d2f1 a7a6 f1g3 h5f6 g3e2
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -4143 nps 18570 tbhits 0 multipv 28 pv e2d3 e4d3 c2b2 f6h5 g3h2 g5g4 e3e4 h8g8 d2f1 a8e8
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -10692 nps 18570 tbhits 0 multipv 29 pv c2d3 e4d3 e2d3 h7d3 d2b3 e7e4 b3d2 e4g4
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -9519 nps 18570 tbhits 0 multipv 30 pv c2e4 f6e4 d2e4 h7e4 h4g5 b7b5 c4b5 a7a6
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -3961 nps 18570 tbhits 0 multipv 31 pv g3d6 e7d6 h4g5 h6g5 d2e4 f6e4 e2d3 d7f6 d3e4 f6e4
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -4329 nps 18570 tbhits 0 multipv 32 pv g3e5 d7e5 h4g5 h6g5 d2e4 f6e4 h1h7 h8h7 c2e4
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -6527 nps 18570 tbhits 0 multipv 33 pv g3f4 g5f4 e3f4 e4e3 e2d3 e3d2 d1d2 h7d3 c2d3
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -4008 nps 18570 tbhits 0 multipv 34 pv d2e4 f6e4 e2d3 e4g3 f2g3 h7d3 c2d3 d7e5 d3f5 f8g7 d1f1
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -5693 nps 18570 tbhits 0 multipv 35 pv e2f3 e4f3 e3e4 f3g2 h1g1 f6e4 d1e1 e4g3 e1e7 h7c2 e7e1 g5h4 g1g2
info depth 19 seldepth 68 time 489077 nodes 9021724 score cp -3716 nps 18570 tbhits 0 multipv 36 pv d2f3 e4f3 e2d3 f3g2 h1h2 h7d3 c2d3 g5g4 d3f5 b7b5 c4b5

Now the corruption starts: a completely nonsensical cp -2147483648 appears out of nowhere.

info depth 19 seldepth 68 time 494097 nodes 9095507 score cp -2147483648 nps 18531 tbhits 0 multipv 1 pv d1e1 a8b8 c1b2 f8g7 b2a1 h7g6 c2b3 d7e5 f2f4 e5d3 e1f1 g5f4 g3f4 d3f4 f1f4 h6h5 b3c2 f6d7 g2g4 h5g4 e2g4 d7e5 g4f5 g6f5 f4f5 h8h4 h1f1 b8h8 d2e4 h8h6 e4g3 h4h3 f5e5 e7e5 g3f5 g7f8 f5h6 h3h6 a1b2 h6h2 f1f2 h2f2 c2f2 a7a6 a2a4 a6a5 f2h4 f8e8 e3e4 e5g7 h4h3 e8d8
info depth 19 seldepth 68 time 494097 nodes 9095507 score cp -25 nps 18531 tbhits 0 multipv 2 pv c1b2 f8g7 d1e1 h7g6 c2d1 a7a6 h4h5 g6h7 f2f4 h8b8 e1f1 g7h8 f1f2 f6g8 e2g4 b7b5 h1f1 f7f5 f4g5
info depth 19 seldepth 68 time 494097 nodes 9095507 score cp -42 nps 18531 tbhits 0 multipv 3 pv h4g5 h6g5 h1h2 f8g7 d1h1 h7g6 h2h8 a8h8 h1h8 g7h8 c2a4 a7a6 a4b3 d7e5 b3b6 f6e8 g3e5 e7e5 b6b7 h8g7 c1d1 e5c3 g2g4 e8f6 d2b1 c3a1 a2a3 f6d7 d1c1 d7e5 b7b2 a1b2 c1b2 f7f5 g4f5
info depth 19 seldepth 68 time 494097 nodes 9095507 score cp -33 nps 18531 tbhits 0 multipv 4 pv c2b3 f8g7 d1e1 d7e5 f2f3 h7g6 b3d1 e4f3 g2f3 e5g4 e3e4 g4e3 d1b3 f6h5 g3h2 e3g2 e1g1 g2f4 e2f1 f7f6 c1b2 a8b8 b2a1 b7b6 b3c2 a7a6
info depth 19 seldepth 68 time 494097 nodes 9095507 score cp -30 nps 18531 tbhits 0 multipv 5 pv d1g1 f8g7 c1b2 a8e8 c2d1 h7g6 h1h2 d7e5 f2f4 e5d3 e2d3 e4d3 f4g5 h6g5 h4g5 f6e4 h2h8 g7h8 d2e4 e7e4 d1f3 e4f3 g2f3 e8e3 g1g2 e3f3 g3d6
info depth 19 seldepth 68 time 494097 nodes 9095507 score cp -39 nps 18531 tbhits 0 multipv 6 pv c2b2 b7b6 f2f3 f6h5 g3f2 e4f3 g2f3 d7e5 d1g1 h7d3 e2d1 d3c4 d1c2 c4d5 h4g5 h6g5 c3c4 d5c6 h1h3 f7f6 g1h1 e7f7 f2g3 a8e8 g3e5 e8e5 b2c3 h8h6 c2d1 e5e7 d2e4

This happens not only in this position but in others as well, mostly after searching for a while. Any ideas?
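For reference, -2147483648 is exactly -2^31, the smallest value a signed 32-bit integer can hold. A score like that usually points to a sentinel or a failed float-to-integer conversion rather than a real evaluation, although that reading is an assumption and is not confirmed anywhere in this thread. A minimal Python sketch for spotting such lines in a saved log follows; the helper names `parse_cp` and `is_corrupt` are hypothetical and not part of lc0, and the sample lines are taken from the output above with the PV shortened.

```python
import re

# Smallest signed 32-bit integer; equals the bogus score seen in the log.
INT32_MIN = -(2 ** 31)

# Matches the "score cp <n>" field of a UCI info line.
_CP_RE = re.compile(r"\bscore cp (-?\d+)\b")


def parse_cp(line):
    """Return the centipawn score of a UCI info line, or None if absent."""
    match = _CP_RE.search(line)
    return int(match.group(1)) if match else None


def is_corrupt(line):
    """True if the line reports INT32_MIN as its centipawn score."""
    return parse_cp(line) == INT32_MIN


# Two lines from the report above (PV shortened for readability).
good = ("info depth 19 seldepth 68 time 489077 nodes 9021724 "
        "score cp -11 nps 18570 tbhits 0 multipv 1 pv d1e1 a8b8")
bad = ("info depth 19 seldepth 68 time 494097 nodes 9095507 "
       "score cp -2147483648 nps 18531 tbhits 0 multipv 1 pv d1e1 a8b8")

print(is_corrupt(good), is_corrupt(bad))  # prints: False True
```

Running every line of a captured log through `is_corrupt` gives the node count and time at which the first bad score appears, which makes reports from different machines easier to compare.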

mooskagh commented 6 months ago

Thanks for the report!
It was also reported here https://github.com/LeelaChessZero/lc0/discussions/2000#discussioncomment-8925045 earlier today but I wasn't sure whether that's a UI issue (suspected to be Lc0 though).

Hmm, I thought we always output WDL; maybe it has to be enabled through a parameter. It would be interesting to see what the WDL values look like.

Videodr0me commented 6 months ago

Update: This also happens in single-PV mode. And this is not the only position; I noticed many more, so it seems to be a general problem.

       _
|   _ | |
|_ |_ |_| v0.31.0-rc1 built Mar 25 2024
position fen r4k1r/pp1nqp1b/3p1n1p/2pP2p1/2P1p2P/2P1P1B1/P1QNBPP1/2KR3R w - - 0 1
go infinite
Found pb network file: C:\My_Programs\chess\lc0-v0.31dag/BT4-1024x15x32h-swa-6147500.pb.gz
Weights file has multihead format, updating format flag
Creating backend [cuda-auto]...
Switching to [cuda-fp16]...
CUDA Runtime version: 11.1.0
Latest version of CUDA supported by the driver: 12.4.0
GPU: NVIDIA GeForce RTX 4090
GPU memory: 23.9877 Gb
GPU clock frequency: 2565 MHz
GPU compute capability: 8.9
L2 cache capacity: 75497472
info depth 1 seldepth 2 time 3237 nodes 2 score cp -18 nps 285 tbhits 0 pv h4g5 h6g5
info depth 2 seldepth 3 time 3241 nodes 3 score cp -28 nps 272 tbhits 0 pv h4g5 h6g5 h1h3
info depth 2 seldepth 4 time 3245 nodes 4 score cp -28 nps 266 tbhits 0 pv h4g5 h6g5 h1h3 f8g7

....

info depth 19 seldepth 67 time 418682 nodes 9911495 score cp -13 nps 23857 tbhits 0 pv d1e1 a8b8 c1b2 f8g7 b2a1 h7g6 c2b2 f6h5 g3h2 a7a6 f2f4 e4f3 g2f3 g5h4 e3e4 h5g3 h2g3 h4g3 e1g1 e7g5 h1h3 g5e3 g1g2 g7f8 h3g3 h6h5 g3h3 f8e7 d2f1 e3h6 b2d2 f7f5 h3g3 h6d2 f1d2 d7e5 f3f4 e5g4 e4e5 d6e5 f4e5 h5h4 g3h3 g4e5 d2f3 e5f3 g2g6
info depth 19 seldepth 67 time 423684 nodes 9998321 score cp -13 nps 23779 tbhits 0 pv d1e1 a8b8 c1b2 f8g7 b2a1 h7g6 c2b2 f6h5 g3h2 a7a6 f2f4 e4f3 g2f3 g5h4 e3e4 h5g3 h2g3 h4g3 e1g1 e7g5 h1h3 g5e3 g1g2 g7f8 h3g3 h6h5 g3h3 f8e7 d2f1 e3h6 b2d2 f7f5 h3g3 h6d2 f1d2 d7e5 f3f4 e5g4 e4e5 d6e5 f4e5 h5h4 g3h3 g4e5 d2f3 e5f3 g2g6
info depth 19 seldepth 67 time 428690 nodes 10116946 score cp -2147483648 nps 23778 tbhits 0 pv d1e1 a8b8 c1b2 f8g7 b2a1 h7g6 c2b2 f6h5 g3h2 a7a6 f2f4 e4f3 g2f3 g5h4 e3e4 h5g3 h2g3 h4g3 e1g1 e7g5 h1h3 g5e3 g1g2 g7f8 h3g3 h6h5 g3h3 f8e7 d2f1 e3h6 b2d2 f7f5 h3g3 h6d2 f1d2 d7e5 f3f4 e5g4 e4e5 d6e5 f4e5 h5h4 g3h3 g4e5 d2f3 e5f3 g2g6
info depth 19 seldepth 67 time 433691 nodes 10213750 score cp -2147483648 nps 23727 tbhits 0 pv d1e1 a8b8 c1b2 f8g7 b2a1 h7g6 c2b2 f6h5 g3h2 a7a6 f2f4 e4f3 g2f3 g5h4 e3e4 h5g3 h2g3 h4g3 e1g1 e7g5 h1h3 g5e3 g1g2 g7f8 h3g3 h6h5 g3h3 f8e7 d2f1 e3h6 b2d2 f7f5 h3g3 h6d2 f1d2 d7e5 f3f4 e5g4 e4e5 d6e5 f4e5 h5h4 g3h3 g4e5 d2f3 e5f3 g2g6

borg323 commented 6 months ago

Can you show the --show-wdl output when it happens?

Videodr0me commented 6 months ago

Here is the same position with WDL output:

       _
|   _ | |
|_ |_ |_| v0.31.0-rc1 built Mar 25 2024
setoption name UCI_ShowWDL value true
position fen r4k1r/pp1nqp1b/3p1n1p/2pP2p1/2P1p2P/2P1P1B1/P1QNBPP1/2KR3R w - - 0 1
go infinite
Found pb network file: C:\My_Programs\chess\lc0-v0.31dag/BT4-1024x15x32h-swa-6147500.pb.gz
Weights file has multihead format, updating format flag
Creating backend [cuda-auto]...
Switching to [cuda-fp16]...
CUDA Runtime version: 11.1.0
Latest version of CUDA supported by the driver: 12.4.0
GPU: NVIDIA GeForce RTX 4090
GPU memory: 23.9877 Gb
GPU clock frequency: 2565 MHz
GPU compute capability: 8.9
L2 cache capacity: 75497472
info depth 1 seldepth 2 time 3303 nodes 2 score cp -18 wdl 195 533 272 nps 285 tbhits 0 pv h4g5 h6g5
info depth 2 seldepth 3 time 3307 nodes 3 score cp -28 wdl 178 524 298 nps 272 tbhits 0 pv h4g5 h6g5 h1h3

This time lc0 picked c1b2 as best, which is only a transposition to the d1e1 line. But it took longer for the bug to appear. The good news is, it seems absolutely reproducible.

info depth 23 seldepth 68 time 827371 nodes 12395901 score cp -10 wdl 208 540 252 nps 15042 tbhits 0 pv c1b2 f8g7 d1e1 a8b8 b2a1 h7g6 c2c1 f6h5 g3h2 g5h4 e2h5 g6h5 f2f3 h5g6 d2e4 g6e4 f3e4 b7b5 c1d1 b5b4 d1g4 g7f8 h2f4 d7e5 g4h4 e7h4 h1h4 e5c4 a1b1 f8e7 b1c2 c4a3 c2d2 a7a5 e1h1 b8g8 g2g3 g8g6 d2e2 e7d7 c3b4 a5b4 e4e5 h8a8 e5e6 f7e6 h4h6 g6h6 h1h6 e6e5 h6h7 d7c8 h7h8 c8b7 h8a8 b7a8 f4h6 c5c4 e3e4 a8b7 g3g4 b7c7 e2d1 c7d7
info depth 23 seldepth 68 time 832389 nodes 12462694 score cp -10 wdl 208 541 251 nps 15031 tbhits 0 pv c1b2 f8g7 d1e1 a8b8 b2a1 h7g6 c2c1 f6h5 g3h2 g5h4 e2h5 g6h5 f2f3 h5g6 d2e4 g6e4 f3e4 b7b5 c1d1 b5b4 d1g4 g7f8 h2f4 d7e5 g4h4 e7h4 h1h4 e5c4 a1b1 f8e7 b1c2 c4a3 c2d2 a7a5 e1h1 b8g8 g2g3 g8g6 d2e2 e7d7 c3b4 a5b4 e4e5 h8a8 e5e6 f7e6 h4h6 g6h6 h1h6 e6e5 h6h7 d7c8 h7h8 c8b7 h8a8 b7a8 f4h6 c5c4 e3e4 a8b7 g3g4 b7c7 e2d1 c7d7
info depth 23 seldepth 68 time 837421 nodes 12536003 score cp -2147483648 wdl 0 1000 0 nps 15028 tbhits 0 pv c1b2 f8g7 d1e1 a8b8 b2a1 h7g6 c2c1 f6h5 g3h2 g5h4 e2h5 g6h5 f2f3 h5g6 d2e4 g6e4 f3e4 b7b5 c1d1 b5b4 d1g4 g7f8 h2f4 d7e5 g4h4 e7h4 h1h4 e5c4 a1b1 f8e7 b1c2 c4a3 c2d2 a7a5 e1h1 b8g8 g2g3 g8g6 d2e2 e7d7 c3b4 a5b4 e4e5 h8a8 e5e6 f7e6 h4h6 g6h6 h1h6 e6e5 h6h7 d7c8 h7h8 c8b7 h8a8 b7a8 f4h6 c5c4 e3e4 a8b7 g3g4 b7c7 e2d1 c7d7
info depth 23 seldepth 68 time 842436 nodes 12608042 score cp -2147483648 wdl 0 1000 0 nps 15024 tbhits 0 pv c1b2 f8g7 d1e1 a8b8 b2a1 h7g6 c2c1 f6h5 g3h2 g5h4 e2h5 g6h5 f2f3 h5g6 d2e4 g6e4 f3e4 b7b5 c1d1 b5b4 d1g4 g7f8 h2f4 d7e5 g4h4 e7h4 h1h4 e5c4 a1b1 f8e7 b1c2 c4a3 c2d2 a7a5 e1h1 b8g8 g2g3 g8g6 d2e2 e7d7 c3b4 a5b4 e4e5 h8a8 e5e6 f7e6 h4h6 g6h6 h1h6 e6e5 h6h7 d7c8 h7h8 c8b7 h8a8 b7a8 f4h6 c5c4 e3e4 a8b7 g3g4 b7c7 e2d1 c7d7

Curiously, the WDL score is 0 1000 0. Strange! There could be two bugs: one that converts that to an insane score, and another that sets the score to 0 1000 0 in the first place.
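The mismatch between the two fields can be checked mechanically. The WDL triple is reported in permille, so it should sum to 1000, and a triple of 0 1000 0 (a certain draw) should not come with a huge negative centipawn score. The sketch below tests one info line for both symptoms. The function name and the consistency rule are illustrative assumptions; this is not lc0's actual score conversion.

```python
import re

INT32_MIN = -(2 ** 31)

# Matches "score cp <n> wdl <w> <d> <l>" as printed with UCI_ShowWDL enabled.
_INFO_RE = re.compile(r"\bscore cp (-?\d+) wdl (\d+) (\d+) (\d+)\b")


def check_info(line):
    """Return a list of problems found in one UCI info line with WDL output."""
    match = _INFO_RE.search(line)
    if match is None:
        return []  # not a score+wdl line, nothing to check
    cp, win, draw, loss = (int(g) for g in match.groups())
    problems = []
    if win + draw + loss != 1000:
        problems.append("wdl does not sum to 1000")
    if cp == INT32_MIN:
        problems.append("score is INT32_MIN")
    # Assumed rule: a certain draw should map to a score of zero.
    if draw == 1000 and cp != 0:
        problems.append("wdl is a certain draw but score is nonzero")
    return problems


# Last good and first bad line from the run above (PV shortened).
ok_line = ("info depth 23 seldepth 68 time 832389 nodes 12462694 "
           "score cp -10 wdl 208 541 251 nps 15031 tbhits 0 pv c1b2 f8g7")
bad_line = ("info depth 23 seldepth 68 time 837421 nodes 12536003 "
            "score cp -2147483648 wdl 0 1000 0 nps 15028 tbhits 0 pv c1b2 f8g7")

print(check_info(ok_line))
print(check_info(bad_line))
```

The good line yields an empty list, while the bad line trips both the INT32_MIN check and the draw check, which fits the idea that the score and the WDL go wrong at the same moment.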

borg323 commented 6 months ago

What about the output with --verbose-move-stats if you stop the search right after the issue happens?
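One way to stop right after the issue appears is to watch the engine output and send the standard UCI `stop` command as soon as the bogus score shows up. The following is a rough sketch, assuming lc0 is started with `--verbose-move-stats` and driven over stdin/stdout. The function names `watch` and `run` are hypothetical, and the engine command passed to `run` is a placeholder for the local setup.

```python
import subprocess

BAD_SCORE = "score cp -2147483648"


def watch(lines, send):
    """Echo engine output and send 'stop' once when the bad score appears.

    `lines` is any iterable of output lines and `send` is a callable that
    takes a UCI command string. Returns the first corrupted line, or None.
    """
    first_bad = None
    for line in lines:
        print(line.rstrip())
        if first_bad is None and BAD_SCORE in line:
            first_bad = line.rstrip()
            send("stop")
        if line.startswith("bestmove"):
            break  # the search has ended
    return first_bad


def run(engine_cmd, fen):
    """Start the engine, search `fen` with 'go infinite', stop on corruption."""
    proc = subprocess.Popen(
        engine_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep backend log lines in the same stream
        text=True,
        bufsize=1,
    )

    def send(cmd):
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    send("position fen " + fen)
    send("go infinite")
    try:
        return watch(proc.stdout, send)
    finally:
        try:
            send("quit")
        except OSError:
            pass  # the engine may already have exited or crashed
        proc.wait()
```

A call could look like `run(["./lc0", "--verbose-move-stats"], "r4k1r/pp1nqp1b/3p1n1p/2pP2p1/2P1p2P/2P1P1B1/P1QNBPP1/2KR3R w - - 0 1")`, with the engine path adjusted locally. Keeping the detection in `watch` separate from the process handling means it can be tried against a saved log without a GPU.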

gsobala commented 6 months ago

I get the problem on an RTX 3080 under WSL2, with lc0 compiled against and running on CUDA 12.4:

(venv) george@FALCON:~/Development/lc0$ build/release/lc0 -w BT4-1024x15x32h-swa-6147500.pb.gz --verbose-move-stats
       _
|   _ | |
|_ |_ |_| v0.31.0-rc1 built Mar 27 2024
position fen r1bqk2r/1pppbppp/p1n2n2/1B2p3/4P3/5N2/PPPP1PPP/RNBQR1K1 w kq - 0 6 moves b5c6 d7c6 f3e5 e8h8 d2d3 c6c5
go nodes 400000
Loading weights file from: BT4-1024x15x32h-swa-6147500.pb.gz
Weights file has multihead format, updating format flag
Creating backend [cuda-auto]...
Switching to [cuda-fp16]...
CUDA Runtime version: 12.4.0
Latest version of CUDA supported by the driver: 12.4.0
GPU: NVIDIA GeForce RTX 3080
GPU memory: 9.99951 Gb
GPU clock frequency: 1755 MHz
GPU compute capability: 8.6
L2 cache capacity: 5242880
info depth 1 seldepth 2 time 2697 nodes 2 score cp 211 nps 105 tbhits 0 pv b1c3 f6d7
info depth 2 seldepth 3 time 2718 nodes 6 score cp 227 nps 150 tbhits 0 pv b1c3 c8e6 f2f4
info depth 2 seldepth 4 time 2734 nodes 11 score cp 225 nps 196 tbhits 0 pv b1c3 f8e8 a2a4 e7f8
info depth 3 seldepth 4 time 2740 nodes 16 score cp 220 nps 258 tbhits 0 pv b1c3 f6d7 e5c4 b7b5
info depth 3 seldepth 5 time 2746 nodes 17 score cp 221 nps 250 tbhits 0 pv b1c3 f6d7 e5c4 b7b5
info depth 3 seldepth 6 time 2757 nodes 29 score cp 219 nps 367 tbhits 0 pv b1c3 f6d7 e5c4 b7b5 c4a5
info depth 4 seldepth 7 time 2771 nodes 50 score cp 205 nps 537 tbhits 0 pv b1c3 f6d7 e5c4 b7b5 c4e3 f8e8
info depth 4 seldepth 8 time 2794 nodes 111 score cp 208 nps 956 tbhits 0 pv b1c3 f6d7 e5g4 d7b8 c1f4 b8c6
info depth 5 seldepth 9 time 2823 nodes 202 score cp 219 nps 1393 tbhits 0 pv b1c3 f6d7 e5g4 d7b8 c1f4 b8c6 g4e3
info depth 5 seldepth 10 time 2906 nodes 512 score cp 214 nps 2245 tbhits 0 pv b1c3 f6d7 e5g4 d7b8 c1f4 b8c6 g4e3 c8e6
info depth 5 seldepth 11 time 2960 nodes 711 score cp 209 nps 2521 tbhits 0 pv b1c3 f8e8 a2a4 e7f8 e5c4 c8g4 f2f3 g4e6 c1g5 h7h6 g5h4
info depth 6 seldepth 11 time 2996 nodes 866 score cp 209 nps 2723 tbhits 0 pv b1c3 f8e8 a2a4 e7f8 e5c4 c8g4 f2f3 g4e6 c4e3 f6d7 f3f4
info depth 6 seldepth 12 time 3207 nodes 1735 score cp 207 nps 3279 tbhits 0 pv b1c3 f8e8 a2a4 c8e6 f2f4 f6d7 e5f3 f7f6 d1e2 d7b8 f4f5
info depth 6 seldepth 13 time 3313 nodes 2169 score cp 200 nps 3415 tbhits 0 pv b1c3 f8e8 a2a4 e7d6 e5c4 f6g4 h2h3 d8h4 d1f3 d6h2 g1f1
info depth 7 seldepth 14 time 3383 nodes 2421 score cp 196 nps 3434 tbhits 0 pv b1c3 f8e8 a2a4 e7d6 e5c4 f6g4 c4d6 d8d6 g2g3 f7f6 g1g2 g4e5 f2f4
info depth 7 seldepth 15 time 3562 nodes 3173 score cp 195 nps 3593 tbhits 0 pv b1c3 f8e8 a2a4 e7d6 e5c4 f6g4 c4d6 d8d6 g2g3 g4e5 c1f4 f7f6 c3d5 c8e6 d5e3
info depth 7 seldepth 16 time 3634 nodes 3472 score cp 196 nps 3631 tbhits 0 pv b1c3 f8e8 a2a4 e7d6 e5c4 f6g4 c4d6 d8d6 g2g3 g4e5 c1f4 f7f6 c3d5 c8e6 d5e3 d6d4
info depth 7 seldepth 17 time 3855 nodes 4394 score cp 197 nps 3736 tbhits 0 pv b1c3 f8e8 d1f3 c8e6 c1f4 f6d7 e5d7 d8d7 f3g3 a8c8 h2h3 b7b5 b2b3
info depth 8 seldepth 17 time 4350 nodes 6706 score cp 199 nps 4010 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 d7b6 d1f3 b5b4 c3e2
info depth 8 seldepth 18 time 4436 nodes 7160 score cp 200 nps 4072 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 d7b6 d1f3 b5b4 c3e2
info depth 8 seldepth 19 time 4653 nodes 8135 score cp 200 nps 4118 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 c3d5 e6d5 e3d5 f6b2
info depth 8 seldepth 20 time 4870 nodes 9071 score cp 199 nps 4138 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 d1f3 e7d6 e5c4 d6f4 f3f4 e6c4 d3c4 d8d4 a1d1 d4c4 f4c7 c4b4 e4e5 b4b2 c3a4 b2c2
info depth 8 seldepth 21 time 6774 nodes 17598 score cp 201 nps 4296 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 b5b4 c3a4 d7b6 a4b6 c7b6 e4e5
info depth 9 seldepth 21 time 8296 nodes 24971 score cp 200 nps 4445 tbhits 0 pv b1c3 f6d7 e5d7 c8d7 c1f4 d7c6 d1g4 f8e8 e1e3 b7b5 a1e1 e7f8 h2h4 b5b4 c3e2
info depth 9 seldepth 22 time 9601 nodes 30588 score cp 197 nps 4418 tbhits 0 pv b1c3 f6d7 e5d7 c8d7 c1f4 d7c6 d1g4 f8e8 e1e3 b7b5 a1e1 e7f8 h2h4 b5b4 c3e2
info depth 9 seldepth 23 time 9746 nodes 31224 score cp 198 nps 4417 tbhits 0 pv b1c3 f6d7 e5d7 c8d7 c1f4 d7c6 d1g4 f8e8 e1e3 b7b5 a1e1 e7f8 h2h4 b5b4 c3e2
info depth 9 seldepth 24 time 13293 nodes 47844 score cp 198 nps 4507 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 b5b4 c3a4 d7b6 a4b6 c7b6 e4e5 f6h4 d1f3 h4g3
info depth 9 seldepth 25 time 13657 nodes 49540 score cp 197 nps 4512 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 b5b4 c3a4 d7b6 a4b6 c7b6 e4e5 f6h4 c2c3 h4g3 h2g3
info depth 10 seldepth 25 time 14565 nodes 53859 score cp 197 nps 4530 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 b5b4 c3a4 d7b6 a4b6 c7b6 e4e5 f6e7 f4f5 e6c8 d1f3 a8a7 e1e2 d8d4
info depth 10 seldepth 26 time 15473 nodes 58000 score cp 197 nps 4533 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 b5b4 c3a4 d7b6 a4b6 c7b6 e4e5 f6e7 f4f5 e6c8 d1f3 a8a7 e1e2 d8d4
info depth 10 seldepth 27 time 16439 nodes 62282 score cp 197 nps 4526 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 f6d4 g3f2 f7f6 d1f3 d7b6 e3d1 c5c4 c3e2
info depth 10 seldepth 28 time 17203 nodes 65748 score cp 198 nps 4526 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 a2a4 e7d6 d1f3 e6b3 e5f7 b3f7 e4e5 d6f8 e5f6 d8f6 f3g3 c5c4 f4c7 f6g6 c3e4 a8c8 c7a5 g6g3 h2g3
info depth 10 seldepth 29 time 21998 nodes 94024 score cp 198 nps 4866 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 f6d4 g3f2 d7b6 f4f5 e6d7 d1h5 b5b4
info depth 10 seldepth 30 time 22509 nodes 96481 score cp 198 nps 4865 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 f4g3 f6d7 e5c4 b7b5 c4e3 e7f6 f2f4 f6d4 g3f2 d7b6 f4f5 e6d7 d1h5 b5b4 c3e2
info depth 10 seldepth 30 time 27524 nodes 128298 score cp -2147483648 nps 5163 tbhits 0 pv b1c3 f8e8 c1f4 c8e6 e5f3 f6d7 h2h3 d7f8 f4g3 f7f6 d3d4 c5d4 f3d4 e7b4 e1e3 c7c6
info depth 10 seldepth 30 time 31028 nodes 150057 score cp 196 nps 5293 tbhits 0 pv c1f4 e7d6 d1f3 c8e6 b1d2 f8e8 d2c4 b7b5 c4e3 d8c8 h2h3 e6h3 e5f7 g8f7 g2h3 d6f4 f3f4 c8h3 e4e5 e8g8 e3g2
info depth 10 seldepth 30 time 36029 nodes 178088 score cp 196 nps 5339 tbhits 0 pv c1f4 e7d6 d1f3 c8e6 b1d2 f8e8 d2c4 b7b5 c4e3 d8c8 h2h3 e6h3 e5f7 g8f7 g2h3 d6f4 f3f4 c8h3 e4e5 e8g8 e3g2
info depth 11 seldepth 30 time 36066 nodes 178314 score cp 196 nps 5340 tbhits 0 pv c1f4 e7d6 d1f3 c8e6 b1d2 f8e8 d2c4 b7b5 c4e3 d8c8 h2h3 e6h3 e5f7 g8f7 g2h3 d6f4 f3f4 c8h3 e4e5 e8g8 e3g2
info depth 11 seldepth 31 time 37089 nodes 183496 score cp 196 nps 5332 tbhits 0 pv c1f4 e7d6 d1f3 c8e6 b1d2 f8e8 d2c4 b7b5 c4e3 d8c8 h2h3 e6h3 e5f7 g8f7 g2h3 d6f4 f3f4 c8h3 e4e5 e8g8 e3g2
info depth 11 seldepth 31 time 42119 nodes 211555 score cp 196 nps 5363 tbhits 0 pv c1f4 e7d6 d1f3 c8e6 b1d2 f8e8 d2c4 b7b5 c4e3 d8c8 h2h3 e6h3 e5f7 g8f7 g2h3 d6f4 f3f4 c8h3 e4e5 e8g8 e3g2
info depth 11 seldepth 31 time 47154 nodes 247273 score cp -2147483648 nps 5559 tbhits 0 pv c1f4 f8e8 b1c3 c8e6 a2a4 e7d6 d1f3 e6b3 e5f7 b3f7 e4e5 d6f8 e5f6 d8f6 f3g3 c5c4 f4c7 f6g6 a4a5 a8c8 e1e8 c8e8
info depth 11 seldepth 31 time 52164 nodes 276696 score cp -2147483648 nps 5591 tbhits 0 pv c1f4 f8e8 b1c3 c8e6 a2a4 e7d6 d1f3 e6b3 e5f7 b3f7 e4e5 d6f8 e5f6 d8f6 f3g3 c5c4 f4c7 f6g6 a4a5 a8c8 e1e8 c8e8
Segmentation fault
(venv) george@FALCON:~/Development/lc0$ 

Videodr0me commented 5 months ago

Very brief testing suggests this is fixed in RC2. Can others corroborate? If so, I will close this issue.