LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Blundered win into perpetual check forcing stalemate (cause: resign and temperature?) #710

Closed Mardak closed 4 years ago

Mardak commented 5 years ago

Here's a position from CCC 4 game 450 where, a few moves earlier, 32742 says +28 and SF says +153.

In this position 32742 captures a rook, still unaware that black will give perpetual check with the lone rook, since white capturing it results in stalemate.

If self-play got into this position, here's roughly what the search would look like (but without noise):

position startpos moves e2e4 c7c6 d2d4 d7d5 e4e5 c8f5 f1e2 e7e6 g1f3 f8b4 c2c3 b4e7 c1e3 b8d7 e1g1 f5g6 b1d2 g8h6 e3h6 g7h6 d2b3 f7f6 e5f6 e7d6 d1d2 d8f6 d2h6 e8c8 a1e1 h8g8 e2d1 d8f8 g2g3 g6f5 h6f6 d7f6 f3h4 f5h3 h4g2 h7h5 e1e3 h5h4 g2h4 h3f1 g1f1 c8d7 f1g2 f6g4 d1g4 g8g4 h4f3 b7b6 h2h3 g4g7 b3c1 d6f4 e3e1 f4c1 e1c1 c6c5 g3g4 d7d6 c1c2 f8f4 g2g3 f4e4 c2d2 d6e7 f3e5 e4e1 h3h4 a7a5 h4h5 e1g1 g3f4 g7g8 f2f3 e7f6 d2h2 f6g7 e5g6 g7f6 g6e5 f6g7 e5g6 g1f1 h2e2 g8e8 g6e5 e8f8 f4g5 f1g1 e5g6 f8f5 g5h4 g1h1 h4g3 f5f6 g6h4 c5d4 c3d4 h1g1 h4g2 f6h6 g3h2 g1d1 g2f4 h6f6 f4e6 g7f7 e6g5 f7g8 h2g3 d1g1 g3h2 g1a1 a2a3 a1d1 h2g3 d1g1 g3h3 b6b5 e2e8 f6f8 e8e6 g1d1 h5h6 b5b4 a3b4 a5b4 h3h4 d1h1 h4g3 b4b3 e6b6 h1g1 g3h2 g1a1 b6b3 a1a6 h6h7 g8h8 h2g3 a6a8 f3f4 a8b8 b3c3 b8b2 f4f5 b2b7 g3f4 b7e7 c3c6 e7e2 c6d6 e2f2 f4g3 f2f1 d6d5 f1g1 g3f4 g1f1 f4e3 f8e8 d5e5 e8f8 e5e6 f1d1 e6d6 f8e8 e3f3 d1f1 f3g2 f1f4 g2g3 f4f1 d6e6 e8b8 e6g6 b8b3 g3g2 b3b2 g2f1
go nodes 800

weights: 32742

info string b2c2  (1421) N:      38 (+ 0) (P:  5.93%) (Q: -0.99063) (U: 0.13219) (Q+U: -0.85844) (V: -0.9865) 
info string b2h2  (1426) N:      39 (+ 0) (P:  6.15%) (Q: -0.99313) (U: 0.13356) (Q+U: -0.85958) (V: -0.9865) 
info string b2a2  (1420) N:      43 (+ 0) (P:  6.45%) (Q: -0.98678) (U: 0.12744) (Q+U: -0.85934) (V: -0.9843) 
info string b2b3  (1417) N:      45 (+ 0) (P:  7.02%) (Q: -0.99160) (U: 0.13263) (Q+U: -0.85897) (V: -0.9856) 
info string b2b7  (1406) N:      48 (+ 0) (P:  7.16%) (Q: -0.98724) (U: 0.12700) (Q+U: -0.86024) (V: -0.9788) 
info string b2d2  (1422) N:      48 (+ 0) (P:  6.34%) (Q: -0.97253) (U: 0.11249) (Q+U: -0.86003) (V: -0.9881) 
info string b2b5  (1410) N:      49 (+ 0) (P:  7.35%) (Q: -0.98744) (U: 0.12780) (Q+U: -0.85964) (V: -0.9841) 
info string b2b4  (1413) N:      50 (+ 0) (P:  7.53%) (Q: -0.98916) (U: 0.12831) (Q+U: -0.86085) (V: -0.9894) 
info string b2g2  (1425) N:      57 (+ 0) (P:  5.73%) (Q: -0.94441) (U: 0.08589) (Q+U: -0.85852) (V: -0.9931) 
info string b2b1  (1428) N:      57 (+ 0) (P:  8.41%) (Q: -0.98449) (U: 0.12600) (Q+U: -0.85850) (V: -0.9835) 
info string b2b6  (1408) N:      60 (+ 0) (P:  7.28%) (Q: -0.96194) (U: 0.10375) (Q+U: -0.85819) (V: -0.9823) 
info string b2e2  (1423) N:      71 (+ 0) (P:  5.50%) (Q: -0.92483) (U: 0.06639) (Q+U: -0.85844) (V: -0.9900) 
info string b2b8  (1404) N:      94 (+ 0) (P: 12.66%) (Q: -0.97493) (U: 0.11588) (Q+U: -0.85905) (V: -0.9739) 
info string b2f2  (1424) N:     100 (+ 0) (P:  6.48%) (Q: -0.91420) (U: 0.05578) (Q+U: -0.85842) (V: -0.9892) 

It seems likely black would have resigned before the game got here, but even if black were to search here, the perpetual-check move is only barely the top move, so temperature, even at the endgame value of 0.45, will play the wrong move ~80% of the time. Continuing that across 50 moves is definitely not happening. Here's the probability of picking the 100-visit move (and the other moves, in descending order of visits) for various temperatures:

T=0.1 63.0% 33.9%  2.0%  0.4%  0.2%  0.2%  0.1%  0.1%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.2 44.0% 32.3%  7.9%  3.4%  2.6%  2.6%  1.4%  1.2%  1.1%  1.1%  0.8%  0.6%  0.4%  0.3%
T=0.3 31.2% 25.4% 10.0%  5.7%  4.8%  4.8%  3.1%  2.9%  2.7%  2.7%  2.2%  1.9%  1.4%  1.2%
T=0.4 24.1% 20.6% 10.2%  6.7%  5.9%  5.9%  4.3%  4.0%  3.8%  3.8%  3.3%  2.9%  2.3%  2.1%
T=0.5 19.9% 17.6% 10.0%  7.2%  6.5%  6.5%  5.0%  4.8%  4.6%  4.6%  4.0%  3.7%  3.0%  2.9%
T=0.6 17.2% 15.5%  9.7%  7.4%  6.8%  6.8%  5.4%  5.2%  5.1%  5.1%  4.6%  4.2%  3.6%  3.4%
T=0.7 15.5% 14.1%  9.5%  7.4%  6.9%  6.9%  5.7%  5.6%  5.4%  5.4%  4.9%  4.6%  4.0%  3.9%
T=0.8 14.2% 13.1%  9.2%  7.5%  7.0%  7.0%  6.0%  5.8%  5.7%  5.7%  5.2%  4.9%  4.4%  4.2%
T=0.9 13.2% 12.4%  9.1%  7.5%  7.1%  7.1%  6.1%  6.0%  5.9%  5.9%  5.5%  5.2%  4.7%  4.5%
T=1.0 12.5% 11.8%  8.9%  7.5%  7.1%  7.1%  6.3%  6.1%  6.0%  6.0%  5.6%  5.4%  4.9%  4.8%
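
The rows above can be reproduced with a short sketch, assuming the move is sampled with probability proportional to N^(1/T) over the root visit counts. The names `temperature_probs` and `VISITS_800` are made up for illustration and are not lc0 identifiers.

```python
def temperature_probs(visits, temperature):
    """Return pick probabilities proportional to N ** (1 / T) for each move."""
    weights = [n ** (1.0 / temperature) for n in visits]
    total = sum(weights)
    return [w / total for w in weights]


# Root visit counts from the 800-node search with 32742,
# sorted from most visited (b2f2, 100 visits) to least visited.
VISITS_800 = [100, 94, 71, 60, 57, 57, 50, 49, 48, 48, 45, 43, 39, 38]

for t in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    row = " ".join(f"{p * 100:5.1f}%" for p in temperature_probs(VISITS_800, t))
    print(f"T={t:.1f} {row}")

# Chance of picking b2f2 at the endgame temperature of 0.45,
# and of picking the top move 50 times in a row at that same rate.
p_correct = temperature_probs(VISITS_800, 0.45)[0]
print(f"T=0.45 b2f2: {p_correct:.3f}, 50 in a row: {p_correct ** 50:.3e}")
```

The last two lines only illustrate the "50 moves" remark: they reuse this one position's visit counts for every move, which is a simplification.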

To contrast, @dkappe's ender90l would generate reasonable training data and actually be significantly more likely to play the correct move:

weights: ender90l

info string b2d2  (1422) N:       0 (+ 0) (P:  0.55%) (Q: -1.30518) (U: 0.47630) (Q+U: -0.82888) (V:  -.----) 
info string b2b4  (1413) N:       0 (+ 0) (P:  0.74%) (Q: -1.30518) (U: 0.63927) (Q+U: -0.66591) (V:  -.----) 
info string b2h2  (1426) N:       0 (+ 0) (P:  0.87%) (Q: -1.30518) (U: 0.75234) (Q+U: -0.55284) (V:  -.----) 
info string b2c2  (1421) N:       0 (+ 0) (P:  0.89%) (Q: -1.30518) (U: 0.77290) (Q+U: -0.53229) (V:  -.----) 
info string b2e2  (1423) N:       0 (+ 0) (P:  0.89%) (Q: -1.30518) (U: 0.77389) (Q+U: -0.53129) (V:  -.----) 
info string b2g2  (1425) N:       0 (+ 0) (P:  1.07%) (Q: -1.30518) (U: 0.93072) (Q+U: -0.37446) (V:  -.----) 
info string b2b5  (1410) N:       0 (+ 0) (P:  1.13%) (Q: -1.30518) (U: 0.98311) (Q+U: -0.32207) (V:  -.----) 
info string b2b8  (1404) N:       1 (+ 0) (P:  1.56%) (Q: -0.99972) (U: 0.67840) (Q+U: -0.32132) (V: -0.9997) 
info string b2b6  (1408) N:       1 (+ 0) (P:  1.64%) (Q: -0.99883) (U: 0.71222) (Q+U: -0.28661) (V: -0.9988) 
info string b2b7  (1406) N:       1 (+ 0) (P:  1.69%) (Q: -0.99980) (U: 0.73311) (Q+U: -0.26669) (V: -0.9998) 
info string b2b3  (1417) N:       1 (+ 0) (P:  1.77%) (Q: -0.99962) (U: 0.76858) (Q+U: -0.23104) (V: -0.9996) 
info string b2a2  (1420) N:       1 (+ 0) (P:  1.77%) (Q: -0.99968) (U: 0.77124) (Q+U: -0.22845) (V: -0.9997) 
info string b2b1  (1428) N:     161 (+ 0) (P: 54.83%) (Q: -0.33874) (U: 0.29421) (Q+U: -0.04454) (V: -0.8822) 
info string b2f2  (1424) N:     633 (+ 1) (P: 30.62%) (Q: -0.08594) (U: 0.04191) (Q+U: -0.04403) (V: -0.0250) 

Notably the raw network eval for b2f2 capture is -0.03 from ender90l while 32742 says -0.99. T40 doesn't look great either with 40500 saying -0.98:

weights: 40500

info string b2b3  (1417) N:      42 (+ 0) (P:  5.93%) (Q: -0.99137) (U: 0.11989) (Q+U: -0.87148) (V: -0.9909) 
info string b2a2  (1420) N:      43 (+ 0) (P:  6.02%) (Q: -0.99230) (U: 0.11888) (Q+U: -0.87341) (V: -0.9925) 
info string b2d2  (1422) N:      43 (+ 0) (P:  6.12%) (Q: -0.99294) (U: 0.12096) (Q+U: -0.87197) (V: -0.9912) 
info string b2b4  (1413) N:      43 (+ 0) (P:  6.00%) (Q: -0.98998) (U: 0.11858) (Q+U: -0.87140) (V: -0.9888) 
info string b2g2  (1425) N:      48 (+ 0) (P:  6.69%) (Q: -0.99079) (U: 0.11866) (Q+U: -0.87213) (V: -0.9884) 
info string b2b5  (1410) N:      48 (+ 0) (P:  6.67%) (Q: -0.99004) (U: 0.11839) (Q+U: -0.87165) (V: -0.9910) 
info string b2b6  (1408) N:      49 (+ 0) (P:  7.08%) (Q: -0.99427) (U: 0.12303) (Q+U: -0.87125) (V: -0.9854) 
info string b2h2  (1426) N:      53 (+ 0) (P:  7.55%) (Q: -0.99410) (U: 0.12158) (Q+U: -0.87253) (V: -0.9931) 
info string b2c2  (1421) N:      54 (+ 0) (P:  7.40%) (Q: -0.99016) (U: 0.11700) (Q+U: -0.87315) (V: -0.9929) 
info string b2e2  (1423) N:      55 (+ 0) (P:  7.39%) (Q: -0.98575) (U: 0.11468) (Q+U: -0.87107) (V: -0.9892) 
info string b2b7  (1406) N:      56 (+ 0) (P:  7.56%) (Q: -0.98838) (U: 0.11527) (Q+U: -0.87311) (V: -0.9897) 
info string b2b8  (1404) N:      65 (+ 0) (P:  8.45%) (Q: -0.98261) (U: 0.11129) (Q+U: -0.87132) (V: -0.9882) 
info string b2b1  (1428) N:      67 (+ 0) (P:  9.38%) (Q: -0.99172) (U: 0.11995) (Q+U: -0.87176) (V: -0.9868) 
info string b2f2  (1424) N:     133 (+ 0) (P:  7.75%) (Q: -0.92145) (U: 0.05024) (Q+U: -0.87121) (V: -0.9768)

Mardak commented 5 years ago

For reference, additional visits with 32742:

1600 visits

info string b2c2  (1421) N:      58 (+ 0) (P:  5.93%) (Q: -0.99088) (U: 0.12656) (Q+U: -0.86432) (V: -0.9865) 
info string b2h2  (1426) N:      60 (+ 0) (P:  6.15%) (Q: -0.99191) (U: 0.12685) (Q+U: -0.86506) (V: -0.9865) 
info string b2a2  (1420) N:      65 (+ 0) (P:  6.45%) (Q: -0.98781) (U: 0.12306) (Q+U: -0.86475) (V: -0.9843) 
info string b2d2  (1422) N:      69 (+ 0) (P:  6.34%) (Q: -0.97900) (U: 0.11405) (Q+U: -0.86495) (V: -0.9881) 
info string b2g2  (1425) N:      70 (+ 0) (P:  5.73%) (Q: -0.96666) (U: 0.10163) (Q+U: -0.86504) (V: -0.9931) 
info string b2b3  (1417) N:      71 (+ 0) (P:  7.02%) (Q: -0.98816) (U: 0.12273) (Q+U: -0.86543) (V: -0.9856) 
info string b2b7  (1406) N:      73 (+ 0) (P:  7.16%) (Q: -0.98669) (U: 0.12180) (Q+U: -0.86488) (V: -0.9788) 
info string b2b5  (1410) N:      74 (+ 0) (P:  7.35%) (Q: -0.98790) (U: 0.12341) (Q+U: -0.86450) (V: -0.9841) 
info string b2b4  (1413) N:      76 (+ 0) (P:  7.53%) (Q: -0.98834) (U: 0.12310) (Q+U: -0.86525) (V: -0.9894) 
info string b2b6  (1408) N:      84 (+ 0) (P:  7.28%) (Q: -0.97330) (U: 0.10785) (Q+U: -0.86545) (V: -0.9823) 
info string b2b1  (1428) N:      86 (+ 0) (P:  8.41%) (Q: -0.98656) (U: 0.12167) (Q+U: -0.86490) (V: -0.9835) 
info string b2e2  (1423) N:      92 (+ 0) (P:  5.50%) (Q: -0.93983) (U: 0.07445) (Q+U: -0.86538) (V: -0.9900) 
info string b2b8  (1404) N:     158 (+ 1) (P: 12.66%) (Q: -0.96430) (U: 0.09965) (Q+U: -0.86465) (V: -0.9739) 
info string b2f2  (1424) N:     563 (+ 0) (P:  6.48%) (Q: -0.87882) (U: 0.01447) (Q+U: -0.86435) (V: -0.9892) 

T=0.1 100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.2 99.8%  0.2%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.3 97.2%  1.4%  0.2%  0.2%  0.2%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.1%  0.0%
T=0.4 89.5%  3.7%  1.0%  0.8%  0.8%  0.6%  0.6%  0.5%  0.5%  0.5%  0.5%  0.4%  0.3%  0.3%
T=0.5 77.8%  6.1%  2.1%  1.8%  1.7%  1.4%  1.3%  1.3%  1.2%  1.2%  1.2%  1.0%  0.9%  0.8%
T=0.6 65.6%  7.9%  3.2%  2.9%  2.8%  2.3%  2.2%  2.2%  2.1%  2.0%  2.0%  1.8%  1.6%  1.5%
T=0.7 55.0%  9.0%  4.1%  3.8%  3.6%  3.1%  3.0%  3.0%  2.9%  2.8%  2.7%  2.5%  2.2%  2.1%
T=0.8 46.7%  9.5%  4.8%  4.5%  4.3%  3.8%  3.7%  3.6%  3.5%  3.4%  3.4%  3.1%  2.8%  2.7%
T=0.9 40.2%  9.8%  5.4%  5.0%  4.9%  4.3%  4.2%  4.2%  4.0%  4.0%  3.9%  3.7%  3.3%  3.2%
T=1.0 35.2%  9.9%  5.8%  5.4%  5.3%  4.8%  4.6%  4.6%  4.4%  4.4%  4.3%  4.1%  3.8%  3.6%

2400 visits

info string b2c2  (1421) N:      78 (+ 0) (P:  5.93%) (Q: -0.98943) (U: 0.11842) (Q+U: -0.87102) (V: -0.9865) 
info string b2a2  (1420) N:      86 (+ 0) (P:  6.45%) (Q: -0.98786) (U: 0.11696) (Q+U: -0.87090) (V: -0.9843) 
info string b2h2  (1426) N:      87 (+ 0) (P:  6.15%) (Q: -0.98049) (U: 0.11016) (Q+U: -0.87032) (V: -0.9865) 
info string b2b7  (1406) N:      96 (+ 0) (P:  7.16%) (Q: -0.98740) (U: 0.11642) (Q+U: -0.87098) (V: -0.9788) 
info string b2b3  (1417) N:      97 (+ 0) (P:  7.02%) (Q: -0.98298) (U: 0.11297) (Q+U: -0.87001) (V: -0.9856) 
info string b2b5  (1410) N:      98 (+ 0) (P:  7.35%) (Q: -0.98763) (U: 0.11713) (Q+U: -0.87050) (V: -0.9841) 
info string b2d2  (1422) N:      98 (+ 0) (P:  6.34%) (Q: -0.97121) (U: 0.10103) (Q+U: -0.87018) (V: -0.9881) 
info string b2b4  (1413) N:      99 (+ 1) (P:  7.53%) (Q: -0.98853) (U: 0.11757) (Q+U: -0.87096) (V: -0.9894) 
info string b2g2  (1425) N:     104 (+ 0) (P:  5.73%) (Q: -0.95608) (U: 0.08609) (Q+U: -0.86998) (V: -0.9931) 
info string b2b6  (1408) N:     106 (+ 0) (P:  7.28%) (Q: -0.97734) (U: 0.10733) (Q+U: -0.87000) (V: -0.9823) 
info string b2b1  (1428) N:     112 (+ 0) (P:  8.41%) (Q: -0.98798) (U: 0.11735) (Q+U: -0.87063) (V: -0.9835) 
info string b2e2  (1423) N:     118 (+ 0) (P:  5.50%) (Q: -0.94329) (U: 0.07289) (Q+U: -0.87040) (V: -0.9900) 
info string b2b8  (1404) N:     201 (+ 0) (P: 12.66%) (Q: -0.96884) (U: 0.09889) (Q+U: -0.86995) (V: -0.9739) 
info string b2f2  (1424) N:    1019 (+ 0) (P:  6.48%) (Q: -0.87988) (U: 0.01002) (Q+U: -0.86986) (V: -0.9892) 

T=0.1 100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.2 100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.3 99.0%  0.4%  0.1%  0.1%  0.1%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.4 95.0%  1.6%  0.4%  0.4%  0.3%  0.3%  0.3%  0.3%  0.3%  0.3%  0.3%  0.2%  0.2%  0.2%
T=0.5 86.8%  3.4%  1.2%  1.0%  0.9%  0.9%  0.8%  0.8%  0.8%  0.8%  0.8%  0.6%  0.6%  0.5%
T=0.6 76.2%  5.1%  2.1%  1.9%  1.8%  1.7%  1.6%  1.5%  1.5%  1.5%  1.5%  1.3%  1.2%  1.1%
T=0.7 65.6%  6.5%  3.0%  2.8%  2.6%  2.5%  2.3%  2.3%  2.3%  2.3%  2.2%  2.0%  1.9%  1.7%
T=0.8 56.3%  7.4%  3.8%  3.6%  3.3%  3.2%  3.1%  3.0%  3.0%  3.0%  2.9%  2.6%  2.6%  2.3%
T=0.9 48.6%  8.0%  4.4%  4.2%  3.9%  3.8%  3.6%  3.6%  3.6%  3.6%  3.5%  3.2%  3.1%  2.8%
T=1.0 42.5%  8.4%  4.9%  4.7%  4.4%  4.3%  4.1%  4.1%  4.1%  4.0%  4.0%  3.6%  3.6%  3.3%

3200 visits

info string b2c2  (1421) N:     103 (+ 0) (P:  5.93%) (Q: -0.98825) (U: 0.10614) (Q+U: -0.88212) (V: -0.9865) 
info string b2h2  (1426) N:     112 (+ 0) (P:  6.15%) (Q: -0.98373) (U: 0.10123) (Q+U: -0.88250) (V: -0.9865) 
info string b2a2  (1420) N:     114 (+ 0) (P:  6.45%) (Q: -0.98664) (U: 0.10441) (Q+U: -0.88223) (V: -0.9843) 
info string b2d2  (1422) N:     124 (+ 0) (P:  6.34%) (Q: -0.97641) (U: 0.09442) (Q+U: -0.88199) (V: -0.9881) 
info string b2b7  (1406) N:     125 (+ 0) (P:  7.16%) (Q: -0.98784) (U: 0.10575) (Q+U: -0.88209) (V: -0.9788) 
info string b2b3  (1417) N:     126 (+ 0) (P:  7.02%) (Q: -0.98519) (U: 0.10286) (Q+U: -0.88233) (V: -0.9856) 
info string b2g2  (1425) N:     129 (+ 0) (P:  5.73%) (Q: -0.96433) (U: 0.08205) (Q+U: -0.88228) (V: -0.9931) 
info string b2b4  (1413) N:     131 (+ 0) (P:  7.53%) (Q: -0.98822) (U: 0.10615) (Q+U: -0.88207) (V: -0.9894) 
info string b2b5  (1410) N:     137 (+ 0) (P:  7.35%) (Q: -0.98091) (U: 0.09915) (Q+U: -0.88176) (V: -0.9841) 
info string b2b6  (1408) N:     138 (+ 0) (P:  7.28%) (Q: -0.97932) (U: 0.09749) (Q+U: -0.88182) (V: -0.9823) 
info string b2e2  (1423) N:     145 (+ 0) (P:  5.50%) (Q: -0.95239) (U: 0.07010) (Q+U: -0.88229) (V: -0.9900) 
info string b2b1  (1428) N:     145 (+ 0) (P:  8.41%) (Q: -0.98926) (U: 0.10717) (Q+U: -0.88209) (V: -0.9835) 
info string b2b8  (1404) N:     266 (+ 0) (P: 12.66%) (Q: -0.97000) (U: 0.08828) (Q+U: -0.88172) (V: -0.9739) 
info string b2f2  (1424) N:    1404 (+ 1) (P:  6.48%) (Q: -0.88989) (U: 0.00858) (Q+U: -0.88131) (V: -0.9892) 

T=0.1 100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.2 100.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.3 99.2%  0.4%  0.1%  0.1%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%
T=0.4 95.6%  1.5%  0.3%  0.3%  0.3%  0.3%  0.3%  0.2%  0.2%  0.2%  0.2%  0.2%  0.2%  0.1%
T=0.5 88.1%  3.2%  0.9%  0.9%  0.9%  0.8%  0.8%  0.7%  0.7%  0.7%  0.7%  0.6%  0.6%  0.5%
T=0.6 77.9%  4.9%  1.8%  1.8%  1.6%  1.6%  1.5%  1.5%  1.4%  1.4%  1.4%  1.2%  1.2%  1.0%
T=0.7 67.4%  6.3%  2.6%  2.6%  2.5%  2.4%  2.3%  2.2%  2.2%  2.1%  2.1%  1.9%  1.8%  1.6%
T=0.8 58.0%  7.3%  3.4%  3.4%  3.2%  3.2%  3.0%  2.9%  2.8%  2.8%  2.8%  2.5%  2.5%  2.2%
T=0.9 50.2%  7.9%  4.0%  4.0%  3.8%  3.8%  3.6%  3.5%  3.4%  3.4%  3.4%  3.1%  3.0%  2.8%
T=1.0 43.9%  8.3%  4.5%  4.5%  4.3%  4.3%  4.1%  4.0%  3.9%  3.9%  3.9%  3.6%  3.5%  3.2%

So the training policy target for b2f2 (its share of visits, i.e. the T=1.0 row) with 800, 1600, 2400 and 3200 visits should move towards 12.5%, 35.2%, 42.5% and 43.9% respectively in this case.
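
As a quick check of those four numbers, here is a minimal sketch that assumes the policy target is simply each move's share of the root visits. The dictionary and function names are hypothetical; the visit counts are copied from the search outputs above, with b2f2 last in each list.

```python
# Child visit counts per search size (b2f2 is the last entry in each list).
CHILD_VISITS = {
    800: [38, 39, 43, 45, 48, 48, 49, 50, 57, 57, 60, 71, 94, 100],
    1600: [58, 60, 65, 69, 70, 71, 73, 74, 76, 84, 86, 92, 158, 563],
    2400: [78, 86, 87, 96, 97, 98, 98, 99, 104, 106, 112, 118, 201, 1019],
    3200: [103, 112, 114, 124, 125, 126, 129, 131, 137, 138, 145, 145, 266, 1404],
}


def b2f2_policy_target(visits):
    """Share of root visits that went to b2f2 (the last entry)."""
    return visits[-1] / sum(visits)


for nodes, visits in CHILD_VISITS.items():
    print(f"{nodes} visits: b2f2 target {b2f2_policy_target(visits) * 100:.1f}%")
```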

Videodr0me commented 5 years ago

PR700 (NN32742) with CP and 2-fold-draw solves this in 6172 nodes with a proven draw score. Normal leela still thinks it's winning after 200000 nodes.

PR487 with one-ply lookahead halves this to 2930 nodes before the draw is proven. PR487 also avoids 90. Rg6 after 3539 nodes; PR700 avoids Rg6 at 4000 nodes. Both settle first on Re3 (also winning) and then on d5, SF's choice (also winning). All results are with 6p TB.

Videodr0me commented 5 years ago

I think the real challenge for leela is to find 90. ... Rb3+!. With colors reversed she would not have found this saving resource. It takes a long time because giving up that rook lowers Q so much that this branch does not get any more visits. Even with CP and two-fold-draw scoring, leela would have missed it if the colors had been reversed.

Ttl commented 5 years ago

20x256 weights trained on the CCRL dataset evaluate this much closer to a draw:

info string b2g2  (1425) N:       0 (+ 0) (P:  1.12%) (Q: -1.41662) (U: 0.91517) (Q+U: -0.50145) (V:  -.----) 
info string b2b4  (1413) N:       0 (+ 0) (P:  1.23%) (Q: -1.41662) (U: 1.00596) (Q+U: -0.41066) (V:  -.----) 
info string b2e2  (1423) N:       0 (+ 0) (P:  1.34%) (Q: -1.41662) (U: 1.10052) (Q+U: -0.31610) (V:  -.----) 
info string b2b5  (1410) N:       0 (+ 0) (P:  1.38%) (Q: -1.41662) (U: 1.13558) (Q+U: -0.28104) (V:  -.----) 
info string b2b6  (1408) N:       1 (+ 0) (P:  1.66%) (Q: -0.99708) (U: 0.68191) (Q+U: -0.31516) (V: -0.9953) 
info string b2d2  (1422) N:       1 (+ 0) (P:  1.66%) (Q: -0.99670) (U: 0.68160) (Q+U: -0.31510) (V: -0.9947) 
info string b2b3  (1417) N:       1 (+ 0) (P:  1.77%) (Q: -0.99700) (U: 0.72543) (Q+U: -0.27157) (V: -0.9952) 
info string b2c2  (1421) N:       1 (+ 0) (P:  1.96%) (Q: -0.99669) (U: 0.80621) (Q+U: -0.19048) (V: -0.9947) 
info string b2a2  (1420) N:       2 (+ 0) (P:  2.32%) (Q: -0.99718) (U: 0.63391) (Q+U: -0.36328) (V: -0.9927) 
info string b2h2  (1426) N:       2 (+ 0) (P:  2.78%) (Q: -0.99803) (U: 0.75935) (Q+U: -0.23867) (V: -0.9949) 
info string b2b7  (1406) N:       2 (+ 0) (P:  3.07%) (Q: -0.99862) (U: 0.83929) (Q+U: -0.15932) (V: -0.9964) 
info string b2b8  (1404) N:      12 (+ 0) (P: 12.63%) (Q: -0.92442) (U: 0.79728) (Q+U: -0.12714) (V: -0.9958) 
info string b2b1  (1428) N:      63 (+ 0) (P: 49.15%) (Q: -0.74494) (U: 0.63025) (Q+U: -0.11469) (V: -0.9818) 
info string b2f2  (1424) N:     631 (+58) (P: 17.94%) (Q: -0.17453) (U: 0.02134) (Q+U: -0.15319) (V: -0.9172) 

The same weights evaluate 90. Rb3+ as the top move at 800 nodes with Q = -0.57. 32742 evaluates the same move at 800 nodes as Q = -0.89.

So it doesn't seem that this is something that the net can't learn. Just needs to have relevant positions played correctly in the training data.

oscardssmith commented 5 years ago

Tactics like this will probably never be learned correctly at 800 nodes. Once test40 is at full strength, we should probably generate a couple million games at 3200 or so nodes to be a new training set.

Ttl commented 5 years ago

I ran an empirical test of how much temperature messes up the endgame, using this game's PGN as the opening and starting from ply 186 (f2e2 last move), which is clearly a draw. The network is 32742, with 800 nodes and self-play settings. I played 100 games at each temperature: 0.0, 0.45 and 1.0:

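| Temp | W mates | B mates | Stalemate | 3-fold | No material | 50 move rule |
|------|---------|---------|-----------|--------|-------------|--------------|
| 0.00 | 0       | 0       | 0         | 100    | 0           | 0            |
| 0.10 | 0       | 0       | 2         | 98     | 0           | 0            |
| 0.20 | 12      | 0       | 6         | 82     | 0           | 0            |
| 0.45 | 60      | 0       | 12        | 28     | 0           | 0            |
| 1.00 | 78      | 1       | 15        | 3      | 1           | 2            |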

Even 0.45 temperature has a huge effect on the outcome going from 100% draw to 40% draw. With 1.0 temperature black even managed to win once and only 3 games out of 100 ended in 3-fold.
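
The draw percentages quoted here follow directly from the result counts. A small sketch, counting stalemate, 3-fold, no material and the 50-move rule as draws (the `RESULTS` layout and `draw_rate` are illustrative names, not part of cutechess or lc0):

```python
# Per temperature: (white mates, black mates, stalemate, 3-fold, no material, 50-move rule)
RESULTS = {
    0.00: (0, 0, 0, 100, 0, 0),
    0.10: (0, 0, 2, 98, 0, 0),
    0.20: (12, 0, 6, 82, 0, 0),
    0.45: (60, 0, 12, 28, 0, 0),
    1.00: (78, 1, 15, 3, 1, 2),
}


def draw_rate(counts):
    """Fraction of games that ended in any kind of draw."""
    _white_mates, _black_mates, *draws = counts
    return sum(draws) / sum(counts)


for temp, counts in RESULTS.items():
    print(f"T={temp:.2f}: {draw_rate(counts):.0%} draws out of {sum(counts)} games")
```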

cutechess command line:

```
./cutechess-cli -concurrency 1 -debug \
  -engine name=lc0_ccrl_w cmd=lc0 arg="--weights=32742.pb.gz" arg="--threads=1" arg="--fpu-strategy=absolute" arg="--smart-pruning-factor=0" arg="--minibatch-size=32" arg="--cache-history-length=7" arg="--policy-softmax-temp=1.0" arg="--temperature=0.45" arg="--noise" proto=uci \
  -engine name=lc0_ccrl_b cmd=lc0 arg="--weights=32742.pb.gz" arg="--threads=1" arg="--fpu-strategy=absolute" arg="--smart-pruning-factor=0" arg="--minibatch-size=32" arg="--cache-history-length=7" arg="--policy-softmax-temp=1.0" arg="--temperature=0.45" arg="--noise" proto=uci \
  -games 100 -openings file=Lc0_vs_Stockfish_2019.01.29.pgn plies=186 \
  -each tc=inf nodes=800 -noswap \
  -pgnout lc0_ccc4_450_t30_ply186_t045.pgn > lc0_ccc4_450_t30_ply186_t045.log
```

EDIT: Added 0.1 and 0.2 temperature tests.

Mardak commented 5 years ago

Would switching to T=0 for resign play through make sense? The primary purpose of play through is to determine if the eval was correct, yes?

jjoshua2 commented 5 years ago

That has been argued before, but it still needs a PR to enable it, probably with a flag so it can be used in training. Similarly, another option is to have t=0 in half of the endgames or something. But we could try t=0 in all games on T30 at the low learning rate and get some high-quality SL data without the ability for it to mess anything up.

EDIT: I like the t=0.1 above. It's surprising that even t=0.2 gives a not very accurate win rate.

Mardak commented 5 years ago

Roughly calculating the average trained game outcome with resign and temperature here, the current T=.45 results in 80% black resign + 20% * 60% black blunder, so an average of ~92% white win.

With T=.2, that would be 80% + 20% * 12% = ~82% white win, which may be enough for networks to learn that the position is sufficiently out of resign territory, allowing future self-play to average 12% white win for a much more reasonable net eval.

So it might not be necessary to drop endgame temperature all the way to 0 or 0.1 or even 0.2 for this position -- just enough to get it out of resign territory. At least currently with T=0.45, the numbers above for 32742 compared to those for 32900 show it's not making much progress:

weights: 32900

info string b2d2  (1422) N:      41 (+ 0) (P:  6.13%) (Q: -0.99262) (U: 0.12694) (Q+U: -0.86567) (V: -0.9867) 
info string b2h2  (1426) N:      42 (+ 0) (P:  6.27%) (Q: -0.99274) (U: 0.12665) (Q+U: -0.86609) (V: -0.9845) 
info string b2a2  (1420) N:      44 (+ 0) (P:  6.39%) (Q: -0.98977) (U: 0.12337) (Q+U: -0.86640) (V: -0.9861) 
info string b2c2  (1421) N:      45 (+ 0) (P:  6.59%) (Q: -0.99126) (U: 0.12456) (Q+U: -0.86670) (V: -0.9881) 
info string b2b6  (1408) N:      46 (+ 0) (P:  6.89%) (Q: -0.99222) (U: 0.12738) (Q+U: -0.86484) (V: -0.9838) 
info string b2b7  (1406) N:      46 (+ 0) (P:  6.74%) (Q: -0.98875) (U: 0.12473) (Q+U: -0.86403) (V: -0.9828) 
info string b2b5  (1410) N:      48 (+ 0) (P:  7.06%) (Q: -0.99001) (U: 0.12516) (Q+U: -0.86485) (V: -0.9815) 
info string b2b4  (1413) N:      50 (+ 0) (P:  7.21%) (Q: -0.98839) (U: 0.12280) (Q+U: -0.86560) (V: -0.9863) 
info string b2b3  (1417) N:      50 (+ 0) (P:  7.39%) (Q: -0.99126) (U: 0.12597) (Q+U: -0.86529) (V: -0.9841) 
info string b2e2  (1423) N:      58 (+ 0) (P:  5.49%) (Q: -0.94103) (U: 0.08086) (Q+U: -0.86017) (V: -0.9855) 
info string b2b1  (1428) N:      62 (+ 0) (P:  9.01%) (Q: -0.98839) (U: 0.12438) (Q+U: -0.86402) (V: -0.9823) 
info string b2g2  (1425) N:      71 (+ 1) (P:  5.84%) (Q: -0.93463) (U: 0.06951) (Q+U: -0.86512) (V: -0.9903) 
info string b2b8  (1404) N:      89 (+ 0) (P: 12.54%) (Q: -0.98511) (U: 0.12113) (Q+U: -0.86397) (V: -0.9713) 
info string b2f2  (1424) N:     107 (+ 0) (P:  6.45%) (Q: -0.91639) (U: 0.05192) (Q+U: -0.86446) (V: -0.9851) 

Although then again, maybe that does mean temperature should change more significantly given the low learning rate… ?

Alternatively, if resign playthrough were 30%, the rough calculations above with T=0.45 would then be 70% black resign + 30% * 60% black blunder = ~88% average white win…
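
The back-of-the-envelope numbers in this comment all come from one formula: resigned games count as white wins, and played-through games are white wins only when black blunders. A minimal sketch under that assumption (the function name is made up; the blunder rates are the "W mates" percentages from the temperature test above):

```python
def expected_white_win(resign_fraction, blunder_rate):
    """Average white win rate seen in training for this position.

    resign_fraction: share of games where black resigns (counted as white wins).
    blunder_rate: share of played-through games where black loses the draw.
    """
    playthrough_fraction = 1.0 - resign_fraction
    return resign_fraction + playthrough_fraction * blunder_rate


# 20% resign playthrough with T=0.45 (60% blunders) and T=0.2 (12% blunders),
# then 30% resign playthrough with T=0.45.
for resign, blunder in ((0.80, 0.60), (0.80, 0.12), (0.70, 0.60)):
    rate = expected_white_win(resign, blunder)
    print(f"resign={resign:.0%} blunder={blunder:.0%} -> white win {rate:.1%}")
```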

Videodr0me commented 5 years ago

> So it doesn't seem that this is something that the net can't learn. Just needs to have relevant positions played correctly in the training data.

Yes, but from the above results it seems that with CP it would learn this an order of magnitude faster, as even at 800 visits the score for Rf2 is much lower than with normal leela. This should compound nicely through training.

gonzalezjo commented 5 years ago

Zero temperature seems to produce games that are much more realistic as learning data. This post raises the question of whether or not Leela is learning to seek positions designed to exploit opponents that play with temperature, which is pretty scary. Test 50 should provide interesting data regarding this.

lp200 commented 5 years ago

The training parameters of T30 have changed to temp-endgame = 0 and resign-percentage = 0. I think the weights will get worse from learning more than 100 troll moves per game.

Mardak commented 4 years ago

With #1197 and 591226 it takes 2800 visits on Rf2 to figure out that it can force a draw by stalemate or repetition:

position startpos moves e2e4 c7c6 d2d4 d7d5 e4e5 c8f5 f1e2 e7e6 g1f3 f8b4 c2c3 b4e7 c1e3 b8d7 e1g1 f5g6 b1d2 g8h6 e3h6 g7h6 d2b3 f7f6 e5f6 e7d6 d1d2 d8f6 d2h6 e8c8 a1e1 h8g8 e2d1 d8f8 g2g3 g6f5 h6f6 d7f6 f3h4 f5h3 h4g2 h7h5 e1e3 h5h4 g2h4 h3f1 g1f1 c8d7 f1g2 f6g4 d1g4 g8g4 h4f3 b7b6 h2h3 g4g7 b3c1 d6f4 e3e1 f4c1 e1c1 c6c5 g3g4 d7d6 c1c2 f8f4 g2g3 f4e4 c2d2 d6e7 f3e5 e4e1 h3h4 a7a5 h4h5 e1g1 g3f4 g7g8 f2f3 e7f6 d2h2 f6g7 e5g6 g7f6 g6e5 f6g7 e5g6 g1f1 h2e2 g8e8 g6e5 e8f8 f4g5 f1g1 e5g6 f8f5 g5h4 g1h1 h4g3 f5f6 g6h4 c5d4 c3d4 h1g1 h4g2 f6h6 g3h2 g1d1 g2f4 h6f6 f4e6 g7f7 e6g5 f7g8 h2g3 d1g1 g3h2 g1a1 a2a3 a1d1 h2g3 d1g1 g3h3 b6b5 e2e8 f6f8 e8e6 g1d1 h5h6 b5b4 a3b4 a5b4 h3h4 d1h1 h4g3 b4b3 e6b6 h1g1 g3h2 g1a1 b6b3 a1a6 h6h7 g8h8 h2g3 a6a8 f3f4 a8b8 b3c3 b8b2 f4f5 b2b7 g3f4 b7e7 c3c6 e7e2 c6d6 e2f2 f4g3 f2f1 d6d5 f1g1 g3f4 g1f1 f4e3 f8e8 d5e5 e8f8 e5e6 f1d1 e6d6 f8e8 e3f3 d1f1 f3g2 f1f4 g2g3 f4f1 d6e6 e8b8 e6g6 b8b3 g3g2 b3b2 g2f1
go nodes 3000

info nodes 2810 score cp     0 multipv 1 pv b2f2 f1f2
info nodes   68 score cp  -755 multipv 2 pv b2b8 f5f6 b8a8 g6g8 a8g8
info nodes   68 score cp -1001 multipv 3 pv b2b1 f1g2 b1b2 g2h3 b2c2
info nodes   60 score mate  -1 multipv 4 pv b2b6 g6g8
info nodes   60 score mate  -1 multipv 5 pv b2b4 g6g8

Naphthalin commented 4 years ago

@Mardak can you check whether 701750 or the latest T70 gets it right? They are trained with 0.3 and 0.0 endgame temperature, respectively.

Mardak commented 4 years ago

./lc0 -w 701780 -v

# black looking for forcing stalemate move Rf2
position startpos moves e2e4 c7c6 d2d4 d7d5 e4e5 c8f5 f1e2 e7e6 g1f3 f8b4 c2c3 b4e7 c1e3 b8d7 e1g1 f5g6 b1d2 g8h6 e3h6 g7h6 d2b3 f7f6 e5f6 e7d6 d1d2 d8f6 d2h6 e8c8 a1e1 h8g8 e2d1 d8f8 g2g3 g6f5 h6f6 d7f6 f3h4 f5h3 h4g2 h7h5 e1e3 h5h4 g2h4 h3f1 g1f1 c8d7 f1g2 f6g4 d1g4 g8g4 h4f3 b7b6 h2h3 g4g7 b3c1 d6f4 e3e1 f4c1 e1c1 c6c5 g3g4 d7d6 c1c2 f8f4 g2g3 f4e4 c2d2 d6e7 f3e5 e4e1 h3h4 a7a5 h4h5 e1g1 g3f4 g7g8 f2f3 e7f6 d2h2 f6g7 e5g6 g7f6 g6e5 f6g7 e5g6 g1f1 h2e2 g8e8 g6e5 e8f8 f4g5 f1g1 e5g6 f8f5 g5h4 g1h1 h4g3 f5f6 g6h4 c5d4 c3d4 h1g1 h4g2 f6h6 g3h2 g1d1 g2f4 h6f6 f4e6 g7f7 e6g5 f7g8 h2g3 d1g1 g3h2 g1a1 a2a3 a1d1 h2g3 d1g1 g3h3 b6b5 e2e8 f6f8 e8e6 g1d1 h5h6 b5b4 a3b4 a5b4 h3h4 d1h1 h4g3 b4b3 e6b6 h1g1 g3h2 g1a1 b6b3 a1a6 h6h7 g8h8 h2g3 a6a8 f3f4 a8b8 b3c3 b8b2 f4f5 b2b7 g3f4 b7e7 c3c6 e7e2 c6d6 e2f2 f4g3 f2f1 d6d5 f1g1 g3f4 g1f1 f4e3 f8e8 d5e5 e8f8 e5e6 f1d1 e6d6 f8e8 e3f3 d1f1 f3g2 f1f4 g2g3 f4f1 d6e6 e8b8 e6g6 b8b3 g3g2 b3b2 g2f1
go nodes 3000

info b2e2 N:      97 (P:  7.10%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2b6 N:      97 (P:  7.10%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2h2 N:      97 (P:  7.11%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2g2 N:      97 (P:  7.12%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2b5 N:      97 (P:  7.14%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2d2 N:      98 (P:  7.15%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2b4 N:      98 (P:  7.15%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2b7 N:      98 (P:  7.16%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2c2 N:      98 (P:  7.17%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2b3 N:      98 (P:  7.17%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2a2 N:      98 (P:  7.18%) (WL: -1.00000) (D:  0.000) (V: -1.0000) (T) 
info b2b1 N:     109 (P:  7.21%) (WL: -0.99213) (D:  0.006) (V: -0.9910)  
info b2b8 N:     110 (P:  7.13%) (WL: -0.99069) (D:  0.007) (V: -0.9932)  
info b2f2 N:     910 (P:  7.10%) (WL: -0.92603) (D:  0.073) (V: -0.9939) (L) 

# following white considering (against) stalemate Kxf2
info f1f2 N:      34 (P: 32.69%) (WL:  0.00000) (D:  1.000) (V:  0.0000) (T) 
info f1g1 N:     966 (P: 33.28%) (WL:  0.94117) (D:  0.058) (V:  0.9908) (W) 
info f1e1 N:    1465 (P: 34.03%) (WL:  0.95743) (D:  0.042) (V:  0.9900) (W) 

With the value head at ±0.99 for these positions, it looks like they would get adjudicated before the search finds the stalemate.

Mardak commented 4 years ago

Here's various networks and what they think of black's Rf2 to force stalemate:

 11258 b2f2 (P: 10.74%) (V: -0.9406) 
 22202 b2f2 (P: 11.29%) (V: -0.7820) 
 32930 b2f2 (P:  6.47%) (V: -0.9872) 
 42850 b2f2 (P:  8.96%) (V: -0.9817) 
591226 b2f2 (P:  7.07%) (V: -0.9751) 
700010 b2f2 (P:  5.74%) (V: -0.9232) 
700020 b2f2 (P:  4.85%) (V: -0.9660) 
700030 b2f2 (P:  4.91%) (V: -0.9678) 
700040 b2f2 (P:  5.30%) (V: -0.9757) 
700050 b2f2 (P:  6.08%) (V: -0.9797) 
700060 b2f2 (P:  6.65%) (V: -0.9391) 
700070 b2f2 (P:  7.11%) (V: -0.9588) 
700080 b2f2 (P:  7.12%) (V: -0.9606) 
700090 b2f2 (P:  7.10%) (V: -0.9516) 
700100 b2f2 (P:  7.05%) (V: -0.9612) 
700110 b2f2 (P:  7.10%) (V: -0.9651) 
700120 b2f2 (P:  7.09%) (V: -0.9643) 
700130 b2f2 (P:  7.11%) (V: -0.9661) 
700140 b2f2 (P:  7.06%) (V: -0.9656) 
700150 b2f2 (P:  7.05%) (V: -0.9681) 
700160 b2f2 (P:  7.04%) (V: -0.9695) 
700170 b2f2 (P:  7.06%) (V: -0.9678) 
700180 b2f2 (P:  7.08%) (V: -0.9676) 
700190 b2f2 (P:  7.09%) (V: -0.9707) 
700200 b2f2 (P:  7.04%) (V: -0.9599) 
700210 b2f2 (P:  7.08%) (V: -0.9592) 
700220 b2f2 (P:  7.13%) (V: -0.9675) 
700230 b2f2 (P:  7.06%) (V: -0.9700) 
700240 b2f2 (P:  7.08%) (V: -0.9728) 
700250 b2f2 (P:  7.03%) (V: -0.9711) 
700260 b2f2 (P:  7.06%) (V: -0.9712) 
700270 b2f2 (P:  6.96%) (V: -0.9811) 
700280 b2f2 (P:  7.02%) (V: -0.9823) 
700290 b2f2 (P:  6.90%) (V: -0.9714) 
700300 b2f2 (P:  6.94%) (V: -0.9760) 
700310 b2f2 (P:  7.06%) (V: -0.9720) 
700320 b2f2 (P:  6.89%) (V: -0.9717) 
700330 b2f2 (P:  6.93%) (V: -0.9777) 
700340 b2f2 (P:  7.05%) (V: -0.9686) 
700350 b2f2 (P:  7.08%) (V: -0.9644) 
700360 b2f2 (P:  7.04%) (V: -0.9760) 
700370 b2f2 (P:  7.08%) (V: -0.9739) 
700380 b2f2 (P:  7.01%) (V: -0.9664) 
700390 b2f2 (P:  7.10%) (V: -0.9738) 
700400 b2f2 (P:  7.10%) (V: -0.9755) 
700410 b2f2 (P:  7.10%) (V: -0.9714) 
700420 b2f2 (P:  7.09%) (V: -0.9722) 
700430 b2f2 (P:  7.10%) (V: -0.9705) 
700440 b2f2 (P:  7.13%) (V: -0.9714) 
700450 b2f2 (P:  7.13%) (V: -0.9727) 
700460 b2f2 (P:  7.11%) (V: -0.9728) 
700470 b2f2 (P:  7.10%) (V: -0.9760) 
700480 b2f2 (P:  7.11%) (V: -0.9785) 
700490 b2f2 (P:  7.11%) (V: -0.9750) 
700500 b2f2 (P:  7.12%) (V: -0.9798) 
700510 b2f2 (P:  7.09%) (V: -0.9824) 
700520 b2f2 (P:  7.10%) (V: -0.9813) 
700530 b2f2 (P:  7.10%) (V: -0.9820) 
700540 b2f2 (P:  7.10%) (V: -0.9800) 
700550 b2f2 (P:  7.09%) (V: -0.9751) 
700560 b2f2 (P:  7.10%) (V: -0.9790) 
700570 b2f2 (P:  7.07%) (V: -0.9719) 
700580 b2f2 (P:  7.08%) (V: -0.9729) 
700590 b2f2 (P:  7.09%) (V: -0.9796) 
700600 b2f2 (P:  7.09%) (V: -0.9767) 
700610 b2f2 (P:  7.08%) (V: -0.9755) 
700620 b2f2 (P:  7.08%) (V: -0.9772) 
700630 b2f2 (P:  7.07%) (V: -0.9807) 
700640 b2f2 (P:  7.07%) (V: -0.9803) 
700650 b2f2 (P:  7.07%) (V: -0.9736) 
700660 b2f2 (P:  7.07%) (V: -0.9809) 
700670 b2f2 (P:  7.08%) (V: -0.9826) 
700680 b2f2 (P:  7.09%) (V: -0.9863) 
700690 b2f2 (P:  7.08%) (V: -0.9845) 
700700 b2f2 (P:  7.08%) (V: -0.9814) 
700710 b2f2 (P:  7.06%) (V: -0.9776) 
700720 b2f2 (P:  7.07%) (V: -0.9809) 
700730 b2f2 (P:  7.07%) (V: -0.9823) 
700740 b2f2 (P:  7.06%) (V: -0.9822) 
700750 b2f2 (P:  7.06%) (V: -0.9798) 
700760 b2f2 (P:  7.06%) (V: -0.9786) 
700770 b2f2 (P:  7.07%) (V: -0.9812) 
700780 b2f2 (P:  7.08%) (V: -0.9829) 
700790 b2f2 (P:  7.07%) (V: -0.9812) 
700800 b2f2 (P:  7.07%) (V: -0.9810) 
700810 b2f2 (P:  7.06%) (V: -0.9806) 
700820 b2f2 (P:  7.07%) (V: -0.9833) 
700830 b2f2 (P:  7.07%) (V: -0.9836) 
700840 b2f2 (P:  7.08%) (V: -0.9855) 
700850 b2f2 (P:  7.07%) (V: -0.9841) 
700860 b2f2 (P:  7.07%) (V: -0.9837) 
700870 b2f2 (P:  7.07%) (V: -0.9844) 
700880 b2f2 (P:  7.07%) (V: -0.9833) 
700890 b2f2 (P:  7.07%) (V: -0.9839) 
700900 b2f2 (P:  7.07%) (V: -0.9836) 
700910 b2f2 (P:  7.07%) (V: -0.9855) 
700920 b2f2 (P:  7.07%) (V: -0.9852) 
700930 b2f2 (P:  7.07%) (V: -0.9851) 
700940 b2f2 (P:  7.07%) (V: -0.9849) 
700950 b2f2 (P:  7.07%) (V: -0.9851) 
700960 b2f2 (P:  7.07%) (V: -0.9849) 
700970 b2f2 (P:  7.07%) (V: -0.9848) 
700980 b2f2 (P:  7.07%) (V: -0.9844) 
700990 b2f2 (P:  7.06%) (V: -0.9840) 
701000 b2f2 (P:  7.07%) (V: -0.9856) 
701010 b2f2 (P:  7.07%) (V: -0.9864) 
701020 b2f2 (P:  7.06%) (V: -0.9851) 
701030 b2f2 (P:  7.07%) (V: -0.9856) 
701040 b2f2 (P:  7.07%) (V: -0.9859) 
701050 b2f2 (P:  7.07%) (V: -0.9886) 
701060 b2f2 (P:  7.07%) (V: -0.9929) 
701070 b2f2 (P:  7.08%) (V: -0.9905) 
701080 b2f2 (P:  7.08%) (V: -0.9916) 
701090 b2f2 (P:  7.08%) (V: -0.9901) 
701100 b2f2 (P:  7.08%) (V: -0.9897) 
701110 b2f2 (P:  7.08%) (V: -0.9917) 
701120 b2f2 (P:  7.08%) (V: -0.9903) 
701130 b2f2 (P:  7.09%) (V: -0.9917) 
701140 b2f2 (P:  7.09%) (V: -0.9946) 
701150 b2f2 (P:  7.09%) (V: -0.9941) 
701160 b2f2 (P:  7.09%) (V: -0.9942) 
701170 b2f2 (P:  7.08%) (V: -0.9934) 
701180 b2f2 (P:  7.08%) (V: -0.9937) 
701190 b2f2 (P:  7.08%) (V: -0.9926) 
701200 b2f2 (P:  7.08%) (V: -0.9911) 
701210 b2f2 (P:  7.08%) (V: -0.9923) 
701220 b2f2 (P:  7.08%) (V: -0.9922) 
701230 b2f2 (P:  7.08%) (V: -0.9942) 
701240 b2f2 (P:  7.09%) (V: -0.9942) 
701250 b2f2 (P:  7.09%) (V: -0.9941) 
701260 b2f2 (P:  7.09%) (V: -0.9950) 
701270 b2f2 (P:  7.09%) (V: -0.9947) 
701280 b2f2 (P:  7.08%) (V: -0.9938) 
701290 b2f2 (P:  7.08%) (V: -0.9943) 
701300 b2f2 (P:  7.09%) (V: -0.9941) 
701310 b2f2 (P:  7.08%) (V: -0.9937) 
701320 b2f2 (P:  7.08%) (V: -0.9936) 
701330 b2f2 (P:  7.09%) (V: -0.9927) 
701340 b2f2 (P:  7.08%) (V: -0.9933) 
701350 b2f2 (P:  7.09%) (V: -0.9934) 
701360 b2f2 (P:  7.08%) (V: -0.9943) 
701370 b2f2 (P:  7.08%) (V: -0.9947) 
701380 b2f2 (P:  7.09%) (V: -0.9941) 
701390 b2f2 (P:  7.09%) (V: -0.9934) 
701400 b2f2 (P:  7.09%) (V: -0.9933) 
701410 b2f2 (P:  7.09%) (V: -0.9945) 
701420 b2f2 (P:  7.09%) (V: -0.9929) 
701430 b2f2 (P:  7.09%) (V: -0.9927) 
701440 b2f2 (P:  7.09%) (V: -0.9927) 
701450 b2f2 (P:  7.09%) (V: -0.9937) 
701460 b2f2 (P:  7.08%) (V: -0.9943) 
701470 b2f2 (P:  7.09%) (V: -0.9933) 
701480 b2f2 (P:  7.09%) (V: -0.9931) 
701490 b2f2 (P:  7.09%) (V: -0.9931) 
701500 b2f2 (P:  7.09%) (V: -0.9928) 
701510 b2f2 (P:  7.09%) (V: -0.9927) 
701520 b2f2 (P:  7.09%) (V: -0.9929) 
701530 b2f2 (P:  7.09%) (V: -0.9929) 
701540 b2f2 (P:  7.09%) (V: -0.9914) 
701550 b2f2 (P:  7.08%) (V: -0.9906) 
701560 b2f2 (P:  7.09%) (V: -0.9896) 
701570 b2f2 (P:  7.08%) (V: -0.9887) 
701580 b2f2 (P:  7.09%) (V: -0.9875) 
701590 b2f2 (P:  7.08%) (V: -0.9873) 
701600 b2f2 (P:  7.09%) (V: -0.9887) 
701610 b2f2 (P:  7.08%) (V: -0.9874) 
701620 b2f2 (P:  7.08%) (V: -0.9906) 
701630 b2f2 (P:  7.09%) (V: -0.9912) 
701640 b2f2 (P:  7.08%) (V: -0.9916) 
701650 b2f2 (P:  7.08%) (V: -0.9913) 
701660 b2f2 (P:  7.08%) (V: -0.9909) 
701670 b2f2 (P:  7.08%) (V: -0.9914) 
701680 b2f2 (P:  7.08%) (V: -0.9911) 
701690 b2f2 (P:  7.09%) (V: -0.9922) 
701700 b2f2 (P:  7.08%) (V: -0.9930) 
701710 b2f2 (P:  7.08%) (V: -0.9936) 
701720 b2f2 (P:  7.08%) (V: -0.9927) 
701730 b2f2 (P:  7.08%) (V: -0.9922) 
701740 b2f2 (P:  7.08%) (V: -0.9922) 
701750 b2f2 (P:  7.08%) (V: -0.9924) 
701760 b2f2 (P:  7.07%) (V: -0.9947) 
701770 b2f2 (P:  7.07%) (V: -0.9934) 
701780 b2f2 (P:  7.07%) (V: -0.9939)

Naphthalin commented 4 years ago

Ah, I didn't look closely enough at the position. @kiudee reported that some of the T71 nets with MLH activated in search can actually find the solution there after some nodes, but without such guidance toward prolonging the loss, MCTS has no chance (especially not at 800 npm).