LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Improve training data for learning drawn endgames (with lower temperature) #237

Closed: Mardak closed this issue 4 years ago

Mardak commented 6 years ago

From TCEC Season 13 - Division 4 Game 46, DeusX 1.0 vs LCZero 16.10161: white can force a perpetual check from move 51 (Rh8+). However, both network 10161's value head and its search report win rates favoring black in all of these checked positions as white chases the king down the board and back up.

(Screenshot of the position, 2018-08-06.)

position startpos moves e2e4 c7c5 g1f3 e7e6 d2d4 c5d4 f3d4 b8c6 b1c3 d8c7 c1e3 a7a6 d1d2 g8f6 e1c1 f8e7 f2f3 b7b5 g2g4 c6d4 e3d4 c8b7 c1b1 e8g8 f1d3 b5b4 c3a4 d7d5 e4e5 f6d7 d2e3 b7c6 a4b6 d7b6 d4b6 c7d7 f3f4 c6b5 b6d4 b5d3 e3d3 a6a5 f4f5 d7c7 f5f6 e7c5 d4c5 c7c5 h2h4 a5a4 g4g5 a4a3 d3d4 c5c7 b2b3 f8c8 h1h2 c7c3 h4h5 c3g3 h2e2 g3f3 d4d2 c8c7 e2f2 f3g3 f2e2 a8b8 d2d4 g3f3 e2h2 f3g3 h2e2 g3h3 g5g6 f7g6 h5g6 h7g6 d1g1 h3f3 e2h2 f3e4 d4e4 d5e4 g1g6 b8b5 c2c4 b5e5 b1c2 e5f5 f6g7 c7g7 g6e6 g7g1 e6e4 f5f3 e4h4 f3c3 c2d2 g1a1 h4h8

Looking at these related positions as if it were self-play with 800 visits but without noise, here's the probability that white would check when the black king is on each rank (visits to the checking move out of 800 -> probability of picking it at temperature 1.0):

7th rank: 786 -> 98%
6th rank: 759 -> 95%
5th rank: 779 -> 97%
4th rank: 768 -> 96%

(Similarly, when the black king is on the 4th rank, it doesn't want to move to the 3rd rank as that would favor white, and with 800 visits, search wants to move back up to 5th rank with 737 visits = 92%.)

With the current move temperature of 1.0 and white playing the perpetual-check move with an average of 96% of visits, self-play will correctly draw these positions via the 50-move rule only if white checks correctly ~25 times in a row, which happens 0.96^25 ≈ 36% of the time. So more often than not, white blunders at some point because of temperature, and the network learns that this drawn position favors black.
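A quick check of the arithmetic above, as a minimal sketch: it treats each white check as an independent pick with the T=1 probability (visits / 800) and assumes ~25 checks are needed, as stated. The helper names are illustrative only.

```python
# Visits to the perpetual-check move out of 800 (no noise), keyed by the
# black king's rank, as listed above for network 10161.
CHECK_VISITS = {7: 786, 6: 759, 5: 779, 4: 768}
TOTAL_VISITS = 800


def t1_probability(visits: int, total: int = TOTAL_VISITS) -> float:
    """At temperature 1.0 a root move is sampled with probability N / total."""
    return visits / total


def hold_probability(p_check: float, checks: int) -> float:
    """Chance of never deviating across `checks` independent checking moves."""
    return p_check ** checks


per_rank = {rank: round(100 * t1_probability(n)) for rank, n in CHECK_VISITS.items()}
print(per_rank)                                        # {7: 98, 6: 95, 5: 97, 4: 96}
print(sum(CHECK_VISITS.values()) / len(CHECK_VISITS))  # 773.0 (about 96.6% of 800)
print(round(100 * hold_probability(0.96, 25)))         # 36
```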

The "average" training data for this position blunders the draw.

If instead the temperature were 0.5 for these moves, the probability that white plays the check move would increase to 99.8%, and playing it correctly across all 25 moves to reach the draw would happen 96% of the time.

If we instead target half of self-play games playing these positions correctly, assuming the 4% per-move blunder rate at T=1, a temperature of 0.89 would make the correct move get picked 97.3% of the time, i.e., 0.973^25 ≈ 50% over the 25 checks.
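A sketch of the temperature arithmetic in the two paragraphs above, assuming that moves are sampled with probability proportional to N^(1/T) and that all non-checking visits sit on a single alternative move (if the 4% were spread over several moves, the sharpened pick probability would be slightly higher). The function names are hypothetical.

```python
import math


def pick_probability(share: float, temperature: float) -> float:
    """Probability of the best move when visits split share / (1 - share)
    and moves are sampled proportionally to N ** (1 / T).
    Assumes the remaining visits sit on a single alternative move."""
    a = share ** (1.0 / temperature)
    b = (1.0 - share) ** (1.0 / temperature)
    return a / (a + b)


def temperature_for(target: float, share: float) -> float:
    """Solve pick_probability(share, T) == target for T in closed form:
    target / (1 - target) == (share / (1 - share)) ** (1 / T)."""
    return math.log(share / (1 - share)) / math.log(target / (1 - target))


SHARE, CHECKS = 0.96, 25
p_half = pick_probability(SHARE, 0.5)
print(f"{p_half:.4f}", f"{p_half ** CHECKS:.3f}")        # 0.9983 0.958

per_move_target = 0.5 ** (1 / CHECKS)                     # 50% over 25 checks
print(f"{per_move_target:.4f}")                           # 0.9727
print(f"{temperature_for(per_move_target, SHARE):.2f}")   # 0.89
```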

To get to 0.89, self-play could simply play with a lower temperature instead of 1.0. Another way is tempdecay: since the perpetual starts at move 51, a tempdecay-moves value of 463 with an initial temperature of 1.0 yields the desired value. (The maximum game length shown on the stats page appears to be 450, so that might be a convenient number to pick.) Alternatively, if tempdecay moves is set to, say, 50 so that move 51 has 0 temperature, the server could tell half of self-play requests to play with 0 tempdecay (i.e., no decay) and the other half with 50 moves. And of course there are many other ways to adjust temperature using the existing initial temperature, decay, and server response distribution values, as well as more complex approaches that require additional client and/or server code.
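A minimal sketch of the tempdecay arithmetic, assuming a linear schedule T = T0 * (decay_moves - move) / decay_moves, where `move` counts completed full moves (so white's 51st move is move 50) and T drops to 0 once `move` reaches `decay_moves`. This is modeled on, not copied from, lc0's tempdecay option; the exact behavior should be checked in search.cc.

```python
def decayed_temperature(initial: float, decay_moves: int, move: int) -> float:
    """Hypothetical linear schedule: full temperature at move 0,
    reaching zero after `decay_moves` full moves."""
    if decay_moves <= 0:
        return initial  # a decay length of 0 is treated as "no decay"
    if move >= decay_moves:
        return 0.0
    return initial * (decay_moves - move) / decay_moves


print(round(decayed_temperature(1.0, 463, 50), 2))  # 0.89
print(decayed_temperature(1.0, 50, 50))             # 0.0
```

Under this assumption, splitting self-play between a decay length of 0 and 50 gives half the games T=1.0 at move 51 and the other half T=0.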

The main drawback of a lower temperature is losing the benefits of having temperature in the first place. The primary purpose of temperature seems to be playing out positions that search would otherwise not favor, so that the value head of future networks gets updated on them, possibly leading search to favor those positions later (and so that the search priors for these positions are learned as well). Lower temperature means these "other" positions are played less often, so there would be fewer game results for them; however, lower temperature also means the game results that do get fed to training are more accurate (hence this issue), so the actual tradeoff is unclear.

@dubslow Are there more details for "Test temperature changes in training"?

Mardak commented 6 years ago

Not sure if this needs to be explicitly stated, but the problem with having an incorrect value for these drawn positions is that search will play moves leading into these positions because they seem to favor one side, say with a 60% win rate, instead of favoring a move that leads to a position with an actual 55% win rate.

jjoshua2 commented 6 years ago

This is good analysis. We've talked about trying .5 temp and then 2x playouts each for a day at the end of test 10 to see if they solidify things.


dubslow commented 6 years ago

That planning card was an open-ended cover-all for various sorts of things, including dynamic temperature.

I guess this is reasonably persuasive that temperature blunders affect one side disproportionately more than the other. Can you verify and expand upon that?

My request would be that any solution to this problem needs to stay "Zero": such things as starting temperature decay after a fixed move count, or targeting a temperature based on a specific game rule (such as the 50 move rule) aren't really acceptable to me, but something that would be more game-independent would be to pick a (small) probability such that each move in a selfplay game has that chance to begin a temperature decay at a certain rate. For instance, by analogy to the noise's use of average branching factor, we could also decide that average game length is "fair game", and choose 1/average_game_length as the probability to start temperature decay and also use that as the duration over which to decay temperature. (Maybe such would even be capable of supplanting resign?)
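A minimal sketch of the proposal above, assuming the decay is linear, the per-ply start probability is 1 / average_game_length, and the decay lasts average_game_length plies. The function and its parameters are hypothetical, not lc0 options.

```python
import random
from typing import List, Optional


def stochastic_temperature_schedule(avg_game_length: int, plies: int,
                                    initial: float = 1.0,
                                    rng: Optional[random.Random] = None) -> List[float]:
    """Per-ply temperatures for one self-play game under the proposed scheme.

    Each ply has probability 1 / avg_game_length of starting a decay (if none
    has started yet); the decay then runs linearly to zero over
    avg_game_length plies. The linear shape is an assumption.
    """
    rng = rng or random.Random()
    p_start = 1.0 / avg_game_length
    start: Optional[int] = None
    temps: List[float] = []
    for ply in range(plies):
        if start is None and rng.random() < p_start:
            start = ply
        if start is None:
            temps.append(initial)
        else:
            remaining = 1.0 - (ply - start) / avg_game_length
            temps.append(initial * max(0.0, remaining))
    return temps


# Usage with an assumed 100-ply average length (the stats page suggests a
# median around 100 ply).
rng = random.Random(237)
games = [stochastic_temperature_schedule(100, 300, rng=rng) for _ in range(2000)]
still_full_at_150 = sum(g[150] == 1.0 for g in games) / len(games)
# Should land near (1 - 1/100) ** 150, i.e. roughly 0.22.
print(still_full_at_150)
```

Because the start ply is geometrically distributed, a few games decay almost immediately, most keep full temperature well into the middlegame, and some keep it into the endgame, without tying the schedule to any chess-specific rule.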

dubslow commented 6 years ago

Also perhaps this should have gone in the training repository? This really isn't a topic that is relevant to the engine.

dubslow commented 6 years ago

Also also, keep in mind that perpetuals may be the sort of suitably subtle technique/position/whatever that it requires suitably low LRs to be learned, and this project hasn't really ever fully trained a net thru all the same LRs as DeepMind. For all we know, Leela may yet learn this correctly even at temp=1.

Mardak commented 6 years ago

There might need to be a small lc0 code change to support "average game length":

https://github.com/LeelaChessZero/lc0/blob/1b68b95254e6dacde1af2cfdddf2ae9e50bb5b17/src/mcts/search.cc#L78

Again from http://lczero.org/stats the median game length looks to be around 100 ply with a tail towards 450, so potentially we'll need to support values over 100 moves for temp decay.

remdu commented 6 years ago

In Leela Zero (Go), moves with only one visit will not be picked; how would this influence the probability?

Another thing to consider is that the noise level is based on the average number of legal moves in the game, which implies it might be too high in positions with few legal moves. Dirichlet noise that adapts to the number of legal moves might help.
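One possible reading of "adaptive" noise, as a sketch only: AlphaZero scaled its Dirichlet alpha (0.3 for chess) in inverse proportion to the number of legal moves in a typical position. A fixed alpha gives very spiky noise when only 2-3 moves are legal, so most of the noise weight can land on a single bad move. Scaling alpha by the legal-move count of the current position flattens it there. EPSILON, ALPHA_SCALE and `noisy_priors` are assumptions, not lc0 parameters.

```python
import random
from typing import List

EPSILON = 0.25      # noise weight used by AlphaZero; assumed here
ALPHA_SCALE = 10.0  # hypothetical: alpha = 10 / legal moves, ~0.3 at ~33 moves


def noisy_priors(priors: List[float], rng: random.Random) -> List[float]:
    """Mix root priors with Dirichlet noise whose alpha depends on the
    number of legal moves in *this* position rather than a game-wide average."""
    alpha = ALPHA_SCALE / len(priors)
    # A Dirichlet(alpha, ..., alpha) sample is a vector of Gamma(alpha, 1)
    # draws normalized to sum to 1.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    return [(1 - EPSILON) * p + EPSILON * n for p, n in zip(priors, noise)]


rng = random.Random(0)
print(noisy_priors([0.90, 0.06, 0.04], rng))  # 3 legal moves -> alpha ~3.3
```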

cwbriscoe commented 6 years ago

Could you base the temperature drop on the number of pieces left in the game or would that be considered non-zero? In my mind, it would be no different than basing it on average game length but I am sure some would disagree.

jjoshua2 commented 6 years ago

I think we could just try temperature = 0.5 the whole game, and not have to worry about zero/non-zero or other more complicated schemes.

Mardak commented 6 years ago

A fixed temperature lower than 1 should be good to avoid blundering draws, and at least from the example position from the initial comment, a value of .89 is good enough. A value of 0.5 might be too low for game variety though. Here's an example distribution of visits from startpos:

info string f2f3  (346 ) N:       1 (+ 0) (P:  0.33%) (Q: -0.07453) (U: 0.16061) (Q+U:  0.08607) (V: -0.0745) (T: 0) 
info string g1h3  (161 ) N:       1 (+ 0) (P:  0.35%) (Q: -0.07846) (U: 0.16688) (Q+U:  0.08842) (V: -0.0785) (T: 0) 
info string b1a3  (34  ) N:       1 (+ 0) (P:  0.33%) (Q: -0.06837) (U: 0.15872) (Q+U:  0.09035) (V: -0.0684) (T: 0) 
info string g2g4  (378 ) N:       1 (+ 0) (P:  0.42%) (Q: -0.10868) (U: 0.20205) (Q+U:  0.09337) (V: -0.1087) (T: 0) 
info string h2h4  (403 ) N:       1 (+ 0) (P:  0.41%) (Q: -0.07491) (U: 0.19752) (Q+U:  0.12261) (V: -0.0749) (T: 0) 
info string a2a4  (207 ) N:       2 (+ 0) (P:  0.64%) (Q: -0.06000) (U: 0.20655) (Q+U:  0.14655) (V: -0.0573) (T: 0) 
info string h2h3  (400 ) N:       3 (+ 0) (P:  0.77%) (Q: -0.03172) (U: 0.18440) (Q+U:  0.15267) (V: -0.0421) (T: 0) 
info string b1c3  (36  ) N:       4 (+ 0) (P:  0.77%) (Q: -0.02698) (U: 0.14887) (Q+U:  0.12189) (V: -0.0147) (T: 0) 
info string b2b4  (234 ) N:       4 (+ 0) (P:  1.06%) (Q: -0.06298) (U: 0.20312) (Q+U:  0.14013) (V: -0.0455) (T: 0) 
info string d2d3  (288 ) N:       4 (+ 0) (P:  0.91%) (Q: -0.02022) (U: 0.17529) (Q+U:  0.15507) (V: -0.0333) (T: 0) 
info string a2a3  (204 ) N:       5 (+ 0) (P:  1.04%) (Q: -0.01971) (U: 0.16591) (Q+U:  0.14620) (V: -0.0286) (T: 0) 
info string b2b3  (230 ) N:       6 (+ 0) (P:  1.21%) (Q: -0.02639) (U: 0.16628) (Q+U:  0.13989) (V: -0.0356) (T: 0) 
info string f2f4  (351 ) N:      17 (+ 0) (P:  4.39%) (Q: -0.06948) (U: 0.23419) (Q+U:  0.16471) (V: -0.0545) (T: 0) 
info string c2c3  (259 ) N:      26 (+ 0) (P:  4.09%) (Q:  0.01603) (U: 0.14568) (Q+U:  0.16171) (V:  0.0190) (T: 0) 
info string e2e3  (317 ) N:      34 (+ 0) (P:  5.02%) (Q:  0.02613) (U: 0.13795) (Q+U:  0.16408) (V:  0.0137) (T: 0) 
info string e2e4  (322 ) N:      45 (+ 0) (P:  5.10%) (Q:  0.05521) (U: 0.10660) (Q+U:  0.16181) (V:  0.0545) (T: 0) 
info string g1f3  (159 ) N:      75 (+ 0) (P:  8.92%) (Q:  0.05181) (U: 0.11274) (Q+U:  0.16455) (V:  0.0371) (T: 0) 
info string d2d4  (293 ) N:      80 (+ 0) (P:  8.92%) (Q:  0.05837) (U: 0.10584) (Q+U:  0.16421) (V:  0.0553) (T: 0) 
info string g2g3  (374 ) N:      92 (+ 0) (P: 11.92%) (Q:  0.04090) (U: 0.12322) (Q+U:  0.16412) (V:  0.0409) (T: 0) 
info string c2c4  (264 ) N:     397 (+ 0) (P: 43.39%) (Q:  0.05956) (U: 0.10478) (Q+U:  0.16434) (V:  0.0539) (T: 0) 

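And here are the probabilities of picking those moves at various temperatures (columns follow the moves in descending order of visits, from c2c4 on the left to the five single-visit moves on the right):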

T=1.0 49.7 11.5 10.0 9.4 5.6 4.3 3.3 2.1 0.8 0.6 0.5 0.5 0.5 0.4 0.3 0.1 0.1 0.1 0.1 0.1
T=0.9 55.4 10.9  9.3 8.7 4.9 3.6 2.7 1.7 0.5 0.4 0.3 0.3 0.3 0.2 0.2 0.1 0.1 0.1 0.1 0.1
T=0.8 62.1 10.0  8.4 7.7 4.1 2.9 2.1 1.2 0.3 0.3 0.2 0.2 0.2 0.1 0.1 0.0 0.0 0.0 0.0 0.0
T=0.7 69.7  8.6  7.1 6.4 3.1 2.1 1.4 0.8 0.2 0.1 0.1 0.1 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0
T=0.6 78.0  6.8  5.4 4.9 2.1 1.3 0.8 0.4 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
T=0.5 86.4  4.6  3.5 3.1 1.1 0.6 0.4 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
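The table can be reproduced by sampling each root move with probability proportional to N^(1/T). The sketch below assumes that rule; checked by hand, it matches the T=1.0 and T=0.5 rows.

```python
# Root visit counts from the 800-visit search above, sorted high to low:
# c2c4 g2g3 d2d4 g1f3 e2e4 e2e3 c2c3 f2f4 b2b3 a2a3, three 4-visit moves,
# h2h3 a2a4, then the five single-visit moves.
VISITS = [397, 92, 80, 75, 45, 34, 26, 17, 6, 5, 4, 4, 4, 3, 2, 1, 1, 1, 1, 1]


def move_probabilities(visits, temperature):
    """Sampling probabilities proportional to N ** (1 / T)."""
    weights = [n ** (1.0 / temperature) for n in visits]
    total = sum(weights)
    return [w / total for w in weights]


for t in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    probs = move_probabilities(VISITS, t)
    print(f"T={t}", " ".join(f"{100 * p:.1f}" for p in probs[:4]))
# first line: T=1.0 49.7 11.5 10.0 9.4
# last line:  T=0.5 86.4 4.6 3.5 3.1
```

The same function applied to [700] + [2] * 50 gives the 87.5% (T=1.0) and 93.1% (T=0.9) figures for the forced-two-visit scenario discussed below.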

So with T=1 and this particular network, 50 games out of 100 would play something other than c2c4, while with T=0.5 only about 14 of 100 would play something different.

Conveniently with T=0.8, the single-visit moves in my above output round to 0.0% probability of being picked, so there would be less of a need for special logic to filter out "minimum number of visits for temperature move selection."

An experiment with T=0.5 is reasonable, but it seems likely to overfit due to the lack of variety; although a formal experiment result would be good. For a single fixed temperature, I would suggest something between 0.8 and 0.9.

Mardak commented 6 years ago

A more complicated scheme that allows more variety in the early game could start with T>1 and then decay it to the endgame target T>0. But perhaps that won't be necessary if the initial temperature is not too low.

Separately, regarding the lessened issue of single-visit/low-visit moves with a T<1, one of the main concerns in #8 to improve tactics by visiting potentially undesired moves was that these bad moves could be played with T=1. However, that concern should be reduced with say T=0.8 as the temperature favors the good moves. For example, if 2 visits were forced for every root move and assuming 50 bad moves and 1 actually good move (so search puts 100 visits into 50 wrong moves and 700 visits into the right move):

T=1.0 87.5 0.3 0.3 0.3…
T=0.9 93.1 0.1 0.1 0.1…
T=0.8 96.8 0.1 0.1 0.1…
T=0.7 98.9 0.1 0.1 0.1…

Similarly if there were 2 possible equally good moves and 50 bad moves:


T=1.0 43.8 43.8 0.3 0.3…
T=0.9 46.3 46.3 0.1 0.1…
T=0.8 48.1 48.1 0.1 0.1…
T=0.7 49.2 49.2 0.0 0.0…
remdu commented 6 years ago

I'm in favor of this T=0.8 idea. Finding a somewhat reasonable compromise between blunders and exploration should be good.

The only problem is that the correct point probably depends on other parameters such as puct, softmax, number of visits per move... and thus this kind of analysis would be needed every time these parameters are changed.

jjoshua2 commented 6 years ago

I think a lower value like that would work better with any puct. Visits would be more variable, since with more visits there is more chance of a bad node getting a visit or two, even just due to noise.


gonzalezjo commented 6 years ago

Why not play half games with low temperature and half with high temperature? Training can still visit new ideas, but will see older ideas in a truer light.

dubslow commented 6 years ago

That's not too far from stochastic temperature decay, which I posited on Discord

Ishinoshita commented 6 years ago

@dubslow: A more distant question, although related to the effect of temperature injecting too much noise into value head training data: are you also seriously considering training the value head against the average of Q and z? To start with, is the Q value currently captured in the training data?

gonzalezjo commented 6 years ago

@dubslow I was curious what you were referring to, so I searched around on Discord and found your messages. To encourage discussion, and for anyone else who is curious, here they are:

[10:00 PM] Dubslow: new idea for training temperature: we should use stochastic temperature. Every ply has like a 1 in 200 or so chance of starting some sort of scheduled temperature decay
[10:00 PM] Dubslow: so most games would have high temperature in mid game
[10:00 PM] Dubslow: some games would have high temperature in endgame, some wouldn't
[10:00 PM] Dubslow: and a few games would be nearly full strength
[10:00 PM] Dubslow: (policy noise stays always on as current)

My thoughts: I like it. It theoretically gets us the best of all discussed options. I would like to see this tested.
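The arithmetic behind "most games would have high temperature in mid game" is geometric. A minimal sketch, assuming the 1-in-200 chance is applied independently on every ply until the decay starts:

```python
P_START = 1 / 200  # per-ply chance of beginning the decay, from the messages above


def no_decay_yet(ply: int, p: float = P_START) -> float:
    """Probability that none of the first `ply` plies started the decay."""
    return (1 - p) ** ply


for ply in (20, 60, 120, 200):
    print(ply, f"{no_decay_yet(ply):.2f}")
# 20 0.90
# 60 0.74
# 120 0.55
# 200 0.37
```

So roughly one game in ten starts decaying within the first 20 plies (the "nearly full strength" games), while about a third still have full temperature at ply 200.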

Mardak commented 6 years ago

In looking at #290, there were some positions that 10520 really thought were favorable to one side, where it wanted to avoid a TB draw move.

2-14.1 lc0 16.10520 vs Pedone 1.8 (KRPvKNP)
1. e4 c5 2. c3 Nf6 3. e5 Nd5 4. Bc4 Nb6 5. Bb3 d6 6. exd6 c4 7. Bc2 Qxd6 8. Nf3 Qe6+ 9. Kf1 g6 10. h4 Nc6 11. h5 Bg7 12. d4 cxd3 13. Qxd3 Ne5 14. Nxe5 Qxe5 15. Qe2 Bf5 16. Bxf5 gxf5 17. a4 Rg8 18. Qb5+ Qxb5+ 19. axb5 Be5 20. g3 O-O-O 21. Ke2 Bb8 22. Na3 e6 23. Rh4 Rg4 24. Rxg4 fxg4 25. c4 Nd7 26. Be3 Ne5 27. c5 Rd3 28. Rb1 Rb3 29. Kd2 Nd7 30. Kc2 Rxe3 31. fxe3 Nxc5 32. Rc1 Kd8 33. Nc4 Ne4 34. Na5 Nd6 35. h6 f5 36. Kd3 Ke7 37. Ke2 Kf6 38. Kf1 Nxb5 39. Nxb7 Bxg3 40. Rc6 Nc7 41. Nc5 Ke7 42. e4 Bd6 43. Nd3 Nb5 44. exf5 exf5 45. Ra6 Kd7 46. b4 f4 47. Nc5+ Ke7 48. Ne4 Bxb4 49. Rg6 Bc3 50. Nxc3 Nxc3 51. Rg7+ Kf8 52. Rxg4 Nb5 53. Rxf4+ Ke7 54. Rc4 a5 55. Kg2 a4 56. Rxa4 1/2-1/2
position startpos moves e2e4 c7c5 c2c3 g8f6 e4e5 f6d5 f1c4 d5b6 c4b3 d7d6 e5d6 c5c4 b3c2 d8d6 g1f3 d6e6 e1f1 g7g6 h2h4 b8c6 h4h5 f8g7 d2d4 c4d3 d1d3 c6e5 f3e5 e6e5 d3e2 c8f5 c2f5 g6f5 a2a4 h8g8 e2b5 e5b5 a4b5 g7e5 g2g3 e8c8 f1e2 e5b8 b1a3 e7e6 h1h4 g8g4 h4g4 f5g4 c3c4 b6d7 c1e3 d7e5 c4c5 d8d3 a1b1 d3b3 e2d2 e5d7 d2c2 b3e3 f2e3 d7c5 b1c1 c8d8 a3c4 c5e4 c4a5 e4d6 h5h6 f7f5 c2d3 d8e7 d3e2 e7f6 e2f1 d6b5 a5b7 b8g3 c1c6 b5c7 b7c5 f6e7 e3e4 g3d6 c5d3 c7b5 e4f5 e6f5 c6a6 e7d7 b2b4 f5f4 d3c5 d7e7 c5e4 d6b4 a6g6 b4c3 e4c3 b5c3 g6g7 e7f8 g7g4 c3b5 g4f4 f8e7 f4c4 a7a5 f1g2 a5a4

noTB c4a4  (718 ) N:    4519 (+ 1) (P: 24.90%) (Q:  0.72924) (U: 0.01428) (Q+U:  0.74352) (V:  0.6509) (played)
with c4a4  (718 ) N:     130 (+ 0) (P: 24.90%) (Q:  0.00000) (U: 0.58093) (Q+U:  0.58093) (V:  0.0000) (T) (played)

According to the lichess tablebase, white has 19 drawing moves and 3 moves that lose (DTZ 1). After c4a4, black has 6 drawing moves and 8 losing moves (DTZ 1-15). After a drawing black move, white has 20 drawing moves and 2 that lose (DTZ 1). So in these positions, if we assume every move is equally likely to be played, black has a ~60% chance to blunder versus ~10% for white. This seems to line up somewhat with 10520 reporting a much higher win rate for white than the TB draw.

2-21.1 Xiphos 0.3.14 vs Nirvana 2.4 (KRPvKBP)
1. e4 d6 2. d4 Nf6 3. Bd3 e5 4. Nf3 exd4 5. Nxd4 g6 6. Bf4 Bg7 7. Nc3 O-O 8. O-O a6 9. Qd2 c5 10. Nf3 Nc6 11. Bg5 Bg4 12. Qf4 Bxf3 13. Qxf3 Nd4 14. Qd1 Ne6 15. Bh4 b5 16. f4 Qe8 17. Kh1 Nc7 18. Qf3 c4 19. Be2 b4 20. Bxf6 Bxf6 21. Nd1 Qb5 22. a3 bxa3 23. Qxa3 Rfe8 24. Qf3 Ne6 25. c3 Bg7 26. b4 Qc6 27. Nb2 Nc7 28. f5 Qxe4 29. fxg6 Qxf3 30. Bxf3 hxg6 31. Bxa8 Rxa8 32. Nxc4 Bxc3 33. Ra4 d5 34. Nd6 Rf8 35. Rc1 d4 36. Ne4 Rd8 37. g3 Nd5 38. Rxa6 Bxb4 39. Rc4 Bc3 40. Rd6 Rxd6 41. Nxd6 Bb2 42. Rc2 Bc3 43. Nb5 Nb4 44. Rf2 Kf8 45. Rf3 Ba1 46. Rf1 Nc2 47. Rd1 Ke7 48. Nxd4 Bxd4 49. Rd2 Be5 50. Rxc2 f5 51. Rf2 Ke6 52. Kg2 Bd4 53. Rf1 Be5 54. h3 Bc3 55. Rf2 Bb4 56. g4 fxg4 57. hxg4 1/2-1/2
position startpos moves e2e4 d7d6 d2d4 g8f6 f1d3 e7e5 g1f3 e5d4 f3d4 g7g6 c1f4 f8g7 b1c3 e8g8 e1g1 a7a6 d1d2 c7c5 d4f3 b8c6 f4g5 c8g4 d2f4 g4f3 f4f3 c6d4 f3d1 d4e6 g5h4 b7b5 f2f4 d8e8 g1h1 e6c7 d1f3 c5c4 d3e2 b5b4 h4f6 g7f6 c3d1 e8b5 a2a3 b4a3 f3a3 f8e8 a3f3 c7e6 c2c3 f6g7 b2b4 b5c6 d1b2 e6c7 f4f5 c6e4 f5g6 e4f3 e2f3 h7g6 f3a8 e8a8 b2c4 g7c3 a1a4 d6d5 c4d6 a8f8 f1c1 d5d4 d6e4 f8d8 g2g3 c7d5 a4a6 c3b4 c1c4 b4c3 a6d6 d8d6 e4d6 c3b2 c4c2 b2c3 d6b5 d5b4 c2f2 g8f8 f2f3 c3a1 f3f1 b4c2 f1d1 f8e7 b5d4 a1d4 d1d2 d4e5 d2c2 f7f5 c2f2 e7e6 h1g2 e5d4 f2f1 d4e5 h2h3 e5c3 f1f2 c3b4 g3g4 f5g4

noTB h3g4  (641 ) N:    4905 (+ 1) (P: 41.39%) (Q:  0.74764) (U: 0.02085) (Q+U:  0.76850) (V:  0.7442) (played)
with h3g4  (641 ) N:     287 (+ 0) (P: 41.39%) (Q:  0.00000) (U: 0.36710) (Q+U:  0.36710) (V:  0.0000) (T) (played)

White has 13 moves to draw and 6 moves to lose DTZ1. After h3g4, black has 7 moves to draw and 8 moves to lose DTZ1-9. Then white has 16 moves to draw and 4 moves to lose DTZ1. So again, as long as white doesn't play a move that immediately sacrifices a piece, it's pretty safe (and search probably keeps visits away from these immediate sacrifices anyway), but black needs to try quite a bit harder to avoid blundering the draw.

2-13.3 ChessBrainVB 3.70 vs Ethereal 10.85 (KBPPvKB)
1. e4 c5 2. Nf3 e6 3. c3 Nc6 4. d4 cxd4 5. cxd4 d5 6. e5 Nge7 7. Nc3 Nf5 8. Bb5 Bd7 9. O-O Rc8 10. Ba4 Qb6 11. Ne2 Be7 12. Bc2 Nh4 13. Nxh4 Bxh4 14. a3 Be7 15. Re1 Na5 16. Rb1 O-O 17. b4 Nc4 18. Rb3 f5 19. Rh3 g6 20. Bb3 a5 21. bxa5 Qxa5 22. Bh6 Rf7 23. Nf4 Bxa3 24. Re2 Bf8 25. Bg5 Qb4 26. Ra2 Bg7 27. Rg3 h6 28. Bf6 Nxe5 29. Nxd5 Nf3+ 30. Rxf3 exd5 31. Bxg7 Kxg7 32. Re3 Re8 33. Bxd5 Rxe3 34. fxe3 Re7 35. Re2 g5 36. Qb3 Qxb3 37. Bxb3 Bb5 38. Re1 Bd3 39. Kf2 Rc7 40. Rd1 Be4 41. Rd2 b5 42. Bd1 Rb7 43. Rb2 b4 44. g3 Rb6 45. Bc2 Kf6 46. Ke2 h5 47. Kd2 Bf3 48. Rb1 g4 49. Ra1 Bd5 50. Bd3 Be4 51. Ra6 Rxa6 52. Bxa6 Kg5 53. Bc4 Bc6 54. Kc2 Ba4+ 55. Bb3 Bd7 56. Kd3 Bb5+ 57. Bc4 Ba4 58. Be6 Bb5+ 59. Kc2 Ba4+ 60. Kb2 h4 61. gxh4+ Kxh4 62. Bxf5 Kh3 63. d5 Kxh2 64. Bxg4 Kg3 65. Bf5 Kf3 66. Bc2 Bb5 67. e4 Ke3 68. Kb3 Kd4 69. Kxb4 1/2-1/2
position startpos moves e2e4 c7c5 g1f3 e7e6 c2c3 b8c6 d2d4 c5d4 c3d4 d7d5 e4e5 g8e7 b1c3 e7f5 f1b5 c8d7 e1g1 a8c8 b5a4 d8b6 c3e2 f8e7 a4c2 f5h4 f3h4 e7h4 a2a3 h4e7 f1e1 c6a5 a1b1 e8g8 b2b4 a5c4 b1b3 f7f5 b3h3 g7g6 c2b3 a7a5 b4a5 b6a5 c1h6 f8f7 e2f4 e7a3 e1e2 a3f8 h6g5 a5b4 e2a2 f8g7 h3g3 h7h6 g5f6 c4e5 f4d5 e5f3 g3f3 e6d5 f6g7 g8g7 f3e3 c8e8 b3d5 e8e3 f2e3 f7e7 a2e2 g6g5 d1b3 b4b3 d5b3 d7b5 e2e1 b5d3 g1f2 e7c7 e1d1 d3e4 d1d2 b7b5 b3d1 c7b7 d2b2 b5b4 g2g3 b7b6 d1c2 g7f6 f2e2 h6h5 e2d2 e4f3 b2b1 g5g4 b1a1 f3d5 c2d3 d5e4 a1a6 b6a6 d3a6 f6g5 a6c4 e4c6 d2c2 c6a4 c4b3 a4d7 c2d3 d7b5 b3c4 b5a4 c4e6 a4b5 d3c2 b5a4 c2b2 h5h4 g3h4 g5h4 e6f5 h4h3 d4d5 h3h2 f5g4 h2g3 g4f5 g3f3 f5c2 a4b5 e3e4 f3e3 b2b3 e3d4

noTB b3b4  (453 ) N:    4932 (+ 1) (P: 59.89%) (Q:  0.61589) (U: 0.02958) (Q+U:  0.64548) (V:  0.5077) (played)
with b3b4  (453 ) N:    1617 (+ 0) (P: 59.89%) (Q:  0.00000) (U: 0.11959) (Q+U:  0.11959) (V:  0.0000) (T) (played)

Here before the 26th capture, lichess says any move white makes (out of 8 possible moves) should lead to a draw. After b3b4, black has 1 correct draw move and 10 losing moves DTZ1. Again white has 10 possible moves all leading to a draw. Then black has 2 draw moves and 7 losing moves DTZ1-7. So these positions seem to be very easy for white to maintain a draw while very difficult (in terms of possible random moves) for black to do the same.


So for these positions, it does seem like lower temperature to correctly reach 50-move draw will teach the network the same thing that TB already knows -- it's a draw.
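The "equally likely moves" estimate used above is just losing moves / legal moves for the side to move. A small sketch over the tablebase counts quoted for all three games (a uniform-random mover is a deliberately crude stand-in for temperature sampling):

```python
# (side to move, drawing moves, losing moves), from the lichess tablebase
# counts quoted above, in the order the positions occur.
POSITIONS = {
    "KRPvKNP": [("white", 19, 3), ("black", 6, 8), ("white", 20, 2)],
    "KRPvKBP": [("white", 13, 6), ("black", 7, 8), ("white", 16, 4)],
    "KBPPvKB": [("white", 8, 0), ("black", 1, 10), ("white", 10, 0), ("black", 2, 7)],
}


def uniform_blunder_rate(draws: int, losses: int) -> float:
    """Chance that a uniformly random legal move throws away the draw."""
    return losses / (draws + losses)


for name, plies in POSITIONS.items():
    rates = [f"{side} {100 * uniform_blunder_rate(d, l):.0f}%" for side, d, l in plies]
    print(name, ", ".join(rates))
# KRPvKNP white 14%, black 57%, white 9%
# KRPvKBP white 32%, black 53%, white 20%
# KBPPvKB white 0%, black 91%, white 0%, black 78%
```

White's 32% in the KRPvKBP game is inflated by moves that immediately give up a piece, which search rarely visits, as noted above.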

amjshl commented 6 years ago

I have some interesting data that tries to find the correlation between game results with T=1 and game results with T=0. It is related to this issue, but I suggest a slightly different method to improve the value head, and have opened a new issue for it: #330. My observation was that T=0, even with a small number of playouts, gives more accurate results than T=1 with 800 playouts. So we can keep the benefits of T=1 by using T=1 to generate positions and T=0 with a small number of playouts to get a more accurate estimate of the final result for each of those positions.

Mardak commented 6 years ago

@Tilps It looks like the maximum game length lc0 currently allows is 450 ply: https://github.com/LeelaChessZero/lc0/blob/3287af2fbdb10472ba1d696bf9a3653693fdb69f/src/chess/position.cc#L86

Although from CCCC Round 1, Lc0 vs Fire had 237 moves (and Crafty vs Lc0 had exactly 225 moves matching up with the lc0 hardcoded game result cutoff).

If the 225-move limit is acceptable, shouldn't setting temp decay to 225 moves be similarly acceptable? Then move 1 for both white and black has temperature 1, while move 100 has temperature 0.55.

I suppose if one wants to be more zero, the client could just keep playing past 450 ply, and the server keeps track of the longest game length to compute a new temp decay moves target?

Mardak commented 6 years ago

Digging into the history a bit more, 450 ply came from lczero: https://github.com/glinscott/leela-chess/pull/74/files#diff-74b25cc6eefb809aef883bad8eef9f1bL157

The intent there was to get << 1% of training games to hit that limit, and sounds like currently we're at 0.27% of games stopping at move 225.

Mardak commented 6 years ago

Rerunning the numbers with 11089 and various visits for 5th rank:

 800: info string h6h5  (1366) N:     775 (+ 1) (P: 76.03%) (Q: -0.14266) (U: 0.09404) (Q+U: -0.04862) (V: -0.2423) 
1600: info string h6h5  (1366) N:    1562 (+ 1) (P: 76.03%) (Q: -0.13252) (U: 0.06609) (Q+U: -0.06644) (V: -0.2423) 
3200: info string h6h5  (1366) N:    3144 (+ 1) (P: 76.03%) (Q: -0.12357) (U: 0.04647) (Q+U: -0.07710) (V: -0.2423) 
6400: info string h6h5  (1366) N:    6314 (+ 0) (P: 76.03%) (Q: -0.12000) (U: 0.03274) (Q+U: -0.08725) (V: -0.2423) 

Those correspond to continuing check being played with T=1: 96.9%, 97.6%, 98.3%, 98.7%

And 4th rank:

 800: info string h5h4  (1124) N:     760 (+ 1) (P: 62.33%) (Q: -0.10649) (U: 0.07861) (Q+U: -0.02788) (V: -0.1271) 
1600: info string h5h4  (1124) N:    1538 (+ 1) (P: 62.33%) (Q: -0.10933) (U: 0.05503) (Q+U: -0.05431) (V: -0.1271) 
3200: info string h5h4  (1124) N:    3105 (+ 1) (P: 62.33%) (Q: -0.10975) (U: 0.03858) (Q+U: -0.07117) (V: -0.1271) 
6400: info string h5h4  (1124) N:    6261 (+ 1) (P: 62.33%) (Q: -0.10839) (U: 0.02707) (Q+U: -0.08132) (V: -0.1271) 

With respective 95.0%, 96.1%, 97.0%, 97.8% probabilities to continue perpetual.

And just calculating the probability of checking successfully over ~50 consecutive checks, assuming a single averaged per-move probability:

95% check -> 8% draw
96% check -> 13% draw
97% check -> 22% draw
98% check -> 36% draw
99% check -> 61% draw

So instead of adjusting temperature, just simply doubling visits should lead to significantly more games that correctly play to draw even with T=1 because continuing the perpetual is the only reasonable play, and search will gladly put more visits into it. (For reference in this position, the next best move after the check Q: -0.1 is -0.8.)

(Increasing visits improves value head while keeping existing temperature randomness, and increasing visits also improves policy head while keeping existing noise without needing #8.)

oscardssmith commented 6 years ago

doesn't this cut both ways? we'll get fewer bad effects from temp, but we'll also explore less.

Mardak commented 6 years ago

Depends on the position, if there are multiple good moves, potentially there will be more exploration and less bias towards prior. For example same 11089 network but from startpos:

800 visits
info string g2g3  (374 ) N:      52 (+ 0) (P:  5.89%) (Q:  0.03628) (U: 0.10675) (Q+U:  0.14303) (V:  0.0393) 
info string c2c4  (264 ) N:      57 (+ 0) (P:  6.79%) (Q:  0.03085) (U: 0.11251) (Q+U:  0.14336) (V:  0.0450) 
info string d2d4  (293 ) N:     132 (+ 0) (P: 16.97%) (Q:  0.02186) (U: 0.12261) (Q+U:  0.14447) (V:  0.0494) 
info string e2e4  (322 ) N:     153 (+ 1) (P: 16.61%) (Q:  0.04107) (U: 0.10297) (Q+U:  0.14404) (V:  0.0514) 
info string g1f3  (159 ) N:     314 (+ 0) (P: 35.38%) (Q:  0.03675) (U: 0.10793) (Q+U:  0.14468) (V:  0.0500) 

1600 visits
info string g2g3  (374 ) N:     106 (+ 0) (P:  5.89%) (Q:  0.03749) (U: 0.07480) (Q+U:  0.11229) (V:  0.0393) 
info string c2c4  (264 ) N:     115 (+ 0) (P:  6.79%) (Q:  0.03254) (U: 0.07958) (Q+U:  0.11213) (V:  0.0450) 
info string d2d4  (293 ) N:     252 (+ 0) (P: 16.97%) (Q:  0.02093) (U: 0.09118) (Q+U:  0.11211) (V:  0.0494) 
info string e2e4  (322 ) N:     343 (+ 0) (P: 16.61%) (Q:  0.04681) (U: 0.06564) (Q+U:  0.11245) (V:  0.0514) 
info string g1f3  (159 ) N:     605 (+ 1) (P: 35.38%) (Q:  0.03307) (U: 0.07924) (Q+U:  0.11231) (V:  0.0500) 

3200 visits
info string g2g3  (374 ) N:     200 (+ 0) (P:  5.89%) (Q:  0.03522) (U: 0.05632) (Q+U:  0.09154) (V:  0.0393) 
info string c2c4  (264 ) N:     250 (+ 0) (P:  6.79%) (Q:  0.03941) (U: 0.05202) (Q+U:  0.09143) (V:  0.0450) 
info string d2d4  (293 ) N:     441 (+ 0) (P: 16.97%) (Q:  0.01766) (U: 0.07382) (Q+U:  0.09149) (V:  0.0494) 
info string e2e4  (322 ) N:     791 (+ 1) (P: 16.61%) (Q:  0.05135) (U: 0.04027) (Q+U:  0.09163) (V:  0.0514) 
info string g1f3  (159 ) N:    1188 (+ 0) (P: 35.38%) (Q:  0.03431) (U: 0.05722) (Q+U:  0.09153) (V:  0.0500) 

6400 visits
info string g2g3  (374 ) N:     425 (+ 0) (P:  5.89%) (Q:  0.03778) (U: 0.03758) (Q+U:  0.07536) (V:  0.0393) 
info string c2c4  (264 ) N:     441 (+ 0) (P:  6.79%) (Q:  0.03369) (U: 0.04178) (Q+U:  0.07547) (V:  0.0450) 
info string d2d4  (293 ) N:     794 (+ 1) (P: 16.97%) (Q:  0.01755) (U: 0.05798) (Q+U:  0.07552) (V:  0.0494) 
info string e2e4  (322 ) N:    1868 (+ 0) (P: 16.61%) (Q:  0.05125) (U: 0.02417) (Q+U:  0.07542) (V:  0.0514) 
info string g1f3  (159 ) N:    2271 (+ 0) (P: 35.38%) (Q:  0.03310) (U: 0.04235) (Q+U:  0.07545) (V:  0.0500) 

Notice how each doubling of visits increases the highest prior move's visits by less than double, i.e., other moves are more likely to be picked by T=1, so more diversity / exploration here.
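A small sketch of this observation: growth ratios below 2 mean the top-prior move's share of visits (and so its T=1 pick probability) shrinks as visits double, while e2e4's share grows. Visit counts are copied from the output above; the helper is illustrative.

```python
from typing import Dict, List

# Root visits from the 11089 startpos searches above, keyed by visit budget.
G1F3 = {800: 314, 1600: 605, 3200: 1188, 6400: 2271}  # highest prior, P=35.38%
E2E4 = {800: 153, 1600: 343, 3200: 791, 6400: 1868}   # P=16.61%


def doubling_ratios(visits_by_budget: Dict[int, int]) -> List[float]:
    """Growth factor of a move's visits each time the budget doubles."""
    counts = [visits_by_budget[b] for b in sorted(visits_by_budget)]
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


print([f"{r:.2f}" for r in doubling_ratios(G1F3)])  # ['1.93', '1.96', '1.91']
print([f"{r:.2f}" for r in doubling_ratios(E2E4)])  # ['2.24', '2.31', '2.36']
```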

oscardssmith commented 6 years ago

oh, that's really interesting. to me this suggests that training on more nodes might change the preferred opening to e4.

Mardak commented 5 years ago

With the conclusion that t53's 0 endgame temperature was weaker, I started looking at this position from the original comment again to see which temperature settings would give the most position variety without blundering the draw: ResetToPosition("6k1/8/8/8/1pP4R/pPr5/P2K3R/r7 w - - 5 51")

Curiously, I noticed 53316 frequently blundering black by walking the king down to rank 3, and it's because it doesn't think it's that bad for black!

g4g3 N:      63 P: 10.52% Q: -0.06281  
g4f3 N:     100 P:  8.88% Q:  0.02286  
g4f5 N:     476 P: 35.64% Q:  0.02215  
g4g5 N:     520 P: 44.96% Q:  0.01066  

(Screenshot of the position, 2019-06-15.)

Turns out the network would happily keep on checking even though there's an opportunity to exchange rooks to free the king. In this case, it's another example of the network knowing the move would be good if only the prior didn't hinder search, so the #8 out-of-order nature of visiting root children first would have allowed 53316 to direct search towards the capture:

position fen 6k1/8/8/8/1pP4R/pPr5/P2K3R/r7 w - - 5 51 moves h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5f6 h5h6 f6f5 h6h5 f5g4 h5h4 g4g5 h4h5 g5g4 h5h4 g4f3 h4h3 f3e4

11258 h3c3 P: 13.92% V:  0.3189
22202 h3c3 P: 27.34% V:  0.3329
32930 h3c3 P:  1.24% V:  0.2136
42500 h3c3 P:  3.36% V:  0.2230
50783 h3c3 P:  5.55% V:  0.0885
51458 h3c3 P:  3.07% V:  0.1434
52377 h3c3 P:  3.76% V:  0.1241
53316 h3c3 P:  0.63% V:  0.2598

@Tilps to be clear, fpu reduction of 0 at root would find h3c3 as well. Looks like t53 didn't learn this exchange tactic because it wouldn't have gotten itself into this position in the first place, and generally this is one example of not blundering preventing selfplay from generating valuable learning opportunities, e.g., uncovering these moves or not forgetting that a nearby position is indeed bad.

Here's some analysis running from the original position above with various temperatures/offsets with high plain resign percentage so that draws will play out to 3-fold while blunders for either side end quickly:

0 temperature
selfplay --temperature=0 -w 53316 --visits=800 --fpu-strategy=reduction --fpu-value=0.5 --fpu-strategy-at-root=reduction --fpu-value-at-root=0 --games=1000 --resign-percentage=40
W: +0 -0 =500 
B: +0 -0 =500
unique games: 664

most common draw games:
  16 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4f5 h4h5 f5e4
  19 h4h8 g8f7 h2h7 f7e6 h7h6 e6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5 e5e4
  30 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5f4 h5h4 f4e5 h4h5 e5f4 h5h4 f4e5 h4h5
  31 h4h8 g8f7 h2h7 f7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5f4 h5h4 f4e5 h4h5 e5f4 h5h4 f4e5 h4h5
  46 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5 e5e4

most common whitewon games:

most common blackwon games:

.45 temperature
selfplay --temperature=.45 -w 53316 --visits=800 --fpu-strategy=reduction --fpu-value=0.5 --fpu-strategy-at-root=reduction --fpu-value-at-root=0 --games=1000 --resign-percentage=40
W: +38 -4 =458 
B: +4 -41 =455
unique games: 940

most common draw games:
   3 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5f4 h5h4 f4e5 h4h5 e5f4 h5h4 f4e5 h4h5
   4 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5 e5e4
   4 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5
   5 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
   7 h4h8 g8g7 h2h7 g7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4f5 h4h5 f5e4 h5h4 e4f5 h4h5

most common whitewon games:
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6f5 h2h5 f5f4 h5h4 f4g5 h4h5 g5f4 h5h4 f4g3 h4h3
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6f5 h6h5 f5f6 h5h6 f6e5 h6h5 e5e4 h5h4 e4e5 h4h5 e5f6 h5h6 f6f7 h6h7 f7g6 h7h6 g6f5 h6h5 f5g4 h2h4 g4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6f7 h6h7 f7e6 h7h6 e6f5 h2h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6f7 h6h7 f7e6 h7h6 e6f5 h6h5 f5e4 h2h4 e4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6f7 h6h7 f7f6 h7h6 f6e7 h6h7 e7d6 h7h6 d6e5 h6h5 e5f4 h2h4 f4g3 h4h3 g3g4 h3h4 g4f3 h4h3 f3e4 h3c3

most common blackwon games:
   1 h4h8 g8f7 h2h7 f7g6 h7h6 g6g5 h6h5 g5f6 h5h6 f6g7 h6h7 g7g6 h7h6 g6g7 h6h7 g7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5f4 h5h4 f4g5 h4h5 g5f4 h5h4 f4f5 h4h5 f5g6 h8h7 a1a2
   1 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8f7 h6h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h8e8 e5f5
   1 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5e4 h5h4 e4e5 h8e8 e5f5
   1 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5f6 h5h6 f6e5 h8e8 e5f5
   1 h4h8 g8g7 h2h7 g7g6 h7h6 g6f5 h6h5 f5g6 h5h6 g6f7 h8h7 f7e8 h6a6 a1a2 d2d1 c3d3 d1e1 a2d2 a6a7 d3d7 h7d7 d2d7 a7a8 e8e7 e1e2 e7e6

1 temperature
selfplay --temperature=1 -w 53316 --visits=800 --fpu-strategy=reduction --fpu-value=0.5 --fpu-strategy-at-root=reduction --fpu-value-at-root=0 --games=1000 --resign-percentage=40
W: +72 -270 =158 
B: +276 -59 =165
unique games: 922

most common draw games:
   2 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8f7 h8h7 f7g8
   2 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
   2 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5g6 h8h6 g6g7 h6h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7 h8h7
   2 h4h8 g8g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8
   3 h4h8 g8g7 h2h7 g7g6 h7h6 g6f7 h8h7 f7g8 h7h8 g8f7 h8h7 f7g8 h7h8 g8f7

most common whitewon games:
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6g7 h6h7 g7f8 h7h8 f8f7 h8h7 f7f6 h7h6 f6f5 h6h5 f5e4 h2h4 e4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g8 h7h8 g8f7 h8h6 f7g7 h6h7 g7f6 h7h6 f6f5 h6h5 f5g6 h5h6 g6f5 h6h5 f5g4 h2h4 g4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g6 h2h6 g6f5 h6h5 f5f4 h5h4 f4e5 h4h5 e5f4 h5h4 f4f3 h4h3 f3e4 h3c3
   2 h4h8 g8f7 h2h7 f7g6 h7h6 g6g5 h6h5 g5f4 h5h4 f4g3 h4h3
   2 h4h8 g8g7 h8h7 g7f6 h7h6 f6f5 h2h5 f5g4 h5h4 g4g3 h4h3

most common blackwon games:
   6 h2f2 a1a2
   6 h2h1 a1a2
   7 h4h8 g8f7 h2h7 f7f6 h8f8 f6g6
   8 h2g2 g8f7 g2f2 f7g6
   9 h4h8 g8f7 h2f2 f7g7

.5 temperature, offset -5
selfplay --temperature=.5 --temp-visit-offset=-5 -w 53316 --visits=800 --fpu-strategy=reduction --fpu-value=0.5 --fpu-strategy-at-root=reduction --fpu-value-at-root=0 --games=1000 --resign-percentage=40
W: +47 -3 =450 
B: +2 -43 =455
unique games: 945

most common draw games:
   3 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5f4 h5h4 f4e5 h4h5 e5f4 h5h4 f4e5 h4h5
   5 h4h8 g8f7 h2h7 f7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5
   5 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4f5 h4h5 f5e4
   5 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
  11 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7

most common whitewon games:
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6f5 h6h5 f5f6 h5h6 f6f5 h6h5 f5g6 h5h6 g6g5 h6h5 g5g4 h2h4 g4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6g5 h2h5 g5g4 h5h4 g4f5 h4h5 f5f4 h5h4 f4f3 h4h3
   2 h4h8 g8f7 h2h7 f7f6 h7h6 f6e5 h6h5 e5e6 h5h6 e6f5 h6h5 f5e4 h5h4 e4f3 h4h3 f3e4 h3c3
   2 h4h8 g8g7 h2h7 g7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4f3 h4h3
   3 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4f3 h4h3 f3e4 h3c3

most common blackwon games:
   1 h2g2 g8f7 g2f2 f7g6
   1 h4h8 g8f7 h2h7 f7f6 h7h6 f6e5 h8e8 e5f5
   1 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5d6 c4c5 d6c6 h8h6 c6b5 h6b6 b5a5 b6b4 a5b4 h5h4 b4c5 d2c3 a1a2 h4a4 c5b6 c3b4 b6b7 b4c5 b7b8 c5b4 b8b7 b4c3 a2g2 c3d4 g2g4 d4c3 g4g8 c3d3 a3a2 d3c3 g8c8 c3b2 c8f8 b2c1 f8f2 a4a3 b7b6
   1 h4h8 g8g7 h2h7 g7f6 h7h6 f6f5 h6h5 f5e4 h8e8 e4f4
   1 h4h8 g8g7 h8h7 g7f8 h7h8 f8e7 h2h7 e7d6 c4c5 d6c5 h7h5 c5d6 h5h6 d6d5 h8d8 d5e5

1 temperature, offset -50
selfplay --temperature=1 --temp-visit-offset=-50 -w 53316 --visits=800 --fpu-strategy=reduction --fpu-value=0.5 --fpu-strategy-at-root=reduction --fpu-value-at-root=0 --games=1000 --resign-percentage=40
W: +64 -1 =435 
B: +5 -48 =447
unique games: 984

most common draw games:
   2 h4h8 g8g7 h2h7 g7f6 h7h6 f6f5 h6h5 f5e4 h5h4 e4f5 h4h5 f5e4 h5h4 e4f5 h4h5
   2 h4h8 g8g7 h2h7 g7g6 h7h6 g6f7 h8h7 f7g8 h7h8 g8f7 h8h7 f7g8 h7h8 g8f7
   2 h4h8 g8g7 h8h7 g7g6 h7h6 g6f5 h2h5 f5f4 h5h4 f4g5 h4h5 g5f4 h5h4 f4g5 h4h5 g5f4
   3 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
   3 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7

most common whitewon games:
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6g5 h6h5 g5f6 h5h6 f6e5 h6h5 e5e6 h5h6 e6f5 h6h5 f5f4 h5h4 f4g5 h4h5 g5g4 h5h4 g4f3 h2h3
   1 h4h8 g8g7 h8h7 g7g6 h7h6 g6g7 h6h7 g7g8 h7h8 g8f7 h2h7 f7f6 h7h6 f6e5 h6h5 e5f6 h5h6 f6e7 h6h7 e7e6 h7h6 e6f5 h6h5 f5f4 h5h4 f4f5 h4h5 f5f4 h5h4 f4f3 h4h3 f3e4 h3c3
   1 h4h8 g8g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7f6 h2h6 f6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5d4 h5d5
   2 h4h8 g8f7 h2h7 f7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f3 h4h3
   2 h4h8 g8f7 h8h7 f7e6 h7h6 e6e5 h2h5 e5e4 h5h4 e4f3 h4h3

most common blackwon games:
   1 h2g2 g8f7 h4h7 f7f6
   1 h2g2 g8f8 h4f4 f8e8 f4e4 e8d8 e4d4 d8e7 d4e4 e7f6
   1 h4h8 g8f7 h2h7 f7e6 h7h6 e6f5 h6h5 f5e4 h5h4 e4e5 h8h5 e5e6 h5h6 e6f7 h6h7 f7f8 h7h8 f8e7 h4h7 e7f6 h7h6 f6g7 h8h7 g7f8 h7h8 f8f7 h6h7 f7e6 h7h6 e6f5 h6h5 f5g6 h5h6 g6g7 h6h7 g7g6 h7h6 g6g5 h6h5 g5g4 h5h4 g4f5 h8h5 f5f6 h5h6 f6e7 h6h7 e7d8 h7h8 d8c7 h4h7 c7b6 h7h6 b6c5 h6h5 c5c6 h5h6 c6c5 h6h5 c5d6 h8h6 d6c7 h5c5 c7d8 c5d5 d8e8
   1 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8f7 h6h7 f7e6 h7h6 e6f5 h6h5 f5g6 h5h6 g6f5 h6h5 f5f4 h5h4 f4g5 h4h5 g5f4 h5h4 f4e5 h4h5 e5d6 h5d5 d6e6 h8h6 e6f7 d5d7 f7g8 h6g6 g8f8 g6a6 a1a2 d2d1 c3e3 a6h6 f8g8 h6g6 g8f8 g6h6 f8e8 h6h7 e3g3
   1 h4h8 g8g7 h8h7 g7f6 h7h6 f6f7 h6h7 f7f8 h7h8 f8e7 h2h7 e7d6 c4c5 d6d5 h7h5 d5c6 h8h6 c6d7 h6d6 d7c7 h5h7 c7c8 d6c6 c8d8 c6g6 a1a2 d2d1 c3d3 d1c1 d3d7 h7d7 d8d7 g6b6 a2a1
Mardak commented 5 years ago

Here's an analysis similar to @Ttl's https://github.com/LeelaChessZero/lc0/issues/710#issuecomment-459074662 starting from a position and generating selfplay games:

[Screenshot: selfplay results (unique games, correct outcomes) for each temperature/offset setting]

This might just be noise, but a .6 temperature with a more negative offset seems to increase both unique games and correct outcomes. Then again, the same happens for 1 temperature when going from a 0 offset to a -50 offset: unique games go from 922 to 984 and correct (drawn) outcomes from 323 to 882. This might also be related to how I adjudicate games soon after a blunder: with fewer blunders the outcomes are more accurate, and the high temperature allows a greater variety of draws.
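Since the starting position is a draw with correct play, the "correct" counts are simply the drawn games from each run. A small sketch that parses the `W:`/`B:` summary lines in the `+wins -losses =draws` form shown above and reproduces the 323 and 882 figures; the helper names are hypothetical.

```python
import re

# Matches summary lines such as "W: +72 -270 =158" (wins, losses, draws).
SUMMARY = re.compile(r"^[WB]:\s*\+(\d+)\s+-(\d+)\s+=(\d+)")


def parse_summary(line: str) -> tuple[int, int, int]:
    m = SUMMARY.match(line.strip())
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    wins, losses, draws = (int(g) for g in m.groups())
    return wins, losses, draws


def correct_outcomes(lines: list[str]) -> int:
    """Count drawn games, the correct result for this drawn position."""
    return sum(parse_summary(line)[2] for line in lines)


temp1_offset0 = ["W: +72 -270 =158", "B: +276 -59 =165"]
temp1_offset50 = ["W: +64 -1 =435", "B: +5 -48 =447"]
print(correct_outcomes(temp1_offset0), correct_outcomes(temp1_offset50))  # 323 882
```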

Here's the most common drawn game for each setting and how many times it occurred:

   0 : 46 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5 e5e4
  .1 : 28 h4h8 g8f7 h2h7 f7f6 h7h6 f6e5 h6h5 e5e4 h5h4 e4f5 h4h5 f5f4 h5h4 f4e5 h4h5 e5f4 h5h4 f4e5 h4h5
  .2 : 24 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5 e5e4
  .3 : 13 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4e5 h4h5 e5e4 h5h4 e4e5 h4h5 e5e4
  .4 : 11 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4f5 h4h5 f5e4 h5h4 e4f5 h4h5
  .45:  7 h4h8 g8g7 h2h7 g7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4f5 h4h5 f5e4 h5h4 e4f5 h4h5
  .5 :  9 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
  .6 :  7 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
  .7 :  6 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
  .8 :  4 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
  .9 :  3 h4h8 g8f7 h8h7 f7f8 h7h8 f8g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8
  1  :  3 h4h8 g8g7 h2h7 g7g6 h7h6 g6f7 h8h7 f7g8 h7h8 g8f7 h8h7 f7g8 h7h8 g8f7

.45-5: 10 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
.5 -5: 11 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
.6 -5:  4 h4h8 g8f7 h2h7 f7g6 h7h6 g6f5 h6h5 f5e4 h5h4 e4f5 h4h5 f5e4 h5h4 e4f5 h4h5
1 -50:  3 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7

.4 -1:  6 h4h8 g8g7 h2h7 g7f6 h7h6 f6e5 h6h5 e5f6 h5h6 f6e5 h6h5 e5f6 h5h6
.45-1:  8 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
.5 -1: 10 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
.6 -1:  7 h4h8 g8f7 h2h7 f7f6 h7h6 f6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
.7 -1:  4 h4h8 g8f7 h2h7 f7g6 h7h6 g6g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8 g8g7
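The "unique games" and "most common drawn game" figures can be recomputed from a list of finished games. A sketch using `collections.Counter`, assuming each game is stored as its space-separated UCI move string together with its result; the sample list reuses a few move sequences from the runs above but is otherwise illustrative, not real selfplay output.

```python
from collections import Counter

# Hypothetical (moves, result) records; a real run would hold 1000 games.
games = [
    ("h4h8 g8g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8", "draw"),
    ("h4h8 g8g7 h8h7 g7g8 h7h8 g8g7 h8h7 g7g8 h7h8", "draw"),
    ("h4h8 g8f7 h2f2 f7g7", "blackwon"),
    ("h2g2 g8f7 g2f2 f7g6", "blackwon"),
]

unique_games = len({moves for moves, _ in games})
draws = Counter(moves for moves, result in games if result == "draw")
top_game, top_count = draws.most_common(1)[0]
print(unique_games, top_count, top_game)
```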
Mardak commented 4 years ago

With https://github.com/LeelaChessZero/lc0/pull/964, it looks like, at least for the original position here, 59350 would split out all the bad moves that fail to draw. Here's search with 800 visits and no noise for that position as well as for some nearby positions:

position fen 6k1/8/8/8/1pP4R/pPr5/P2K3R/r7 w - - 5 51
nodes 481 score cp  -19 multipv 1 pv h4h8 g8f7 h2h7 f7f6 h7h6 f6f5 h6h5 f5g6 h5h6 g6g5 h6h5 g5g6
nodes 282 score cp  -18 multipv 2 pv h2g2 g8f7 g2f2 f7e6 h4h6 e6e5 h6h5 e5e6 h5h6
nodes 132 score cp  -38 multipv 3 pv h4g4 g8f7 h2f2 f7e6 g4e4 e6d6 e4d4 d6c6 f2f6
nodes   5 score cp -192 multipv 4 pv h2h1 a1a2 d2d1 a2a1 d1d2

position fen 6k1/8/8/8/1pP4R/pPr5/P2K3R/r7 w - - 5 51 moves h4h8 g8g7
nodes 429 score cp  -12 multipv 1 pv h8h7 g7f6 h7h6 f6e7 h6h7 e7d6 h2h6 d6e5 h7e7 e5f5
nodes 386 score cp  -15 multipv 2 pv h2h7 g7f6 h7h6 f6f5 h6h5 f5g6 h5h6 g6g5 h6h5 g5g6 h5h6 g6g5 h8g8
nodes   4 score cp -137 multipv 3 pv h8h3 a1a2 d2d1 a2h2

position fen 6k1/8/8/8/1pP4R/pPr5/P2K3R/r7 w - - 5 51 moves h4h8 g8g7 h8h7 g7g6
nodes 430 score cp   -6 multipv 1 pv h7h6 g6g7 h6h7 g7g8 h7h8 g8g7 h8h7
nodes 363 score cp   -5 multipv 2 pv h2h6 g6f5 h6h5 f5g6 h5h6 g6g5 h6h5 g5g6 h7h6 g6g7 h6h7 g7g6
nodes   4 score cp -210 multipv 3 pv h7h3 a1a2 d2d1 a2h2

position fen 6k1/8/8/8/1pP4R/pPr5/P2K3R/r7 w - - 5 51 moves h4h8 g8g7 h2h7 g7g6
nodes 787 score cp   -9 multipv 1 pv h7h6 g6f5 h6h5 f5f6 h5h6 f6g5 h6h5 g5f6 h5h6 f6g5 h6h5
nodes   5 score cp  -89 multipv 2 pv h7h3 c3h3 h8h3 a1a2 d2d3 a2b2

This means --minimum-allowed-visits=5 would split off these bad moves and keep them from affecting the evaluation of the earlier positions.

Checking with --minimum-kldgain-per-node=0.000010 gives a similar visit distribution at 888 nodes. The bad moves get 6 and 7 visits here, so a slightly higher minimum-visits threshold would filter them out as well:

nodes 341 score cp  -10 multipv 1 pv h2g2 g8f7 g2f2 f7e6 h4h6 e6e5 f2e2 e5f5 h6h5
nodes 277 score cp  -14 multipv 2 pv h4h8 g8f7 h2h7 f7f6 h7h6 f6f5 h6h5 f5g6 h5h6 g6g5
nodes 226 score cp  -18 multipv 3 pv h4g4 g8f7 g4f4 f7e6 h2e2 e6d6 f4f6 d6d7 f6f7
nodes   7 score cp -160 multipv 4 pv c4c5 a1a2 d2d1 a2a1 d1d2
nodes   6 score cp -115 multipv 5 pv h2h3 a1a2 d2d1 c3h3 h4h3
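Whether a visit threshold catches the failing moves can be read straight off the root visit counts above. A hypothetical helper (not lc0's code, and not necessarily how #964 decides what to split) that lists root moves with at most `max_visits` visits; "at most" is an assumption chosen to match the remark that a value of 5 would catch moves with 4 and 5 visits.

```python
def low_visit_moves(root: dict[str, int], max_visits: int) -> list[str]:
    """Root moves with at most `max_visits` visits (hypothetical split rule)."""
    return [move for move, n in root.items() if n <= max_visits]


# Visit counts copied from the multipv output above (first move of each pv).
plain_800 = {"h4h8": 481, "h2g2": 282, "h4g4": 132, "h2h1": 5}
kldgain_888 = {"h2g2": 341, "h4h8": 277, "h4g4": 226, "c4c5": 7, "h2h3": 6}

print(low_visit_moves(plain_800, 5))    # ['h2h1']
print(low_visit_moves(kldgain_888, 5))  # []
print(low_visit_moves(kldgain_888, 7))  # ['c4c5', 'h2h3']
```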

With noise added, a bad move does sometimes get bumped up to more visits, roughly 20% of the time, e.g.:

h2h1 N:      11 P: 10.95% Q: -0.74177 V:  0.1197
h2e2 N:      18 P: 17.27% Q: -0.82896 V: -0.1405
h2h3 N:      24 P: 11.05% Q: -0.52962 V:  0.1123
h4h7 N:      12 P:  8.31% Q: -0.66872 V: -0.2305
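Root noise is what lets a low-prior bad move occasionally collect extra visits. A sketch of AlphaZero-style Dirichlet noise mixing, P' = (1 - eps) * P + eps * eta with eta ~ Dir(alpha), sampled via `random.gammavariate`. The values eps = 0.25 and alpha = 0.3 are the commonly cited AlphaZero settings and are assumptions here, as are the priors; the script only estimates how often the bad move's prior more than doubles, which raises (but does not guarantee) its share of visits.

```python
import random
from typing import Dict, Optional


def add_dirichlet_noise(priors: Dict[str, float], epsilon: float = 0.25,
                        alpha: float = 0.3,
                        rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Return P' = (1 - epsilon) * P + epsilon * eta, with eta ~ Dirichlet(alpha)."""
    rng = rng if rng is not None else random.Random()
    # A Dirichlet(alpha, ..., alpha) sample is independent Gamma(alpha, 1) draws, normalized.
    gammas = {move: rng.gammavariate(alpha, 1.0) for move in priors}
    total = sum(gammas.values())
    return {move: (1 - epsilon) * p + epsilon * gammas[move] / total
            for move, p in priors.items()}


# Hypothetical priors with a low-prior move ("h2h1") that fails to draw.
priors = {"h4h8": 0.6, "h2g2": 0.3, "h2h1": 0.1}
rng = random.Random(0)
trials = 10_000
boosted = sum(add_dirichlet_noise(priors, rng=rng)["h2h1"] > 0.2 for _ in range(trials))
print(f"h2h1 prior more than doubled in {boosted / trials:.1%} of trials")
```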

So overall, it seems like badgame split should help with these types of endgames by reducing both the likelihood and the effect of playing into bad moves.