Closed YouriAndropov closed 4 years ago
Could it be an MCTS side effect?
Yes, exactly. The search is driven by differences in Q, which become arbitrarily small when the game is nearly decided. It is usually made more efficient by the policy favoring good moves, but the policy suffers from the same problem. Several possible solutions are being worked on right now; see e.g. #961.
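To make the point about Q differences concrete, here is a minimal sketch of a generic PUCT-style selection rule. It is not lc0's actual code: the function names, the `c_puct` value, and all numbers are made up for illustration. It only shows that when every move has Q close to 1, the exploration term outweighs the tiny Q gaps, so the search has little reason to concentrate on the fastest win.

```python
import math

def puct_scores(q, prior, visits, c_puct=1.5):
    """Generic PUCT score per child: Q plus a prior-weighted exploration term.

    Illustrative only; c_puct and the exact formula are assumptions.
    """
    sqrt_total = math.sqrt(max(sum(visits), 1))
    return [
        qi + c_puct * pi * sqrt_total / (1 + ni)
        for qi, pi, ni in zip(q, prior, visits)
    ]

def best_index(scores):
    """Index of the child the search would visit next."""
    return max(range(len(scores)), key=lambda i: scores[i])

def q_spread(q):
    """Gap between the best and the worst Q among the children."""
    return max(q) - min(q)

# Hypothetical numbers: same priors and visit counts, different Q values.
prior = [0.40, 0.35, 0.25]
visits = [50, 30, 20]
balanced_q = [0.10, -0.05, -0.30]    # middlegame: moves clearly differ
decided_q = [0.999, 0.998, 0.997]    # won endgame: every move "wins"

balanced_choice = best_index(puct_scores(balanced_q, prior, visits))
decided_choice = best_index(puct_scores(decided_q, prior, visits))
print(q_spread(balanced_q), balanced_choice)
print(q_spread(decided_q), decided_choice)
```

With these made-up numbers the balanced position keeps sending visits to the move with the best Q, while in the decided position the next visit goes to the least-visited move even though its Q is the lowest of the three.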
Couldn't you switch to a regular alpha-beta search when the game is nearly over ?
Leela is trained to win, and there is no incentive to do so faster. As long as this doesn't lead to dropping the win, it is working as intended.
We could try to give fast wins a higher score during training, but DeepMind said they tried that: it reduced game duration, but it also reduced overall strength.
A similar issue exists in Go. People generally try to win by a larger margin, but since a win is a win regardless of the score, AlphaZero Go (and Leela Zero) usually win by just one point. Some people may find that unaesthetic, but a win is a win.
In regular blitz games with no increment, this causes lc0 to lose games from a winning position. It isn't just "aesthetic": lc0 wastes time playing useless moves.
Then we have to think about how to properly model the probability of losing the game during training.
Currently, the possibility of losing because of taking too many moves is not encoded in Leela's understanding of the chess rules in any way. It's tricky to add because training games are played with a fixed number of nodes and there is no time budgeting going on, so it's hard to address that.
But that will take some of network capacity and likely will reduce overall strength.
TBH I don't really care about the problem in the zero-increment case, as every engine becomes terrible / times out in the limit for thinking time going to zero. However, a remedy for the underlying problem of unnecessarily prolonging the game is useful for all time controls (because of the higher implied average thinking time per move).
As @Naphthalin already mentioned, the most notable already investigated MCTS enhancement in this regard is #961 by @Ttl, which reduces the expected number of game plies considerably (70 plies in the given test).
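As a rough illustration of the idea behind a moves-left signal, here is a minimal sketch of how an estimate of remaining plies could break ties between near-equal Q values. This is not the formula from #961: the function names, `slope`, `cap`, and `q_threshold` are all hypothetical, and the numbers are invented.

```python
def moves_left_bonus(q, child_moves_left, parent_moves_left,
                     slope=0.003, cap=0.03, q_threshold=0.8):
    """Small utility term preferring shorter wins and longer losses.

    Hypothetical sketch: all parameters are illustrative assumptions.
    """
    # Only act when the game is nearly decided, so normal play is unaffected.
    if abs(q) < q_threshold:
        return 0.0
    # Positive delta: this child is expected to end the game sooner.
    delta = parent_moves_left - child_moves_left
    bonus = max(-cap, min(cap, slope * delta))
    # When winning, reward ending sooner; when losing, reward lasting longer.
    return bonus if q > 0 else -bonus

def adjusted_q(q, child_moves_left, parent_moves_left):
    """Q with the moves-left term added, used only for move comparison."""
    return q + moves_left_bonus(q, child_moves_left, parent_moves_left)

# Two winning moves with identical Q: one mates soon, one shuffles.
quick = adjusted_q(0.999, child_moves_left=8, parent_moves_left=20)
slow = adjusted_q(0.999, child_moves_left=30, parent_moves_left=20)
print(quick > slow)
```

The cap keeps the term small enough that it should only separate moves whose Q values are already almost equal, which is the situation described above.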
Ok I'm eager to see this improvement.
every engine becomes terrible / times out in the limit for thinking time going to zero
Nevertheless, MCTS engines seem much more sensitive to "zeitnot" (time trouble) than alpha-beta engines.
I did a few tests with 0.23.3 and yet I couldn't see any progress. Should I use a specific network with it ? Has the whole "moves left head" feature been merged ?
Ex: lc0 plays White, time control 300+2. At ply 127, Black can be checkmated in 18 plies. At ply 150, lc0 promotes to a queen and it's a KKQ ending; the other (αβ) engine can still be checkmated in 8 plies, but checkmate is only delivered at ply 199. `1. e4 c5 2. Ne2 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Be2 e6 7. Be3 Be7 8. f4 {+0.60/11 8} O-O {-0.15/22 33} 9. g4 {+0.68/13 5} d5 {-0.02/23 17} 10. e5 {+0.63/14 6} Nfd7 {+0.02/23 4} 11. Qd2 {+0.64/12 15} Bh4+ {0.00/23 11}
Hi Youri, are you also on the Lc0 Discord? If yes, under what name? Otherwise you can contact me at ipmanchess
I did a few tests with 0.23.3 and yet I couldn't see any progress.
The bug fix was only for a specific time management bug where the nps calculation started at the wrong time.
Should I use a specific network with it ? Has the whole "moves left head" feature been merged ?
Not merged yet (as you see in the #961 PR), and the only net I am aware of is a net which got its MLH trained afterwards, so it didn't influence the policies directly.
When there's a testable binary available, please let me know (I use the cuda fp32 backend), if it helps.
@YouriAndropov we have released v0.25.1, please try it together with a suitable network like 712576 from http://training.lczero.org/networks/?show_all=1 and the moves-left-head settings for run 3 on http://training.lczero.org/training_runs
Let's move all the shuffling discussion to #1229 for now.
Closing this as duplicate. Feel free to reopen if you think it's different.
It seems lc0 plays very badly when it comes to chess endgames. Even with an increment per move in blitz, 2 or 3 seconds does not seem to be enough to find an obvious checkmate. Eventually it finds a way, but only after a large number of useless moves. Could it be an MCTS side effect?
I'm using a 128x10 and a 256x20 network on an NVIDIA GTX 1080 Ti.
Ex: (lc0 plays White, time control 300+2)
`1. d4 Nf6 2. c4 e6 3. Nf3 Be7 4. g3 d5 5. Bg2 O-O 6. O-O dxc4 7. Qc2 a6 8. a4 Bd7 9. Qxc4 {+0.34/10 5} Bc6 {-0.12/22 12} 10. Bf4 {+0.32/10 9} Nd5 {0.00/23 14} 11. Bd2 {+0.45/10 5} Nd7 {0.00/23 19} 12. Re1 {+0.50/11 13} a5 {0.00/23 10} 13. e4 {+0.58/12 4} Nb4 {-0.15/23 13} 14. Qb3 {+0.58/12 5} b6 {-0.18/24 20} 15. Na3 {+0.58/12 6} Bb7 {-0.11/21 10} 16. Rad1 {+0.59/12 6} h6 {-0.19/22 31} 17. h4 {+0.71/12 19} Kh8 {-0.07/22 25} 18. Bc3 {+0.75/10 16} Kg8 {-0.07/21 10} 19. Rd2 {+0.66/11 5} Rc8 {-0.06/21 3} 20. Nc4 {+0.90/11 8} Qe8 {-0.20/20 1} 21. Qd1 {+0.90/12 8} Na2 {-0.07/24 15} 22. Qb1 {+0.89/12 6} Nxc3 {-0.07/23 4} 23. bxc3 {+0.88/12 0} Nf6 {-0.12/23 10}