glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess ** A chess adaption of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

A faster winning condition #429

Open AntonioDiCecco opened 6 years ago

AntonioDiCecco commented 6 years ago

To train faster, I would propose a pseudo winning condition instead of checkmate.

An old idea of Lasker's was that a game is lost once one side is five pawns behind. Of course, a similar idea would need a new form in lc0 training.

My proposal is to "resign" a training game when one ID estimates its chance of winning at less than 20% and the other estimates its own at more than 80%.
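A minimal sketch of the rule proposed above, assuming each side exposes its own win-probability estimate as a float in [0, 1]. The function name `should_resign` and its parameters are hypothetical and are not taken from the lc0 code base.

```python
from typing import Optional


def should_resign(win_prob_a: float, win_prob_b: float,
                  low: float = 0.20, high: float = 0.80) -> Optional[str]:
    """Return "a" or "b" for the side that should resign, or None.

    A side resigns only when both estimates agree: its own estimate is
    below `low` AND the opponent's estimate is above `high`.
    (Hypothetical helper for illustration; not lc0's API.)
    """
    if win_prob_a < low and win_prob_b > high:
        return "a"
    if win_prob_b < low and win_prob_a > high:
        return "b"
    return None


if __name__ == "__main__":
    print(should_resign(0.10, 0.90))  # side "a" resigns
    print(should_resign(0.10, 0.70))  # estimates disagree, play on
```

Requiring both estimates to agree is stricter than checking a single evaluation, so fewer games end early than with a one-sided threshold.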

mooskagh commented 6 years ago

With a 20% chance of winning, the resigning side would still win 1 game out of 5 (by definition). Being 5 pawns down corresponds to a win probability much lower than 20%, and there are discussions about setting the limit at 5%. There is a worry, though, that the engine would then forget how to win won positions.

On Tue, Apr 24, 2018 at 10:35 AM Sabaain wrote:

> To train faster I would propose a pseudo winning condition instead of checkmate.
>
> An old idea of Lasker was that the game was lost with a five pawns difference. Of course a similar idea should have a new form in lc0 training.
>
> My proposal is to "resign" when in training one ID thinks to have less than 20% chances of winning and the other more than 80%

AntonioDiCecco commented 6 years ago

My full idea is to start with 80% vs. 20% (or 95% vs. 5%) and gradually tighten the threshold toward 100% vs. 0%, as in simulated annealing.
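One way to picture the gradual tightening is a simple schedule for the losing side's threshold. The sketch below assumes a linear decay over a fixed number of training steps. The function name `resign_threshold`, the step counts, and the linear shape are all illustrative assumptions, since the comment does not specify a schedule.

```python
def resign_threshold(step: int, total_steps: int,
                     start: float = 0.20, end: float = 0.0) -> float:
    """Linearly anneal the resign threshold from `start` to `end`.

    `step` is the current training step; progress is clamped to [0, 1],
    so the threshold stays at `end` once `total_steps` is reached.
    (Hypothetical schedule for illustration only.)
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress


if __name__ == "__main__":
    for s in (0, 250, 500, 750, 1000):
        print(s, round(resign_threshold(s, 1000), 3))
```

The winning side's threshold would be the mirror image, `1 - resign_threshold(...)`, moving from 80% toward 100%. At the end of the schedule no game is resigned and every game is played to its natural end.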

benkoshy commented 6 years ago

Quite an interesting idea. It would be worth testing the proxy (i.e. the pseudo winning condition, whatever that may be) against normal play to checkmate and measuring the differences. That way you can put a cost on the computation, and if you can price it fairly accurately, you can vary the probabilities and observe the results. Perhaps you could improve speed by an order of magnitude.

killerducky commented 6 years ago

See also #418.

Ishinoshita commented 6 years ago

No need for a human-handcrafted (boo...) "pseudo winning condition". LCZ already has the capability to resign self-play games through the resign threshold feature inherited from the LZ code (and from AlphaGo Zero): a game is resigned when the win rate drops below some threshold, and the threshold is dynamically adjusted so that only 5% of games are unduly resigned. What is not clear to me is whether that feature is currently enabled in LCZ.
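The dynamic adjustment described above can be sketched as a calibration over games played with resignation disabled. The sketch assumes that, for each such game, the lowest win-rate estimate the eventual winner ever reported has been recorded. The function name `calibrate_resign_threshold`, the candidate grid, and the input format are hypothetical and do not reflect the actual LZ or LCZ implementation.

```python
from typing import Iterable, Optional, Sequence


def calibrate_resign_threshold(winner_min_evals: Sequence[float],
                               max_false_rate: float = 0.05,
                               candidates: Optional[Iterable[float]] = None) -> float:
    """Pick the largest threshold whose false-resign rate stays within bounds.

    `winner_min_evals` holds, per no-resign game, the lowest win-rate
    estimate of the side that eventually won. A game counts as "unduly
    resigned" at threshold t if that minimum is below t.
    (Illustrative sketch under the assumptions stated above.)
    """
    if not winner_min_evals:
        raise ValueError("need at least one no-resign game")
    if candidates is None:
        candidates = [i / 100 for i in range(0, 51)]  # 0.00 .. 0.50
    n = len(winner_min_evals)
    best = 0.0
    for t in sorted(candidates):
        false_resigns = sum(1 for e in winner_min_evals if e < t)
        if false_resigns / n <= max_false_rate:
            best = t
        else:
            break  # the rate only grows as t increases
    return best


if __name__ == "__main__":
    evals = [0.02] + [0.30] * 19  # one comeback win out of 20 games
    print(calibrate_resign_threshold(evals))
```

Because the false-resign rate is non-decreasing in the threshold, the loop can stop at the first candidate that exceeds the allowed rate.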

ghost commented 6 years ago

@Ishinoshita It is not. There has been some discussion of implementing the resign feature in training games in the next version.