glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

When should we add resign? #112

Closed jjoshua2 closed 5 years ago

jjoshua2 commented 6 years ago

Pretty sure we have no resign yet, but there is probably support we can inherit from Leela Zero? We should probably start with a threshold of 0.01 and then raise it to 0.10 once we have a GM-level network or TB adjudication. It might not be too early to investigate having some games resign with a very low threshold.

CMCanavessi commented 6 years ago

Isn't resign an xboard/winboard-protocol-only feature? There's no resign in UCI, afaik.

jkiliani commented 6 years ago

I think it's still too early for this. When https://github.com/glinscott/leela-chess/issues/92 shows that the network is capable of safely converting all won basic endgames (K+Q vs K, K+R vs K and maybe K+B+B vs K), then it may make sense to allow resignation.

jjoshua2 commented 6 years ago

@CMCanavessi It would probably need to be a UCI toggle so it can be disabled for matches/GUIs, but for self-play we can do whatever speeds up generation. @jkiliani If we had resignation in just half of the games, the network would still keep learning from the other half. Learning to use material and find checkmates in the midgame will also help improve the endgame, so we have no evidence that it would slow things down rather than speed them up, given that game generation increases.

jkiliani commented 6 years ago

Difficult choice... maybe, but such a step would have to be accompanied by a continuous analysis of the rate of wrong resignations. A wrong resign in a game that would have ended in a draw should be weighted as half a wrong resign in a game the resigning player would have won.
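
That weighting is easy to make concrete. A minimal sketch of such a weighted metric, assuming we know from played-out games what each flagged resignation would actually have become; all names here are hypothetical:

```python
def weighted_false_resign_rate(would_be_outcomes):
    """Weighted rate of wrong resignations.

    `would_be_outcomes` holds, for each position flagged for resignation,
    the result the resigning side achieved when the game was played out
    anyway: 'win', 'draw' or 'loss'. A thrown-away draw counts half as
    much as a thrown-away win; a loss means the resign was correct.
    """
    weights = {"win": 1.0, "draw": 0.5, "loss": 0.0}
    return sum(weights[o] for o in would_be_outcomes) / len(would_be_outcomes)

# Example: 2 wins and 3 draws thrown away in 100 games -> 0.035,
# under a 5% budget even though 5 games were technically misjudged.
outcomes = ["win"] * 2 + ["draw"] * 3 + ["loss"] * 95
print(weighted_false_resign_rate(outcomes))
```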

mooskagh commented 6 years ago

I guess the scores after every move are sent to the server, so it seems we can get those stats now (plot the final probability of win/loss/draw for different buckets of scores). But in my opinion it's too early to add resignation now; the network still has problems winning clearly won games.

Dorus commented 6 years ago

The choice for resign is very easy if you follow AZ. I don't have the paper at hand, but they enabled resignation in 80% of the games. The resign threshold was set so that at most 5% of resignations were wrong.

There is no need to worry about resigning too early: as long as the network can accurately predict that the game is over, you're good. Btw, with the current poor endgame play, many of those games (K+Q vs K, K+R vs K, etc.) will end in a draw anyway, so the network will never think it has to resign in those positions. Only once the network learns to play the endgame reliably will it begin to resign in them.

What you do need in order to pick the right resign threshold is resign analysis. This means you need to know the lowest win% the eventual winner had during each game played without resignation. As long as you output that information during training, you can automatically decide on the resign threshold. For Leela Zero, this step is not automated because the training data does not contain the expected win percentages.
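
A minimal sketch of that automated choice, assuming each no-resign game's record carries the lowest win probability its eventual winner (or drawer) reported; the names are made up for illustration:

```python
def pick_resign_threshold(min_winner_evals, max_false_rate=0.05):
    """Pick the largest threshold t such that resigning whenever the
    root win probability drops below t would have wrongly ended at
    most `max_false_rate` of the sampled no-resign games.

    `min_winner_evals`: one value per game, the lowest win probability
    reported by the player who eventually won (or drew).
    """
    evals = sorted(min_winner_evals)
    allowed = int(max_false_rate * len(evals))  # games we may sacrifice
    # Every game whose eventual winner dipped below t would have been a
    # false resign, so t can sit no higher than the (allowed+1)-th
    # smallest of those minimum evals.
    return evals[allowed]
```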

mooskagh commented 6 years ago

There seems to be nothing about resignation during self-play in the AlphaZero paper, but in the AlphaGo Zero paper https://www.nature.com/articles/nature24270.epdf?author_access_token=VJXbVjaSHxFoctQQ4p2k4tRgN0jAjWel9jnR3ZoTv0PVW4gB86EEpGqTRDtpIz-2rmo8-KG06gqVobU5NSCFeHILHcVFUeMsbvwS-lxjqQGg98faovwjxeTUgZAUMnRQ they indeed had a 5% resignation threshold. I didn't find anything saying it resulted in 80% of games ending with resignation, though.

It surely sounds like a good optimization to add then.

jjoshua2 commented 6 years ago

Dorus has some good points. There is also the point that if a network resigns "wrongly", there is a good chance it would have misplayed the position anyway had the game continued, so you still saved time. Playing out doesn't magically fix bad weights, it just takes extra time. The fact that they only use 800 playouts shows how much generation speed matters relative to quality. More games average out the noise. Big Data FTW.

jkiliani commented 6 years ago

As long as a resign analysis as in https://github.com/gcp/leela-zero/issues/971 is done, there's nothing wrong with allowing resignations, but the wrongful resign percentage should not exceed 5%.

jjoshua2 commented 6 years ago

I can also imagine a similar early draw adjudication, but that seems to require a new neural network output with a confidence level, since a 50% win rate alone is not sufficient evidence of a draw; the game starts out close to that anyway. We could try adjudicating when the evaluation sits at exactly 50% for X plies in a row: maybe 4 ply to save two ply on a threefold repetition, or even 25 ply to save 75 ply on the 50-move rule.
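
A sketch of that rule, with "exactly 50%" loosened to a small tolerance since raw win probabilities rarely land exactly on 0.5; the class and constants are hypothetical:

```python
class DrawAdjudicator:
    """Adjudicate a draw once the root eval has sat at ~50% for a
    fixed number of consecutive plies (25 in this sketch)."""

    def __init__(self, required_plies=25, tolerance=0.001):
        self.required_plies = required_plies
        self.tolerance = tolerance
        self.run = 0  # consecutive plies at ~50% so far

    def update(self, root_win_prob):
        """Feed the root win probability after each ply; returns True
        once the game can be scored as a draw."""
        if abs(root_win_prob - 0.5) <= self.tolerance:
            self.run += 1
        else:
            self.run = 0
        return self.run >= self.required_plies
```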

killerducky commented 6 years ago

We can add it as soon as someone writes the code to monitor the false resign rate. I volunteer @jjoshua2 :)

Dorus commented 6 years ago

> There seems to be nothing about resignation during selfplay in alphazero paper

You are right, I double-checked. The only time AZ mentioned resign was in the match games vs Stockfish. Possibly chess games are usually short enough that there isn't much to gain from resignations? Not to mention many games can still end in a draw, making resign far less meaningful than in go, where every game has a winner and can go on almost forever (they had to cut it off somewhere).

Error323 commented 6 years ago

The solution DeepMind provided is very elegant I think. However, as @killerducky implies, we're short on programmers and there are more pressing issues. So unless someone volunteers...

jkiliani commented 6 years ago

Right now, without resignation, I've noticed that white vs black wins tend to oscillate between networks. Apparently, training nets on games where white wins more makes black win more in the next generation, and vice versa. As long as allowing resignations doesn't change this negative feedback, it should be fine.

Error323 commented 6 years ago

Yes, I think it's solely an optimization (a good one, though). When white starts winning more, it automatically generates more counter-samples for black; this is how it always creates the perfect opponent data for itself to learn from.

jjoshua2 commented 6 years ago

I'm thinking of working on a pull request to add resign at the current 0.10 threshold, but only if a tbprobe shows the position is a loss. Then we would have zero false resigns and a decent gain in game-rate efficiency. It would only work for people who have tablebases, so we would still keep some training games with no resign, like Leela and A0.

From my analysis of my self-play games, it looks like 97% is otherwise the best resign threshold. Even on the winning move I only saw 99% once, but 97% is reached pretty consistently in won games.
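
A sketch of that gating; `probe_wdl` stands in for a real tablebase probe (e.g. a syzygy WDL lookup) and is an assumed interface, not the actual API:

```python
RESIGN_THRESHOLD = 0.10  # the current threshold mentioned above

def should_resign(win_prob, position, probe_wdl=None):
    """Resign only when the net is pessimistic AND, if tablebases are
    available, the probe confirms the position is truly lost.

    `probe_wdl(position)` is a hypothetical stand-in returning 'win',
    'draw', 'loss', or None when the position is not in the TB.
    """
    if win_prob >= RESIGN_THRESHOLD:
        return False                      # the net still sees chances
    if probe_wdl is None:
        return True                       # plain Leela/A0-style resign
    return probe_wdl(position) == "loss"  # zero false resigns, per above
```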

jkiliani commented 6 years ago

Sorry, but testing resignations with table bases is really flawed. You're testing whether a resignation would be correct given perfect play, not whether the player with the advantage is able to convert it given the current neural net and 800 playouts. That's something completely different. If the net isn't able to convert some winning positions, it can't be rewarded for winning them, or this will leave holes in its knowledge.

Besides, what about lost endgame positions with loads of blocked pawns? You won't find these in any table bases, but an experienced player would still see the loss at a glance. tbprobe would block resignations in those positions.

jjoshua2 commented 6 years ago

So you think it would be better to have it resign some positions that aren't lost?

Leela Zero and A0 both used resign, but it seems reducing the false resigns can only be good.

jkiliani commented 6 years ago

No, I don't disagree with resignations, just with using tablebases to test them. The threshold should be tested by tracking the minimum evaluation of games which are won (or drawn, for chess) in the end. Doing that for a sufficient sample of games lets you set a good threshold.

jkiliani commented 6 years ago

And yes, it should be allowed to resign some games that aren't lost according to tablebases, as long as the engine does not have a significant (>5%) chance of averting that loss at its current skill level. As long as 20% of games are always kept without resignation, the network will eventually revise its assessment of some positions it thought were decided earlier.
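
That no-resign slice is just a per-game coin flip; a tiny sketch with assumed names:

```python
import random

NO_RESIGN_FRACTION = 0.20  # slice of games always played to the end

def resign_allowed_this_game():
    """Drawn once per self-play game: in 20% of games resignation is
    disabled entirely, which keeps a clean sample for resign analysis
    and lets the net revisit positions it thought were decided."""
    return random.random() >= NO_RESIGN_FRACTION
```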

jjoshua2 commented 6 years ago

What good does it do to allow the engine to resign an endgame that is not lost? Even if it would have lost it, it can't learn not to lose it if the game is not played out; eventually the noise will find the solution. And these are exactly the positions most likely to be false resigns anyway.

We could still allow resigns before the endgame, and only disallow resigns in endgame positions that are not lost, if that makes people happier.

jhellis3 commented 6 years ago

It just seems unnecessary. It is not as if there is currently a problem with generating a sufficient number of games, so why discard potentially useful data?

jjoshua2 commented 6 years ago

I would use the extra compute to raise the playouts to 1200 until we have regular automated network testing like Leela Zero. But what to do with a more efficient pipeline is a whole separate issue. Eventually it will be useful to have faster self-play, for instance when we move to a 20-block network.

We might as well have written and tested code ready to go when we need it, along with approval from the project leader on the methods.

jhellis3 commented 6 years ago

Those issues are completely orthogonal. If throughput becomes an issue, it may be worth investigating, but until then there is only risk for no real reward.

Error323 commented 6 years ago

Resignation as implemented in AGZ generalizes beyond known tablebase results and is therefore a more efficient optimization. I also find it more elegant, but that may be secondary to some.

jjoshua2 commented 6 years ago

I'm more worried about maximum strength than elegance, so why not use both optimizations? It is still pretty simple, and the more strength we have, the more contributors we will get.

Eventually endgame knowledge will cause midgame strength to suffer, because there aren't enough weights to store all the information. We could then rely on TBs for those who want maximum strength, and train a separate NN for endgames for those who prefer that, although I still consider TB not to be human knowledge, but truth. Or we may end up forking the project because people have different goals. But even if we don't go down this route, it would still increase efficiency and allow us to switch to more blocks while keeping the game rate.

Uriopass commented 6 years ago

Forking is probably not a good idea since splitting a distributed computing userbase can only make things worse for everyone.

jjoshua2 commented 6 years ago

@Uriopass I wholeheartedly agree, but there would be ways around it, like letting people choose whether they want to play resign games or no-resign games, or even extend or verify early-resigned games, just like BOINC/Prime95 projects let you choose which subprojects to work on.

jhellis3 commented 6 years ago

> Eventually endgame knowledge will cause midgame strength to suffer because there aren't enough weights to store all the information.

This should not be a danger because the results will be self-correcting to maximize overall strength. IOW, it should only do this if doing so is genuinely stronger than the alternatives and to not do so would make the engine weaker.

If you increase efficiency but get an inferior result, you haven't really increased efficiency....

It is not clear to me how false positives or simply lost knowledge would ever be corrected.

If you train to achieve a won position and get there, but then cannot play out the win, you can expect poor results whenever the person running a tournament does not use the same adjudication rules you used in training (or the opposing engine uses some form of contempt), which could be quite embarrassing.

Then there is also the danger of simply evaluating positions incorrectly but never correcting that, even rewarding and reinforcing poor or objectively wrong play.

jjoshua2 commented 6 years ago

We could include 3- and 4-man TBs and only use those. SF has done tests on fishtest and talked about doing that. Their size is smaller than our network weights. That way tournaments could not mess things up by using different adjudication conditions that would cause poor performance.

Maximizing overall strength by balancing middlegame and endgame in one net would be good, but it would underperform maximizing middlegame and endgame separately, using a separate net or TB, in the real world where net size and playout speed are an issue.

Even without TB support, it would still be using the NN weights to decide to resign, and many games would be played with resign off; that's how false positives and lost knowledge would be corrected. With TB support, that would not be necessary, except for resignations that occurred in the midgame, if we allow those.

jkiliani commented 6 years ago

I think a decently sized net should be able to learn most of what's contained in a 4-piece TB without actually using one, and I agree that not using it is the more elegant approach. I would prefer the neural net to be trained to perform well (even if not perfectly) at all parts of the game, instead of making it completely reliant on a particular tablebase to even be able to play any endgame with few pieces.

Coupling the NN-based engine to a tablebase should be done as an option, which (if activated) searches a tablebase in addition to the neural net during MCTS. If a tablebase gets a hit, this would then be treated by the search as if it had found a checkmate. This would increase endgame playing strength without making the neural net itself reliant on it.
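
A sketch of that optional coupling at the leaf-evaluation step of MCTS: on a tablebase hit the search takes the proven result, just as it would a found checkmate, and otherwise falls back to the net. The interfaces (`probe_wdl`, `net.evaluate`) are assumed for illustration:

```python
def evaluate_leaf(position, net, probe_wdl=None):
    """Return a value in [0, 1] for the side to move.

    If the optional tablebase probe hits, its result is treated as a
    proven terminal value (like a checkmate found by the search) and
    the neural net is bypassed; the net itself stays TB-free.
    """
    if probe_wdl is not None:
        wdl = probe_wdl(position)  # 'win'/'draw'/'loss' or None on miss
        if wdl is not None:
            return {"win": 1.0, "draw": 0.5, "loss": 0.0}[wdl]
    value, _policy = net.evaluate(position)
    return value
```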

jhellis3 commented 6 years ago

Ex: with 4-man TBs you adjudicate KBNk as a win/loss upon reaching it at the root. The NN will never learn how to actually win such positions, and against an opponent of any skill it could fail to convert. To make matters worse, the NN will now prefer to enter such adjudicated positions, where it may not know how to play, because they are treated as automatic wins.

jjoshua2 commented 6 years ago

@jhellis3 I'm not sure if you are aware of Leela Zero's resign analysis like this. Leela Zero uses a 10% resign threshold, which only gets a few percent more games wrong than no resigns at all. If we can reduce the percentage of wrong resignations, that is good, because we either get better data or can increase the resign threshold.

For your example: in all resign scenarios, 20% of games are played with resign/adjudication disabled. Otherwise the net would not learn how to actually win those won positions, and could get bad amplification loops of play tending toward positions it cannot convert. The only way around this would be to integrate tablebases into the engine itself, so it would also use them at play time and could win with them, but that is not necessary to start making progress on resign, and it is more divisive.

jjoshua2 commented 6 years ago

If the NN already evaluates that KBNk position at a 95% win rate for the stronger side, the defender would have resigned it anyway with the Leela or A0 method, and if the eval doesn't cross the threshold, I won't even probe the TB. The probe would only change things in a case like a KPvK position the net thinks is a loss but which is actually a draw because the pawn is on the edge and the king can blockade: there we won't allow a resign.

jkiliani commented 6 years ago

@Error323 Does the training data currently contain the network position evals along with the visit distributions and game outcome? Leela Zero does not, and it means that all resign analysis there has to be done at the client end, by someone who disables resignation for all their games for a while. If the training data here includes this, it could be used for a resign analysis with much better statistics.

jhellis3 commented 6 years ago

At the 10% resign rate, it gains a 7.3% shorter average game length. Being generous, we will round that up to 8%, which is worth (again generously) 9% more throughput (of which we already have plenty, IMHO). This is without accounting for potential negative effects. And Go is a different game from chess... Why bother?

The potential benefits just do not seem worth the time or worry to create a guaranteed robust system, to me... YMMV.
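
For the record, the 8% → 9% conversion above checks out: shortening average game length by a fraction r raises games per unit time by a factor 1/(1 − r). A quick check:

```python
r = 0.08                         # generously rounded game-length reduction
print(f"{1 / (1 - r) - 1:.1%}")  # 8.7% -> roughly the 9% quoted above
```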

jjoshua2 commented 6 years ago

Actually, according to that link, with Go even a 1% resign rate gave an 18.76% shorter average game length. 5% was an additional 6.45%, so about 25% shorter games.

jhellis3 commented 6 years ago

I just read the numbers off the chart the guy posted. It would appear there is conflicting information in that link then. Anyway, I have said my piece on the subject.

jkiliani commented 6 years ago

The marginal benefit for Leela Chess will be smaller than for Leela Zero, mainly because we're using temperature=1 for the whole game, which allows many more endgame blunders. In consequence, the network will have much lower confidence in winning even positions where very little could go wrong given strong play on both sides.

jjoshua2 commented 6 years ago

That 7.3% was an additional reduction relative to the previous line, which was the 5% resign threshold. So with a 10% threshold you get up to about 25% (from the 1% and 5% lines together) plus 7%, or roughly 32% shorter games.

jhellis3 commented 6 years ago

Doing the math from the chart (which the poster really should have done in the first place) gives 18.76 × 1.0645 × 1.0736 = 21.44, or 22% more throughput on a good day. If the project were severely throughput-capped, maybe it would be worthwhile. Currently, that is simply not the case. Not to mention none of the risks have been properly addressed... But whatever, I'm not going to invest anything more in the matter.

jjoshua2 commented 6 years ago

The table was compiled from here. A 10% threshold gave a 29.59% reduction, with 3% incorrect and 0.6% that did not resign. This is under DeepMind's safe rate of 5% false positives that they used. The fact that AlphaZero and Leela Zero developed pro-level bots is a pretty decent indicator that the risks are low. We could even introduce further safeguards to lower the false-positive rate, like not allowing a resign in some situations where it would be incorrect (but most seem to favor simplicity over this so far).

A 30% speedup means playouts could be increased to 1000, or blocks to 6 or 7, with the same throughput.
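
For what it's worth, the per-threshold reductions compound multiplicatively rather than add, which is where the 29.59% figure comes from (using the thread's numbers, with ~7.3% for the 5% → 10% step):

```python
# Remaining game length after each threshold step from the linked table:
# 1% threshold: -18.76%; raising to 5%: -6.45% more; to 10%: ~-7.3% more.
remaining = (1 - 0.1876) * (1 - 0.0645) * (1 - 0.073)
print(f"total reduction: {1 - remaining:.2%}")  # ~29.5%, matching ~29.59%
```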

jhellis3 commented 6 years ago

Once a NN is of considerable strength, I would fear a 3% error rate in my data far more than welcome having 30% more of it.

jkiliani commented 6 years ago

DeepMind considered 5% false resignations acceptable, since in most cases these are decided positions turned around by endgame blunders.

In principle there's nothing wrong with allowing resignations as long as there is a solid analysis of wrongful resigns. But it's not really urgent right now; there are many other optimisations that seem more helpful at this early stage, for instance getting the match system functional and implementing symmetries.

Dorus commented 6 years ago

A 3% false-resign rate is more than fine, as long as you let a small percentage (like 20%) of games run without resign. However, unless you get something like at least 10-20% shorter games, I wouldn't bother, really. I would expect games to become shorter anyway once the net learns to deliver checkmate a little quicker. Not to mention practice with few pieces might also translate into better moves with more pieces.

Once the net becomes stronger, the percentage reduction in game length will (probably*) go down, and turning on resign will be fine. 3% wrong resignations really isn't that harmful; only the value net is affected, as the policy net will still search for the best move.

*) For go at least, a stronger net gave much better resign prediction, but for chess this might not hold, since draws are still possible, and I'm sure you can also play with time management to get through forced sequences or mate-in-n sequences a bit faster.

jhellis3 commented 6 years ago

No offense, but these never-ending appeals to authority grow tiresome. DeepMind's goal was to train up a network that would achieve superhuman strength (and likely defeat, or at least tie, SF8) in as short a time as possible. In that light, their decision makes perfect sense, as does their decision to always promote. But that does not mean those are the best choices for an open-ended project that cares about being as strong as objectively possible (not just good enough to make a point).

ashinpan commented 6 years ago

I have been watching LC0 play against Scorpio in the TCEC Bonus match, and the general opinion there is that LC is very weak in endgames. If LC is not to rely on tablebases, it should have the chance to learn from a lot of endgames. Therefore, I propose that all self-play games be played out to the end without resignation.