From @jkiliani on April 4, 2018 14:7
Sounds like a great idea! Giving two out of the three values would suffice though, since all three are necessarily dependent. A "contemptuous network" would have to be retrained from scratch, but at higher playing levels this may well be what many users are going to be looking for...
Edit: Sorry, I actually misunderstood. The contempt of such a net could be tweaked by the search.
From @Akababa on April 4, 2018 18:43
Could a similar contempt effect be achieved by scaling the value output to [0,1] and then squaring? Then a win is still worth 1 point and a loss is still 0, but a draw is 0.25 (translated back: win 1, loss -1, draw -0.5).
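A minimal sketch of that transform in plain Python (the function name is illustrative):

```python
def contempt_value(v):
    """Map a value v in [-1, 1] to a contempt-adjusted value in [-1, 1]."""
    v01 = (v + 1) / 2        # win=1.0, draw=0.5, loss=0.0
    squared = v01 ** 2       # win=1.0, draw=0.25, loss=0.0
    return 2 * squared - 1   # win=1.0, draw=-0.5, loss=-1.0

assert contempt_value(1.0) == 1.0
assert contempt_value(0.0) == -0.5
assert contempt_value(-1.0) == -1.0
```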
From @lightvector on April 4, 2018 18:52
This has the nice advantage of making the value head much more flexible for later choice of style. @jkiliani - It's possibly simpler to just give all 3 values, the same way that for the policy head you don't pick and remove one of the outputs because it can be inferred from the others? If you just have 3 outputs in the value head and softmax them, the softmax itself cleanly enforces that their sum is 1, no?
Unfortunately this doesn't affect the policy head. If you want to make the policy head also responsive to different objectives, it's more complicated, but in principle still you can do it by providing the coefficient on the utility of a draw as an input and then varying that value during training games.
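To illustrate the softmax point, here is a minimal sketch of a three-output value head in PyTorch (not the actual lc0 code; the layer size and names are made up):

```python
import torch
import torch.nn as nn

class WdlValueHead(nn.Module):
    """Value head with three outputs (win, draw, loss) instead of one scalar."""

    def __init__(self, in_features=256):
        super().__init__()
        self.fc = nn.Linear(in_features, 3)

    def forward(self, x):
        # Softmax enforces that P(win) + P(draw) + P(loss) == 1.
        return torch.softmax(self.fc(x), dim=1)

head = WdlValueHead()
probs = head(torch.randn(4, 256))
print(probs.sum(dim=1))  # each row sums to 1
```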
We can still train the policy head as win + 0.5*draw, and only bias it during MCTS. What you suggested is indeed a good idea, but it requires too many changes and diverts the progress too much.
I guess it will still affect style without weakening the engine.
Actually, we can train both the "old style" win probability {-1..1} and the individual probabilities {win, draw, loss}. That way we are still fully compliant with the AZ spec while having the individual probabilities in hand.
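One hedged sketch of how both targets could be trained from a single game result (shapes, weighting and names are illustrative, not what the project actually uses):

```python
import torch
import torch.nn.functional as F

def value_targets(result):
    """result: +1 win, 0 draw, -1 loss, from the side to move's perspective."""
    scalar_target = float(result)               # old-style target in [-1, 1]
    wdl_index = {1: 0, 0: 1, -1: 2}[result]     # class index: 0=win, 1=draw, 2=loss
    return scalar_target, wdl_index

def combined_value_loss(scalar_out, wdl_logits, result):
    """MSE on the old-style scalar head plus cross-entropy on the WDL head.

    Assumes a batch of one position: scalar_out has shape (1,), wdl_logits (1, 3).
    """
    scalar_target, wdl_index = value_targets(result)
    mse = F.mse_loss(scalar_out, torch.full_like(scalar_out, scalar_target))
    ce = F.cross_entropy(wdl_logits, torch.tensor([wdl_index]))
    return mse + ce
```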
From @evalon32 on April 5, 2018 0:55
This has value aside from playing style: Leela would be able to offer a draw, similar to resignation.
Modelling chess should really be done via the transition probabilities between all game states (win/draw/loss). But this needs a rethinking of the PUCT formula to really make use of the additional info, and the training target also becomes an issue. Currently we have one result per game to train one objective against. The value head issues would multiply if we have one result but train two or more probabilities (or transition probabilities) against it.
I am all for experimenting with this, as it's the way to go. DeepMind wanted to emphasize their unified approach and probably did not pursue this (or did they?). But it's no trivial task.
I just want to 👍 this idea.
Has anyone taken a serious look at this idea since mid July? I really think this could take Leela to the next level.
Looking at the crosstable of the CCCC 1st round, Leela is performing above its rating against stronger engines and below its rating against weaker engines. Leela will definitely need a good concept of contempt implemented, otherwise it's not going to beat Stockfish in round-robin tournaments. Stockfish 8 had the same problem in TCEC tournaments, and it was only after properly functioning contempt was implemented that Stockfish had no more difficulty qualifying for the superfinal, where the two best engines of the round-robin preliminary stage meet.
> This has value aside from playing style: Leela would be able to offer a draw, similar to resignation.
Obviously, the engine could be made to offer draws right now, but if it were able to distinguish dead draws from equal positions in which there is still play left, more training games could legitimately be cut short.
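As a rough illustration, a WDL head would let the engine gate draw offers on the draw probability itself; the threshold and window below are made-up numbers, not tuned values:

```python
def should_offer_draw(root_draw_probs, threshold=0.95, window=8):
    """Offer a draw only after P(draw) at the root has stayed near-certain
    for each of the last `window` moves."""
    recent = root_draw_probs[-window:]
    return len(recent) == window and all(p >= threshold for p in recent)
```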
I reckon the biggest issues with implementing this now relate to backwards compatibility.
Close now that the PR is merged?
From @mooskagh on April 4, 2018 13:57
That's a pretty speculative idea, but it's very straightforward to implement, and it gives more control over playing style, which may be useful for later experiments.
So, the idea is, instead of having one value out of the value head (from -1 to 1), to have three values (passed through a softmax layer): the probabilities of a win, a draw and a loss.
During the MCTS, this will be translated into one value. If we scale Q from 0 to 1:
Q = win + 0.5*draw (neutral), Q = win + 1*draw (draws valued like wins), or Q = win + 0*draw (draws valued like losses).
Other coefficients for the draw term are also possible, to tweak aggressiveness more gradually (in this example it's 0.5*draw, 1*draw and 0*draw).
The same if we scale Q from -1 to 1:
Q = win - lose + c*draw, with c = 0 neutral, c = +1 draw-seeking and c = -1 draw-avoiding
(again, it allows more gradual tweaks of the draw coefficient from -1 to 1).
Copied from original issue: glinscott/leela-chess#241
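A small sketch of the Q mixing described above, with the draw coefficient exposed as a parameter (assuming the net outputs win/draw/loss probabilities; function names are illustrative):

```python
def q_scaled_0_1(p_win, p_draw, draw_weight=0.5):
    """Q in [0, 1]: draw_weight 0.5 is neutral, 1 values draws like wins,
    0 values draws like losses."""
    return p_win + draw_weight * p_draw

def q_scaled_pm1(p_win, p_draw, p_loss, draw_weight=0.0):
    """Q in [-1, 1]: draw_weight 0 is neutral, +1 draw-seeking, -1 draw-avoiding."""
    return p_win - p_loss + draw_weight * p_draw
```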