LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

CLOP time management for v0.18 #367

Closed mooskagh closed 6 years ago

mooskagh commented 6 years ago

We have a gazillion parameters now, including the new immediate-time-use. It would be nice to have them clopped both for play with and without ponder.

zz4032 commented 6 years ago

Tuning results:

Version: LC0 v0.18-dev
Id 11262
2nd engine: asmFish

TC: 22.5s+0.125s/move
Average nodes on opening moves: 1500

Games in tuning: 2631

Fixed parameters:

Cpuct MCTS option = 2.7 (standard: 3.4)
First Play Urgency Reduction = 1.1 (standard: 0.9)
Policysoftmaxtemperature = 2.3 (standard: 2.2)

Values picked from previous tuning. Shouldn't matter much for this tuning.

Parameters (standard/tuned):

Scale thinking time -> 2.4/2.62
Time weight curve peak ply -> 26.2/27.4
Time weight curve width left of peak -> 82.0/83.5
Time weight curve width right of peak -> 74.0/74.9

Notes: No clear convergence because of too many parameters but I think default parameters can be considered as confirmed by this tuning.
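To see how small the shift between the standard and tuned curve values is, here is a minimal sketch. It assumes a simple two-sided Gaussian shape for the time weight curve; that shape and the function name `move_weight` are illustrative assumptions, not lc0's actual formula. Only the parameter values come from the list above.

```python
import math

def move_weight(ply, peak, width_left, width_right):
    """Hypothetical asymmetric bell curve (assumed shape, NOT lc0's real formula).

    Uses width_left for plies before the peak and width_right after it.
    """
    width = width_left if ply < peak else width_right
    return math.exp(-0.5 * ((ply - peak) / width) ** 2)

# Parameter values taken from the tuning results (standard / tuned).
DEFAULT = dict(peak=26.2, width_left=82.0, width_right=74.0)
TUNED = dict(peak=27.4, width_left=83.5, width_right=74.9)

for ply in (0, 20, 40, 80, 160):
    d = move_weight(ply, **DEFAULT)
    t = move_weight(ply, **TUNED)
    print(f"ply {ply:3d}: default {d:.3f}  tuned {t:.3f}  diff {t - d:+.3f}")
```

Under this assumed shape the two curves stay within a few hundredths of each other over the whole game, which is in line with reading the tuning as a confirmation of the defaults.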


Tuning results: Fraction of saved time to use immediately

Update: Needs to be retuned. I discovered a problem with CLOP recognizing UCI parameter names that contain spaces (also when the spaces are replaced with underscores).
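One way to catch this kind of naming problem before a long run is to compare the option names the tuner will send against the names the engine announces in its `uci` handshake. A minimal sketch, assuming the standard `option name <id> type <t> ...` line format from the UCI protocol; the sample output and the helper names are made up for illustration and are not real lc0 output.

```python
import re

# A UCI option line looks like: "option name <id> type <t> ..."; the name may contain spaces.
OPTION_RE = re.compile(r"^option name (?P<name>.+?) type (?P<type>\w+)")

def parse_option_names(uci_output):
    """Collect the option names announced in an engine's reply to 'uci'."""
    names = set()
    for line in uci_output.splitlines():
        match = OPTION_RE.match(line.strip())
        if match:
            names.add(match.group("name"))
    return names

def missing_options(wanted, uci_output):
    """Return the wanted names that the engine did not announce."""
    available = parse_option_names(uci_output)
    return [name for name in wanted if name not in available]

# Hypothetical engine output, for illustration only.
SAMPLE = """\
id name ExampleEngine
option name Scale thinking time type string default 2.4
option name Fraction_of_saved_time type string default 0.0
uciok
"""

print(missing_options(["Scale thinking time", "Scale_thinking_time"], SAMPLE))
```

With this sample the underscore variant is reported as missing, which is the sort of silent mismatch that would leave the engine running on its default value.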

roy7 commented 6 years ago

Something to note: smart pruning needs a minimum amount of time before it has an nps estimate it is allowed to use. If the increment is shorter than that minimum, then once the main time is gone immediate-time-use will have no further effect, since we'll probably just think for the increment time on every move. Not sure if this changes any decisions, but a 0.125s increment may be below that threshold.

mooskagh commented 6 years ago

So I'm changing it to immediate-time-use=0.4, slowmover=1.0. Sounds good?

zz4032 commented 6 years ago

Let me redo the "Fraction of saved time to use immediately" tuning today. I just found out that CLOP on Linux needs parameter names without spaces. I had previously recompiled lc0 with renamed UCI options, but not for the latest two tunings (this one and the test20 cpuct tuning).

Running tuning now:

Version: LC0 v0.18-dev
Id 11262
2nd engine: asmFish
TC: 22.5s+0.125s/move
Average nodes on opening moves: 1500
Fixed parameters: Scale thinking time = 1.0

zz4032 commented 6 years ago

Tuning results after 1100 games:

Fixed parameters: Scale thinking time = 1.0

Parameters (standard/tuned):

Fraction of saved time to use immediately -> 0.0/0.62

Gauntlet with asmFish vs. various lc0 settings:

   # PLAYER            :  RATING  ERROR  POINTS   GAMES
   1 lc0_SM1.0_F0.6    :       0     24   200.0     400
   2 asmFish           :       0   ----   416.5     800
   3 lc0_SM1.0_F0.0    :     -14     37    96.0     200
   4 lc0_SM2.4_F0.0    :     -44     32    87.5     200

SM = Slowmover / 'Scale thinking time'
F = 'Fraction of saved time to use immediately'

Clop diagram: [image: clop]

It looks like values close to 1.0 are not bad either. That is probably an indicator that further tuning of the TM curve parameters makes sense. I think we should go with slowmover=1.0 and immediate-time-use=0.6.
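As a rough cross-check, the gauntlet ratings can be reproduced from the score fractions with the standard logistic Elo formula. A minimal sketch, assuming each lc0 configuration played only asmFish (rated 0); the function name `elo_from_score` is illustrative.

```python
import math

def elo_from_score(points, games):
    """Elo difference implied by a score fraction, via the logistic Elo model."""
    score = points / games
    return -400.0 * math.log10(1.0 / score - 1.0)

# (points, games) per lc0 configuration, taken from the gauntlet table.
GAUNTLET = {
    "lc0_SM1.0_F0.6": (200.0, 400),
    "lc0_SM1.0_F0.0": (96.0, 200),
    "lc0_SM2.4_F0.0": (87.5, 200),
}

for name, (points, games) in GAUNTLET.items():
    print(f"{name}: {round(elo_from_score(points, games))}")
```

The rounded values come out as 0, -14 and -44, matching the RATING column. The ERROR column is wide enough that the ordering of the first two settings is not settled by these games alone.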

fli commented 6 years ago
   # PLAYER             :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)
   1 lc0-fraction50     :     8.9   16.3   137.0     264    52      69
   2 lc0-fraction100    :     1.8   16.4   133.0     264    50      80
   3 stockfish          :   -10.6   16.5   126.0     264    48     ---

I ran some tests with 90s+0.5s TC and slowmover=1, and they show that immediate-time-use=0.5 has a 69% chance of being stronger (> 0 Elo) than 1.0.

If we set immediate-time-use to less than 1, we just have to be a bit careful with how that works in situations where people were complaining about leela not using enough time (ponder, gpu temp bugs, etc). I think with immediate-time-use=1, the TM would be more robust to those situations.

roy7 commented 6 years ago

I do prefer 1 myself. I wonder if eventually we can clop with a larger increment time or with pondering.

zz4032 commented 6 years ago

Retuned at 90s+0.5s/move with 1122 games.

Fixed parameters: Scale thinking time = 1.0

Parameters (standard/tuned):

Fraction of saved time to use immediately -> 0.60/0.58

[image: plot] [image: clop]

I consider the previous tuning result 0.6 as confirmed with this 4x longer TC. I also think now 22.5s+0.125s/move is sufficient for tuning this parameter. However, I'm a bit unsure why there is no pronounced optimal sample area visible. Maybe more games would be needed to make it visible because the difference in performance is inside a small Elo range. Or the optimal sample area is cropped at the upper value range, which is probably bad for CLOP and leads to an incorrect optimum value which could be higher. Maybe a test in combination with a higher Slowmover value is necessary to shift the best sample area downwards.

What do you think makes most sense to be tuned next for TM:

jjoshua2 commented 6 years ago

Some ponder on tests would be good. I very much like that slow_mover/scale is at 1.0 so I would not like that to change. The immediate % and the curve parameters should be enough to not need this. The only exception would be for ponder on. SF has a slowmover *= 1.5 logic if ponder==on.

Francis is tuning TM curve (with some code rewrite) so too much duplicate work there should be avoided.

You tested Aversion to search if change unlikely before, but I think maybe it was invalid? I think this is a great candidate since it's likely to have changed with all of the other things changed recently (mostly the immediate %). Also, the last CLOP I did had a higher value than we are using, so even if it is the same it could possibly be an Elo gain to restore that.

zz4032 commented 6 years ago

Tuning Slowmover has finished with 548 games. Ponder=on. Tuned parameters: Scale thinking time -> 2.32 (standard: 1.0)

Tuning: [image: plot]

CLOP diagram (y-axis scaled from 0 to 5): [image: screenshot from 2018-09-28 16-45-10]

Looking at the CLOP diagram, 1) the optimal value area is clearly visible and 2) the standard value differs a lot from the tuned value. That should result in a noticeable Elo gain.

Test matches (big error bars):

   # PLAYER        :  RATING  ERROR  POINTS   GAMES
   1 lc0-tuned     :      86     60    31.0      50
   2 lc0           :      71     72    30.0      50
   3 stockfish6    :       0   ----    39.0     100

> I very much like that slow_mover/scale is at 1.0 so I would not like that to change.

I'm afraid we need to tune this now with ponder=off. :) If the factor of 1.5 for Slowmover with ponder=on is valid for LC0, we should get something like 2.32/1.5=1.55 for Slowmover with ponder=off.

Note: With ponder=off I was able to use Move time overhead in milliseconds = 20 (standard: 100) with no further option changes.

With ponder=on I got time losses and had to switch to Move time overhead in milliseconds = 100 and Minibatch size for NN inference = 64 (standard: 256)

jjoshua2 commented 6 years ago

20ms is quite fast. I wouldn't worry about trying to make that work. But if 100ms doesn't work at defaults that is disconcerting.

I agree that as it sounds now probably something like 1.5 for slowmover is likely a small improvement (since immediate time use is not 1.0, and time curve was tuned at 0.0), but I think the time curve could be retuned instead to use more time up front which will have mostly the same effect as increasing slowmover, without needing an extra parameter to be tuned. Maybe it would lose 2 elo but realistically tuning 4 parameters or so has much bigger elo noise than that.

fli commented 6 years ago

Did/can you test ponder with Fraction of saved time to use immediately = 1? I can imagine if it is 0.6 then the extra time will tend to accumulate rather than being used, since pondering will increase the amount of smart pruning and cut off moves earlier.

zz4032 commented 6 years ago

If Fraction of saved time to use immediately and pruning (controlled by Aversion to search if change unlikely, correct?) interact so much with Scale thinking time, they should all be tuned together at some time in the future (or with the TM curve parameters, if Scale thinking time gets removed one day). Such a tuning would require a longer time control and a lot of games. I think that should be done once all the TM curve changes are implemented, as @jjoshua2 mentioned. For a quick test I can retune Scale thinking time with Fraction of saved time to use immediately = 1 and ponder=on.

roy7 commented 6 years ago

Immediate time usage set to 1 would remove any need for slowmover, since there is no saved time bleeding into the rest of the match: all bonus time is used asap, all game long. With immediate time use 1, if the curve isn't what we want, then the curve needs to be adjusted. Slowmover is just a hack to use bonus time earlier in a non-dynamic way.

I think @Francis's solution to remove the existing time curve settings is quite nice.

zz4032 commented 6 years ago

Actually this is an "issue" tagged with "release blocker" waiting for input for better settings for v0.18. Not a discussion about what is currently in long-term development. If a parameter value was found providing better performance, it should be tested and implemented in v0.18 as far as I understand. If it's too late for v0.18 to change parameters, the issue needs to be closed. There is still Aversion to search if change unlikely and Scale thinking time with ponder=off to tune. Any chance any value updates will make it into v0.18?

roy7 commented 6 years ago

Understood. It's just that slowmover and immediate-time-use are directly connected and are basically controlling the same behavior in different conflicting ways. Although slowmover affects both time curve and use of bonus time, whereas immediate-time-use=1 only affects bonus time and tries to leave the time curve alone. Perhaps tuning slowmover is easier than tuning the time curve parameters, if what's actually happening is slowmover CLOP is telling us the time curve needs to be sharper (since that's the real effect of what slowmover>=1.0 does).

fli commented 6 years ago

Just FYI and warning regarding CLOP. I just ran CLOP overnight for 1500 games and then realised that my parameter names were wrong and the values in lc0 weren't being changed, yet half of the parameter space was faded in CLOP. So I think we definitely need to check whether tuned parameters are stronger (run direct tests until 95% confidence of superiority?) before adding them to release, and be careful making assertions based on how the CLOP graph is faded.
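For the "95% confidence of superiority" check, a commonly used quantity is the likelihood of superiority (LOS) computed from wins and losses, with draws ignored. A minimal sketch using the normal approximation; the win/loss counts in the example are hypothetical and not from any test in this thread.

```python
import math

def likelihood_of_superiority(wins, losses):
    """Normal-approximation LOS: probability that the first engine is the stronger one."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    z = (wins - losses) / math.sqrt(decisive)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical result: 60 wins, 40 losses, any number of draws.
los = likelihood_of_superiority(60, 40)
print(f"LOS = {los:.3f}, passes 95% threshold: {los >= 0.95}")
```

A direct match between the tuned and the standard settings can be stopped once this value crosses the chosen threshold, which does not depend on how the CLOP plot happens to be faded.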

zz4032 commented 6 years ago

I agree we need more testing, other testers could help out here. Yes, it's easy to set up something wrong in CLOP with no warning from log files. I'm rechecking all settings several times... Faded sample areas are telling me there is a pronounced sample area with much higher performance. So there should be a higher Elo gain (if the standard value is much different from the tuned). Didn't know there could be some fading with equal performance throughout the complete range.

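Meanwhile I finished tuning Slowmover with ponder=off. 382 games. Tuned parameters: Scale thinking time -> 1.71 (standard: 1.0)

Tuning: [image: plot] https://user-images.githubusercontent.com/11909324/46244094-54ee7280-c3da-11e8-93e3-beaf3cce45f2.png

CLOP diagram: [image: clop] https://user-images.githubusercontent.com/11909324/46244098-5a4bbd00-c3da-11e8-98de-686335e92cb2.png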
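The two Slowmover tunings can be compared against the factor of 1.5 mentioned earlier for ponder=on. A small worked calculation, where the 1.5 factor is the assumption quoted in this thread rather than a verified constant:

```python
# Tuned 'Scale thinking time' values reported in this thread.
SLOWMOVER_PONDER_ON = 2.32
SLOWMOVER_PONDER_OFF = 1.71

# Assumed ponder multiplier, as quoted in the discussion (not verified here).
ASSUMED_PONDER_FACTOR = 1.5

# What ponder=off should give if the 1.5 factor carried over to lc0.
predicted_ponder_off = SLOWMOVER_PONDER_ON / ASSUMED_PONDER_FACTOR

# The ratio the two tunings actually imply.
observed_factor = SLOWMOVER_PONDER_ON / SLOWMOVER_PONDER_OFF

print(f"predicted ponder=off value: {predicted_ponder_off:.2f}")
print(f"observed ponder factor:     {observed_factor:.2f}")
```

The predicted value is about 1.55 against a tuned 1.71, and the implied factor is about 1.36 rather than 1.5. With only 548 and 382 games behind the two tunings, that gap should not be over-interpreted.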

jjoshua2 commented 6 years ago

I don't think there can be such obvious fading like in that last picture and still have equal elo throughout the range. When it's that extreme you know the parameter is doing something and the CLOP algorithm is working. When there is the bug where the parameter doesn't do anything, all the range is tested evenly, and not just a band.

fli commented 6 years ago

That was my intuition too. It was fading before I went to sleep which is why I didn't think anything was wrong and left it running. But when I checked it again in the morning the top right quadrant was all significantly faded. I'd upload the .dat file but it's already gone.

jjoshua2 commented 6 years ago

Yes, I can see that happening, but there is a difference between a corner faded and getting an actual ellipse, like you should get when you tune two variables.

zz4032 commented 6 years ago

How do you want to proceed with the TM curve parameters? As the Slowmover tuning shows, the current TM curve is far from optimal.