LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

implement bayesian optimization for parameter tuning #793

Closed wakamex closed 4 years ago

wakamex commented 5 years ago

we already have difficulty tuning existing parameters (as in pr #755), with several new ones just introduced (pr #750), and at least another waiting in pr #791

deepmind and others had success with bayesian optimization for efficient tuning of large numbers of parameters. deepmind optimised between 3 and 8 parameters at a time (deepmind paper)

applied to SF, it approximated the major piece values (4 params) in under 10 mins, ending up only 14 elo weaker (LOS 25%), see discussion

it seems this could be easily applied to lc0, including upcoming PRs. i'll try to get something working using the linked resources. would be great if anyone else wants to try or share their results.

it also seems possible to distribute trials through the client, centralising results on the server to determine tuned values and distributing further trials for evaluation, if any. though i know nothing about how the client/server works, and it seems you can get good results locally as well.
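for the local case, something like the sketch below could serve as the objective: play a short candidate-vs-baseline match with the trial settings and return the score fraction to maximize. this is only an illustration -- it assumes cutechess-cli and lc0 are installed, and the option names (CPuct, FpuValue) and time control are placeholders that may not match the lc0 build being tuned.

```python
# rough sketch of a match-play objective (assumes cutechess-cli and lc0 are on
# PATH; option names CPuct/FpuValue are illustrative and may differ between
# lc0 versions)
import re
import subprocess

def play_match(cpuct, fpu_value, games=20):
    """Play a short candidate-vs-baseline match and return the candidate's score fraction."""
    cmd = [
        "cutechess-cli",
        "-engine", "cmd=lc0", "name=candidate",
        f"option.CPuct={cpuct:.3f}", f"option.FpuValue={fpu_value:.3f}",
        "-engine", "cmd=lc0", "name=baseline",
        "-each", "proto=uci", "tc=10+0.1",
        "-rounds", str(games // 2), "-games", "2", "-repeat",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # cutechess-cli prints lines like "Score of candidate vs baseline: 11 - 5 - 4"
    wins, losses, draws = map(int, re.findall(
        r"Score of candidate vs baseline: (\d+) - (\d+) - (\d+)", out)[-1])
    return (wins + 0.5 * draws) / (wins + losses + draws)
```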

however, deepmind seems confident that tuning greatly improved training, not just match play, by applying it between subsequent versions. from the paper (my emphasis):

3.2 Task 2: Tuning fast AlphaGo players for data generation We generated training datasets for the policy and value networks by running self-play games with a very short search time, e.g., 0.25 seconds in contrast to the regular search time. The improvement of AlphaGo over various versions depended on the quality of these datasets. Therefore, it was crucial for the fast players for data generation to be as strong as possible. Under this special time setting, the optimal hyper-parameters values were very different, making manual tuning prohibitive without proper prior knowledge. Tuning the different versions of the fast players resulted in Elo gains of 300, 285, 145, and 129 for four key versions of these players.

initial discussion on leela-zero; further discussion on fishcooking

jhorthos commented 5 years ago

this is a good idea and i will try to educate myself on what is involved in efficient tuning. we already know of one obvious thing, which is to reduce or eliminate temperature. so far i have used CLOP one by one for some parameters and I can say with some confidence that if we stick with Fpu absolute, -0.7 is a better setting than -1.0.

fischerandom commented 5 years ago

https://github.com/fmfn/BayesianOptimization
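a minimal sketch of how that library could drive a match-play objective like the one sketched above -- the bounds here are placeholders, not recommended ranges, and the import of the objective is hypothetical:

```python
# minimal sketch using fmfn/BayesianOptimization to maximize a match-play objective
from bayes_opt import BayesianOptimization

from lc0_tune import play_match  # hypothetical module holding the objective sketched earlier

optimizer = BayesianOptimization(
    f=play_match,                                        # objective to maximize (score fraction)
    pbounds={"cpuct": (1.0, 5.0), "fpu_value": (-1.5, 0.0)},  # placeholder bounds
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=25)             # 5 random probes, then 25 guided trials
print(optimizer.max)                                     # best parameters found and their score
```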

Naphthalin commented 4 years ago

currently @kiudee is using his library with a slightly more advanced technique than this for parameter tuning, so we are already doing this -- maybe add some documentation on the process, or at least link to @kiudee's repo?

mooskagh commented 4 years ago

I think that's implemented by @kiudee. Documenting it is probably out of scope of this issue, although it's a good idea.