This is a good idea, and I will try to educate myself on what is involved in efficient tuning. We already know of one obvious thing, which is to reduce or eliminate temperature. So far I have used CLOP on some parameters one at a time, and I can say with some confidence that if we stick with absolute FPU, -0.7 is a better setting than -1.0.
Currently @kiudee is using his library, with a slightly more advanced technique than this, for parameter tuning, so we are already doing this. Maybe add some documentation on the process, or at least link to @kiudee's repo?
I think that's implemented by @kiudee. Documenting it is probably out of scope for this issue, although it's a good idea.
We already have difficulty tuning the existing parameters (as in PR #755), with several new ones just introduced (PR #750) and at least one more waiting in PR #791.
DeepMind and others have had success with Bayesian optimization for efficient tuning of large numbers of parameters. DeepMind optimized between 3 and 8 parameters at a time (DeepMind paper).
Applied to Stockfish, it approximated the major piece values (4 parameters) in less than 10 minutes, ending up only 14 Elo weaker (LOS 25%); see the discussion.
It seems this could easily be applied to lc0, including the upcoming PRs. I'll try to get something working using the linked resources; it would be great if anyone else wants to try or share their results.
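For anyone who wants to experiment, here is a minimal sketch of the idea using scikit-optimize's Gaussian-process optimizer (to be clear: this is not the exact method from the paper, nor @kiudee's library). `play_match` is a stand-in for a real match runner that would launch lc0 with the candidate settings and return a score to minimize; the parameter names, ranges, and the simulated optimum at -0.7 (mirroring the CLOP observation above) are illustrative only.

```python
# Minimal Bayesian-optimization loop using scikit-optimize's ask/tell
# interface. Assumes `pip install scikit-optimize`.
import random

from skopt import Optimizer
from skopt.space import Real

# Illustrative search space: absolute FPU value and cPUCT.
space = [
    Real(-1.5, 0.0, name="fpu_value"),
    Real(1.0, 5.0, name="cpuct"),
]

def play_match(fpu_value, cpuct):
    """Stand-in objective. A real version would launch lc0 with the
    candidate parameters, play a short match against a fixed baseline,
    and return something like -win_rate (lower is better). Here we
    fake a noisy bowl with its optimum at fpu_value = -0.7."""
    loss = (fpu_value + 0.7) ** 2 + 0.1 * (cpuct - 3.0) ** 2
    return loss + random.gauss(0.0, 0.05)  # simulated match noise

opt = Optimizer(space, base_estimator="GP", acq_func="EI")

for _ in range(50):                  # 50 trials
    params = opt.ask()               # next candidate to evaluate
    score = play_match(*params)      # noisy objective from games
    opt.tell(params, score)          # update the surrogate model

result = opt.get_result()
print("best parameters:", result.x, "score:", result.fun)
```

The appeal over one-at-a-time CLOP runs is that the surrogate model shares information across all parameters, so correlated settings (like FPU and cPUCT) get explored jointly rather than independently.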
It also seems possible to distribute trials through the client, centralizing results on the server to determine the tuned values and to distribute further trials for evaluation, if any. Though I know nothing about how the client/server works, and it seems you can get good results locally as well.
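The ask/tell split above is what would make distribution workable, because "suggest parameters" and "play the games" are decoupled. Below is a rough, hypothetical sketch of the server side, reusing the same scikit-optimize optimizer; the function names, queueing, and trial IDs are placeholders, not the actual client/server API, and transport (the existing client protocol, HTTP, etc.) is deliberately left out.

```python
# Hypothetical server-side hooks for distributing tuning trials.
# The optimizer lives on the server; clients only play matches and
# report scores.
from skopt import Optimizer
from skopt.space import Real

opt = Optimizer([Real(-1.5, 0.0, name="fpu_value")], base_estimator="GP")

queue = []      # suggested candidates not yet handed to a client
pending = {}    # trial_id -> parameters a client is currently playing
next_id = 0

def assign_trial():
    """A client asks for work: hand out one candidate parameter set.
    ask(n_points=...) yields a batch of distinct candidates, which is
    needed when many clients evaluate in parallel."""
    global next_id
    if not queue:
        queue.extend(opt.ask(n_points=8))
    params = queue.pop()
    trial_id, next_id = next_id, next_id + 1
    pending[trial_id] = params
    return trial_id, params

def report_result(trial_id, score):
    """A client reports its match score (lower = better): fold it
    into the model so later batches are better informed."""
    opt.tell(pending.pop(trial_id), score)
```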
However, DeepMind seems confident that tuning greatly improved training, not just match play, by applying it between subsequent versions. From the paper (my emphasis):
Initial discussion on leela-zero: leela-zero. Further discussion on fishcooking: fishcooking.