glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaption of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Wider (larger) PUCT, not narrower #610

Open ASilver opened 6 years ago

ASilver commented 6 years ago

GCP published data, which I do not doubt, showing that in head-to-head matches a smaller PUCT value, which searches fewer moves but takes them deeper, gave better results. The problem is that a smaller value tends to reinforce the engine's current choices rather than encourage it to explore moves it might find surprising, such as tactics.

I'd like to suggest that the PUCT value actually be increased, so that the engine can test out moves or situations it has not mastered and learn from them. I do not think seeing fewer moves, but deeper, is the ideal way to evolve and learn.
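To make the trade-off concrete, here is a minimal sketch of AlphaZero-style PUCT child selection. It is not LC0's actual search code: the function names and the node statistics below are hypothetical, and details such as FPU handling are left out. It only shows how a larger exploration constant shifts the choice toward a low-prior, rarely visited move.

```python
import math

def puct_scores(priors, visits, q_values, c_puct):
    """Return Q + U for each child, using the AlphaZero-style formula
    U = c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_parent = math.sqrt(max(sum(visits), 1))
    return [
        q + c_puct * p * sqrt_parent / (1 + n)
        for p, n, q in zip(priors, visits, q_values)
    ]

def select_child(priors, visits, q_values, c_puct):
    """Index of the child with the highest PUCT score."""
    scores = puct_scores(priors, visits, q_values, c_puct)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical node (assumed numbers, not taken from any real game):
# move 0 is the well-explored favourite, move 2 is a low-prior
# "surprising" move that has only been visited once.
priors = [0.60, 0.30, 0.10]
visits = [90, 9, 1]
q_values = [0.20, 0.05, -0.30]

narrow = select_child(priors, visits, q_values, c_puct=0.5)
wide = select_child(priors, visits, q_values, c_puct=3.0)
print(narrow, wide)
```

With the small constant the search keeps deepening move 0. With the larger one the exploration term dominates and the rarely visited move 2 is tried next.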

ASilver commented 6 years ago

Although my run is far from over, I should point out that I am getting radically different results in my CLOP run. Only about 560 games have been played at 1m+1s (tc=60+1) so far, and the gap between UCB and LCB is still very large, but the PUCT value at the moment is about 3.1.

On Sun, May 27, 2018 at 5:13 AM, zz4032 notifications@github.com wrote:

It turned out the reason for the time losses was not the parameter ScaleThinkingTime (Slowmover). LC0 was simply losing on time at time controls with small increments, overstepping the clock by a few milliseconds. A recent commit adding move overhead solved the issue completely (MoveTimeOverhead=10). In the meantime I reran the tuning at ~15s/game and got basically the same results.

zz4032 commented 6 years ago

I took the time to run a match at a time control of 96s+0.4s/move (about 2 min/game) using all 200 positions from the Noomen 2-move Testsuite book. All parameters were at their default settings except CpuctMCTS.

   # PLAYER                     :  RATING  ERROR  WINS(%)   GAMES
   1 LC0_Id342_CpuctMCTS_3.1    :     3.6   24.9     50.5     400
   2 LC0_Id342_CpuctMCTS_2.0    :     0.0   ----     49.5     400
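
As a rough cross-check of the table, the standard logistic Elo model converts a score fraction into a rating difference. This is only a sketch with a hypothetical helper name. The rating tool that produced the table may fit ratings differently, so small deviations from its 3.6 are expected.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by a score fraction (0 < score < 1)
    under the logistic model: 400 * log10(score / (1 - score))."""
    return 400.0 * math.log10(score / (1.0 - score))

# 50.5% is the score of the CpuctMCTS=3.1 build in the table above.
diff = elo_diff_from_score(0.505)
print(round(diff, 1))
```

This gives roughly 3.5 Elo, far smaller than the reported error of 24.9, so these 400 games do not separate the two settings.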

ASilver commented 6 years ago

So I finished the CLOP run after 1500 games, which took a full 4 days. Details: time control 1m+1s; net id338 (after healing, so evals are not crazy); 3 opponents, each rated around 3080-3090 CCRL, to avoid bias in favor of one; a randomized openings suite. The test machine has a GTX 1060 6GB.

Final settings were PUCT=3.15 and FPU=0.17

ASilver commented 6 years ago

I ran a long CLOP run tuning three settings, and it came up with the following values after 710 trials:

PUCT: 3.4, FPU: 0.9, Policy Softmax: 2.2
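
For readers unfamiliar with the third setting, here is a minimal sketch of what a softmax temperature does to move priors. It assumes that "Policy Softmax" acts as a temperature applied to the policy output, which is an assumption about the setting rather than LC0's exact code. The function name and the prior values are hypothetical.

```python
def apply_policy_temperature(priors, temperature):
    """Raise each prior to the power 1/temperature and renormalise.
    A temperature above 1 flattens the distribution, below 1 sharpens it."""
    scaled = [p ** (1.0 / temperature) for p in priors]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical priors: one dominant move and two long shots.
priors = [0.80, 0.15, 0.05]
flattened = apply_policy_temperature(priors, 2.2)
print([round(p, 3) for p in flattened])
```

A temperature of 2.2 moves probability mass from the favourite to the long shots, which works in the same direction as a larger PUCT value: the search looks at more moves.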

I then ran matches against outside engines, once with the default settings and once with these new CLOP values, and the new values showed a 63 Elo increase. Out of curiosity, I also tested both on a revised version of the WAC tactics suite: the default settings solved 109/200 and the new values solved 159/200. In other words, the new values are not only stronger in play but also vastly better at tactics.
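
To put those figures in proportion, the sketch below converts the 63 Elo gain into an expected score under the standard logistic Elo model and turns the two WAC results into solve rates. The helper names are hypothetical, and the Elo conversion is the textbook formula, not necessarily what the match tool used.

```python
def expected_score(elo_diff):
    """Expected score fraction for a player rated elo_diff points higher,
    under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def solve_rate(solved, total):
    """Fraction of test positions solved."""
    return solved / total

gain = expected_score(63)            # score implied by +63 Elo
default_rate = solve_rate(109, 200)  # WAC result with default settings
tuned_rate = solve_rate(159, 200)    # WAC result with the CLOP values
print(round(gain, 3), default_rate, tuned_rate)
```

A 63 Elo edge corresponds to scoring about 59% against the same opposition, while the tactics solve rate rises from 54.5% to 79.5%.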

They are the default settings in the June4 build and can be tested at your leisure.