Example position, showing how PUCT is starving slightly inferior moves

LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.

GNU General Public License v3.0

2.45k stars 529 forks source link

Example position, showing how PUCT is starving slightly inferior moves #1410

Open Naphthalin opened 4 years ago

Naphthalin commented 4 years ago

Position: 8/8/8/8/8/3N1K2/2p1N3/1k6 w - - 0 5. Only winning move is Nec1 with a mate in 11, all other moves draw. https://syzygy-tables.info/?fen=8/8/8/8/8/3N1K2/2p1N3/1k6_w_-_-_0_1

The position was originally given by lee11 in Lc0 Discord #help as a Lc0 blunder which turned out to be incomplete TBs.

I was curious as whether Lc0 can find the mate on its own, and let different Lc0 versions with 703810 run in Nibbler; default values expect for --minibatch-size=168 for my GPU. Results as follows:

Naphthalin commented 4 years ago

Lc0 v0.26.1, 703810:

Initially, Nec1 is preferred over Ndc1, but then the latter pulls ahead by ~0.02 Q difference, and starves the move completely. Only when the eval of Ndc1 is starting to drop slightly, Nec1 is touched again; at this point it's 50M visits on Ndc1 and 621k visits on Nec1.

Lc0_0 26 1_703810_15M Lc0_0 26 1_703810_30M Lc0_0 26 1_703810_50M Lc0_0 26 1_703810_60M

Naphthalin commented 4 years ago

Lc0 #1375 (now merged): as scoring twofold repetition as draws helps with the draw evaluation in some positions, I wanted to test whether that helps in this position as well.

Results are more or less as expected: The eval for the top move starts dropping slightly earlier, the starving of the 2nd move stops arounds 32M nodes instead of around 55M nodes, but it still takes 10M visits out of which 9.5M go into the top move until the 2nd option has enough visits that it finds the mate.

Lc0_PR1375_703810_15M Lc0_PR1375_703810_30M Lc0_PR1375_703810_45M

Naphthalin commented 4 years ago

For fun I also tested Lc0 #963; while the evals seem to be more accurate, the minimax definitely misses the "sense of direction" which is given by PUCT once enough mates happen in a subtree. While it starts to give visits to Nec1 a lot sooner, the simple effect of needing more than twice the visits is enough that it still hasn't found the mate after 75M nodes.

![Uploading Lc0_PR963_703810_15M.png… Lc0_PR963_703810_30M ]() Lc0_PR963_703810_45M Lc0_PR963_703810_60M

Naphthalin commented 4 years ago

Finally, Lc0 #1173 which basically uses a different approach to cpuct scaling by "decaying" the policies to 100% with visits spent on the move. I can't post the pictures at 15M, 30M, 45M nodes though as it found the mate in less than 2M nodes in the first try; second try is less lucky, finding it at ~10M nodes, third try found it at ~3M nodes:

Lc0_PR1173_703810_5M Lc0_PR1173_703810_14M Lc0_PR1173_703810_3rd_try

Naphthalin commented 4 years ago

Out of curiosity, I let J92-70 run the position as well -- and it is much much faster, as it explores the relevant moves right from the beginning. Lc0 v0.26.1 reports mate in 11 at 94k nodes, Lc0_1375 needed 160k nodes, Lc0_963 173k nodes, Lc0_1173 187k nodes; the fluctuations are about what is to be expected from different batch size.

kiudee commented 4 years ago

Good analysis and it also shows why disregarding policies is so important for long searches.