LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0
2.4k stars 525 forks source link

PV sometimes incorrect with tree reuse #167

Closed apleasantillusion closed 6 years ago

apleasantillusion commented 6 years ago

While doing some deep analysis of several positions, I've noticed that lc0 will sometimes show an incorrect PV under the following conditions:

1) A search from some position is started 2) After some time, the analysis is stopped, and lc0's top move is played 3) From the resulting position, analysis is restarted

Several times in these conditions, I've seen the PV show a first move that was clearly not the top move at any point in the current search.

Here's an example of output from one such search:

 5/37   00:00    5,190k 0   +0.01   Bf1-e2 c5xd4 e3xd4 Bc8-f5 c2-c3 e7-e6 Qd1-b3 Qd8-c8 Ng1-f3 Bf8-e7 Nb1-d2 O-O O-O Nb8-c6 Rf1-e1 h7-h6 h2-h3 Rf8-d8 a2-a4 Be7-d6
 5/37   00:05    5,232k 8k  +0.01   Bf1-e2 c5xd4 e3xd4 Bc8-f5 c2-c3 e7-e6 Qd1-b3 Qd8-c8 Ng1-f3 Bf8-e7 Nb1-d2 O-O O-O Nb8-c6 Rf1-e1 h7-h6 h2-h3 Rf8-d8 a2-a4 Be7-d6
 5/37   00:10    5,274k 8k  +0.01   Bf1-e2 c5xd4 e3xd4 Bc8-f5 c2-c3 e7-e6 Ng1-f3 Bf8-d6 Bf4xd6 Qd8xd6 Nb1-d2 h7-h6 O-O O-O Rf1-e1 Qd6-b6 Qd1-b3 Qb6-c7 a2-a4 Nb8-c6
 5/37   00:12    5,298k 9k  +0.01   Bf1-e2 c5xd4 e3xd4 Bc8-f5 c2-c3 e7-e6 Ng1-f3 Bf8-d6 Bf4xd6 Qd8xd6 Nb1-d2 h7-h6 O-O O-O Rf1-e1 Qd6-b6 Qd1-b3 Qb6-c7 a2-a4 Nb8-c6
f1e2 (130 ) N: 817009 (+264) (P: 2.91%) (Q: 0.00275) (U: 0.00028) (Q+U: 0.00302) (V: -.----) 
c2c3 (259 ) N: 856749 (+ 0) (P: 16.04%) (Q: -0.00077) (U: 0.00147) (Q+U: 0.00070) (V: -.----) 
g1f3 (159 ) N: 3150189 (+ 0) (P: 11.83%) (Q: 0.00032) (U: 0.00029) (Q+U: 0.00061) (V: -.----) 

Every PV showed Be2, even though Nf3 has clearly always been the top move in this search (it has 2.3 million more visits than the next two moves, while the current search only incremented visits by about 100,000).

lc0 will still play the correct top move; it seems to be just the PV display that is incorrect.

When I posted this example in Discord, @killerducky noted that the stopped analysis in the previous position was likely mostly exploring Be2 (note its substantially higher Q and Q+U), so that may well be relevant.

I originally saw the behavior from lc0-win-20180701-cuda92-cudnn714 (which @mooskagh correctly pointed out in Discord was rather old by now), and then reproduced with lc0-win-20180711-cuda92-cudnn71 (the example above was from the more recent build).

dubslow commented 6 years ago

Does this only happen with very high node counts, or at all node counts?

apleasantillusion commented 6 years ago

Unknown. I'm in the middle of getting some deep analyses done with a pricey VM in google's cloud, so have only noticed it with high node counts so far.

Once I'm done with that I can try reproducing at lower node counts, probably be a few hours before then.

dubslow commented 6 years ago

I think I saw this reproduce on my stream, though it fixed itself quite quickly so I'm not strictly sure.

haleysa commented 6 years ago

I have seen something similar at relatively low node counts, in the 10k range - if I analyze a position, say go nodes 10000, then analyze the same position with more nodes, sometimes the first PV will not be the actual top move. I think Killer Ducky's comment is right; it's because when the analysis restarts, it saves the tree, but doesn't save the best_moveedge (or whatever the right variable is); until the new search actually explores the top move, the best move node is pointing to the best node actually explored this search. In low node counts, this self-corrects very quickly, which is why I'm only seeing it for the first PV line, but in mega-node counts, it could be a long time before the best node is actually visited again.

apleasantillusion commented 6 years ago

Found what seems to be an easy repro (tested with several nets).

Use this command line: lc0.exe --weights=kb3.txt --max-prefetch=0 --minibatch-size=1 --fpu-reduction=-100 --verbose-move-stats

Then go nodes 10, followed by go nodes 1

The large negative fpu reduction forces the first several visits to all go to different nodes, The second search, the one from go nodes 1, will show whatever silly move sorts 10th for that particular net in the initial position in its PV.

dubslow commented 6 years ago

Thanks guys, that ought to be enough for me to track it down -- I even have a suspected fix in mind. All I need now is time.