Closed mooskagh closed 3 years ago
Unrelated to whether this behaviour is intended or not, this is a very interesting situation where the depth dependence of evals does strange things to our logic. Suppose the "real eval" is a draw, but the position is overevaluated at low node counts. A higher node count then means a more drawish, and therefore lower, Q. If we're now dealing with a transposition, the move with the lower policy will have lower N at the same Q; but since Q(N) is decreasing, the highest N and the best Q will always end up on different moves. Therefore, if we fix the behaviour of the piggybank, transpositions like this would simply drain the remaining time.
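A toy illustration of the mechanism described above (not Lc0 code; the decay function and visit counts are made up): when Q decreases with N, the move that accumulated more visits via a higher prior ends up with the lower Q, so max-N and max-Q land on different moves.

```python
# Toy model: the "real" eval is a draw (0.0), but the net overestimates
# the position at low node counts, so Q decays toward 0 as N grows.
def q_of_n(n):
    # Hypothetical decay curve, chosen only for illustration.
    return 0.5 / (1.0 + 0.01 * n)

# Transposition: both moves reach the same position, but move_a has the
# higher policy prior and therefore accumulated more visits.
visits = {"move_a": 800, "move_b": 300}
q = {move: q_of_n(n) for move, n in visits.items()}

best_n = max(visits, key=visits.get)  # move_a: most visits
best_q = max(q, key=q.get)            # move_b: fewer visits, so higher Q
print(best_n, best_q)                 # the two "best" moves disagree
```

With a decreasing Q(N), searching the max-N move further only lowers its Q, so the disagreement never resolves; that is why the piggybank would keep firing.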
The piggybank logic is based on Q+M_score (not Q+M or just Q). PUCT is Q+M_score based, so it seems sensible that the piggybank should work the same way.
Adding M_score (not M) to the verbose move stats might be good: the lack of precision in the displayed M value makes it hard to know exactly what the threshold is here.
Actually, now that I think about it, the S column is already Q+M_score, and it shows that max_n also had max_s.
It feels a bit excessive that such a tiny difference at such a high distance (125.9 vs 125.8) contributes a relatively large score difference of 0.00050.
But I think it was mostly bad luck, with Q also being very close (a difference of 0.00011).
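The arithmetic above can be sketched as follows. This is a toy computation, not Lc0's actual moves-left formula: the slope value is hypothetical, chosen only so that the 0.1-ply M difference maps to the 0.00050 score difference quoted above, and the sign convention (preferring shorter games) is a simplifying assumption.

```python
# Hypothetical per-ply weight: 0.1 ply * 0.005 = 0.00050, matching the
# score difference quoted in the comment above.
m_slope = 0.005
m_a, m_b = 125.9, 125.8       # expected remaining plies for the two moves
q_a, q_b = 0.10000, 0.10011   # Q values differing by 0.00011 (illustrative)

# Assumed convention for this sketch: shorter expected games score higher,
# so M enters with a negative sign.
s_a = q_a - m_slope * m_a
s_b = q_b - m_slope * m_b

# The M term (0.00050) dominates the tiny Q difference (0.00011).
print(round(s_b - s_a, 5))  # -> 0.00061
```

So even a 0.1-ply gap in M can flip the S ordering when the Q values are this close.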
Possibly we can activate the piggybank in such cases too, but with some logic so it doesn't drain the entire budget.
Closing, as the bug is explained and will possibly be resolved when #1582 is implemented.
It probably works, but it's worth looking deeper into what happened there.
In TCEC 2021 DivP game 10 (vs Stockfish), on move 39, Lc0 stopped searching in a situation where the move with maximum Q was different from the move with maximum N. According to SF's eval, that potentially made Lc0 miss a win.
It's expected that the time piggybank activates in this situation and the engine continues to think.
Log snippet: