LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0
2.41k stars 526 forks source link

Need gtest for best_child_cached_ code #814

Open killerducky opened 5 years ago

killerducky commented 5 years ago

This is a gotcha for people changing the PUCT formula. It would be nice to have a gtest for this logic, that would detect when someone changes the main PUCT formula but doesn't realize it breaks this caching logic. I think a majority of those PRs have this bug, like #746

Here are a few code snippets of how the caching works:

Node* possible_shortcut_child = node->GetCachedBestChild();

      int estimated_visits_to_change_best =
          best_edge.GetVisitsToReachU(second_best, puct_mult, fpu);
      // Only cache for n-2 steps as the estimate created by GetVisitsToReachU
      // has potential rounding errors and some conservative logic that can push
      // it up to 2 away from the real value.
      node->UpdateBestChild(best_edge,
                            std::max(0, estimated_visits_to_change_best - 2));
  int GetVisitsToReachU(float target_score, float numerator,
                        float default_q) const {
    const auto q = GetQ(default_q);
    if (q >= target_score) return std::numeric_limits<int>::max();
    const auto n1 = GetNStarted() + 1;
    return std::max(
        1.0f,
        std::min(std::floor(GetP() * numerator / (target_score - q) - n1) + 1,
                 1e9f));
  }
killerducky commented 5 years ago

I quickly scanned some PRs and tagged those that I guess might have a problem.

killerducky commented 5 years ago

See https://github.com/LeelaChessZero/lc0/pull/619 for original PR for this code.

killerducky commented 5 years ago

Tilps just pointed out Node::FinalizeScoreUpdate() invalidates best_childcached. That's why the code can assume Q does not change because it's only valid during batch gathering.

Naphthalin commented 4 years ago

Having this kind of test would indeed be helpful; currently, the first sign usually is a serious nps drop. Any news on this?