Closed killerducky closed 6 years ago
@jkiliani note this bug affects even -t1
because after you select a node, a virtual loss is applied. Then when selecting the next node, get_eval is called on the current node, which now includes the virtual loss.
Since it increasingly looks like this bug actually gains instead of losing strength, I'm rather dubious about fixing it to be honest...
I'm only OK with it if it gains strength at all thread counts. We know it works with -t1, but does it fall apart when TCEC uses -t43? Or even -t8, which is easy to test.
Very doubtful, I posted some debug output in dev channel today. The virtual losses massively reduce FPU for nodes far up the search tree, while leaving the FPU for root nodes unchanged. This is the case for all thread counts I looked at, though to be thorough, strength should be tested with multithreading for this.
While it may improve strength of play, it can hinder training.
Look at this position:
Network id230 evaluates the probability of the move Re2 as 0.25%.
Without this "virtual loss bug", it does the first visit on this subtree during playout 520, and at playout 810 that move becomes the best. (I expect 2-4 network generations for those moves to become most probable).
With "virtual loss bug" however the first visit to that subtree happens at playout 1501, so with 800 playout training, that move is trained with probability 0.00.
The same would happen with FPU reduction I guess, but hopefully we won't do FPU reduction in training games.
FPU of 0.1 sometimes helps and sometimes hurts Elo in my tests, but using 0.05 seems very conservative. Using noise and -t1 should be enough for training to find tactics, so we should probably tune for Elo on a variety of nets instead of on tactics puzzles. But I do see the point that a move that never gets a single playout will never be tried at t=1. I think this is an argument for training with 1600 playouts once we stall at 800, or once we implement resign.
@mooskagh FPU reduction is not done when noise is on (implies training) for the root node:
```cpp
// Lower the expected eval for moves that are likely not the best.
// Do not do this if we have introduced noise at this node exactly
// to explore more.
if (!is_root || !cfg_noise) {
    fpu_reduction = cfg_fpu_reduction * std::sqrt(total_visited_policy);
}
```
This, combined with the noise, should help the network discover these gaps in its knowledge.
Sorry, I was focused on the fpu_reduction term:

```cpp
fpu_eval = (cfg_fpu_dynamic_eval ? get_eval(color) : net_eval) - fpu_reduction
```
But you're pointing out the problem in the other term, get_eval. Yes this seems like a problem.
```cpp
auto fpu_eval = (cfg_fpu_dynamic_eval ? get_eval(color) : net_eval) - fpu_reduction;
```
This code includes virtual losses in the eval. There should probably be a get_pure_eval(color), or an argument like get_eval(color, no_virtual_loss), that excludes them. Right now it is unclear exactly what is going on, because the value depends on how many threads are descending into the same move, etc.
Not super urgent, but we should probably clean this up before tuning other FPU-related stuff.
Thanks to crem for finding this.