Closed killerducky closed 6 years ago
@jkiliani note this bug affects even -t1
because after you select a node, a virtual loss is applied. Then when selecting the next node, get_eval is called on the current node, which now includes the virtual loss.
Since it increasingly looks like this bug actually gains instead of losing strength, I'm rather dubious about fixing it to be honest...
I'm only OK with it if it gains strength at all thread counts. We know it works with -t1, but does it fall apart when TCEC uses -t43? Or even -t8, which is easy to test.
Very doubtful, I posted some debug output in dev channel today. The virtual losses massively reduce FPU for nodes far up the search tree, while leaving the FPU for root nodes unchanged. This is the case for all thread counts I looked at, though to be thorough, strength should be tested with multithreading for this.
While it may improve strength of play, it can hinder training.
Look at this position:
Network id230 evaluates the probability of the move Re2 as 0.25%.
Without this "virtual loss bug", it does the first visit on this subtree during playout 520, and at playout 810 that move becomes the best. (I expect 2-4 network generations for those moves to become most probable).
With "virtual loss bug" however the first visit to that subtree happens at playout 1501, so with 800 playout training, that move is trained with probability 0.00.
The same would happen with FPU reduction I guess, but hopefully we won't do FPU reduction in training games.
FPU of 0.1 sometimes helps and sometimes hurts Elo in my tests, but using 0.05 seems very conservative. Using noise and -t1 should be enough for training to find tactics, so we should probably tune for Elo on a variety of nets instead of on tactics puzzles. But I do see the point that a move that never gets a single playout will never be tried at t=1. I think this is an argument for training with 1600 playouts once we stall at 800, or once we implement resign.
@mooskagh FPU reduction is not done when noise is on (implies training) for the root node:
```cpp
// Lower the expected eval for moves that are likely not the best.
// Do not do this if we have introduced noise at this node exactly
// to explore more.
if (!is_root || !cfg_noise) {
    fpu_reduction = cfg_fpu_reduction * std::sqrt(total_visited_policy);
}
```
This, combined with the noise, should help the network discover these gaps in its knowledge.
Sorry, I was focused on the fpu_reduction term:

```cpp
fpu_eval = (cfg_fpu_dynamic_eval ? get_eval(color) : net_eval) - fpu_reduction
```
But you're pointing out the problem in the other term, get_eval. Yes this seems like a problem.
```cpp
auto fpu_eval = (cfg_fpu_dynamic_eval ? get_eval(color) : net_eval) - fpu_reduction;
```
This code includes virtual losses in the eval. There should probably be a get_pure_eval(color), or an argument like get_eval(color, no_virtual_loss), that excludes them. Right now it is unclear exactly what is going on, because the value depends on how many threads are descending into the same move, etc.
Not super urgent, but we should probably clean this up before tuning other FPU-related stuff.
Thanks to crem for finding this.