Mardak opened this issue 6 years ago (status: Open)
I tried a lot of these schemes at root and throughout the tree; unfortunately, in self-play they are always hugely inferior. Just try the selfplay option of LC0 to test your approach (min 1000 games). For example: `lc0-cudnn selfplay --parallelism=8 --backend=multiplexing "--backend-opts=cudnn(threads=2)" --games=10000 --visits=100 --temperature=1 --tempdecay-moves=10 player1: --your-modification=1 player2: --your-modification=0`
If you just want to find tactics this might be OK, but be aware that it might be a huge Elo loss.
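To judge whether a self-play match like the one above shows a real Elo loss, the win/draw/loss counts can be turned into an Elo difference with a rough error margin. This is a minimal sketch using the standard logistic Elo formula and a normal approximation for the interval; the function names and the example counts are made up for illustration and are not LC0 output.

```python
import math


def score_to_elo(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for player1, using a normal approximation."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)
    # Clamp so the interval ends stay inside the domain of score_to_elo.
    low = max(score - z * std_err, 1e-9)
    high = min(score + z * std_err, 1.0 - 1e-9)
    return score_to_elo(score), score_to_elo(low), score_to_elo(high)


if __name__ == "__main__":
    # Hypothetical counts for a 1000-game match, player1 = modified search.
    elo, low, high = elo_with_margin(wins=280, draws=400, losses=320)
    print(f"Elo difference: {elo:+.1f} (approx. 95% interval {low:+.1f} to {high:+.1f})")
```

If the interval still straddles zero, the match is too short to say much, which is one reason to run at least 1000 games.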
As an alternative to #698, for those who don't like the inelegance of just forcing two visits per root move, there are existing numbers that affect root's search: epsilon, alpha, fpu, etc. The first three are currently 0.25, 0.3, and 0.0 respectively.
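For reference, here is a minimal sketch of how epsilon and alpha enter the root priors in the AlphaZero-style scheme: each prior becomes `(1 - epsilon) * p + epsilon * noise`, where the noise vector is drawn from a symmetric Dirichlet(alpha). It uses only the Python standard library; the function names and the example priors are assumptions for illustration, not LC0's actual code.

```python
import random


def sample_dirichlet(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over n moves."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:
        # Extremely unlikely underflow case: fall back to uniform noise.
        return [1.0 / n] * n
    return [d / total for d in draws]


def add_root_noise(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Mix Dirichlet noise into the root priors (AlphaZero-style)."""
    rng = rng or random.Random()
    noise = sample_dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


if __name__ == "__main__":
    # Hypothetical three-move position; the last move has a 0.33% prior.
    priors = [0.90, 0.0967, 0.0033]
    noised = add_root_noise(priors, rng=random.Random(1))
    for before, after in zip(priors, noised):
        print(f"{before:.4f} -> {after:.4f}")
```

With epsilon = 0.25, a move keeps at least 75% of its original prior, and a low-prior move can only rise by whatever share of the noise it happens to draw.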
Using the same game from the initial comment for analysis trying to find Rxh4 https://clips.twitch.tv/NimbleLazyNewtPRChase:
Doing 200 runs of `go nodes 800`, here's how many end up searching Rxh4 deeper (so that it becomes the most visited move):
Where the negative fpu is from:
First off, I suppose: with AZ's numbers, about 1 in 3 games in this same position find the correct move. Is that a desired amount of randomness?
The premise behind this issue and the other issue is that when self-play ends up in a learnable board state, it seems unfortunate that, more often than not, it misses the opportunity to generate valuable training data for the correct move. Clearly, AZ's numbers are good enough to eventually generate strong networks, but perhaps training search could be better optimized?
In the table, I also included the maximum prior observed for Rxh4 across the 200 runs; with id359 the prior is normally 0.33%. As expected from increasing alpha, the max prior decreases as the noise is spread over other moves, but at least for this move, just increasing the prior to 1% is usually enough for search to direct most of the visits to Rxh4. Of course, setting a negative fpu at the noised root helps give at least one visit to each move, but for this board position, 2 visits are needed to realize it's a good move.
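To illustrate why a negative fpu gives every root move at least one visit, here is a toy PUCT selection loop. It is a sketch under stated assumptions, not LC0's search: an unvisited child is scored with `parent_q - fpu_reduction`, the parent's Q is held fixed at 0.0, each child always returns the same made-up value, and `c_puct = 1.5` is an arbitrary choice.

```python
import math


def select_child(priors, visits, value_sums, parent_q, fpu_reduction, c_puct=1.5):
    """Pick the child index with the highest Q + U score."""
    sqrt_total = math.sqrt(max(sum(visits), 1))
    best_index, best_score = 0, -math.inf
    for i, prior in enumerate(priors):
        if visits[i] == 0:
            # Assumed first-play-urgency rule: start from the parent's Q.
            q = parent_q - fpu_reduction
        else:
            q = value_sums[i] / visits[i]
        u = c_puct * prior * sqrt_total / (1 + visits[i])
        if q + u > best_score:
            best_index, best_score = i, q + u
    return best_index


def run_root_search(priors, child_values, fpu_reduction, n_visits, parent_q=0.0):
    """Run n_visits root selections; child_values are fixed toy evaluations."""
    visits = [0] * len(priors)
    value_sums = [0.0] * len(priors)
    for _ in range(n_visits):
        i = select_child(priors, visits, value_sums, parent_q, fpu_reduction)
        visits[i] += 1
        value_sums[i] += child_values[i]
    return visits


if __name__ == "__main__":
    # Hypothetical root: the last move has a 0.33% prior and looks bad at first.
    priors = [0.90, 0.0967, 0.0033]
    child_values = [0.1, 0.0, -0.2]
    print("fpu  0.0:", run_root_search(priors, child_values, 0.0, 20))
    print("fpu -1.0:", run_root_search(priors, child_values, -1.0, 20))
```

In this toy setup the low-prior move gets no visits at all with fpu 0.0 and at least one with a negative fpu. Because its first evaluation looks bad, one visit alone does not pull search toward it, which matches the point that Rxh4 needs 2 visits here.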
Additionally, if these numbers become something that the server can tell the client to use, similar to turning on/off resignation for a portion of game tasks, there could be a mix of epsilon/alpha/fpu numbers.