glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Temperature parameter #22

Closed will-iam closed 6 years ago

will-iam commented 6 years ago

Maybe a stupid and useless question, but what's the point of the temperature parameter, since we always choose the best move by visit count? Moreover, if we decide to choose the move randomly according to the policy's probability distribution, shouldn't the temperature decrease as the game goes on? I understand that https://github.com/Zeta36/chess-alpha-zero does this.

Thank you everyone for contributing to this amazing and dynamic project! Can't wait to see it play!

gcp commented 6 years ago

Which temperature parameter do you mean?

1) In AZ, the temperature is fixed at 1 for move selection in self-play, so the engine chooses moves in proportion to their visit counts. I have no idea why you think the temperature should decrease with the length of the game. If the goal is only to ensure divergence (rather than to increase exploration), that would be reasonable (and would match AlphaGo Zero, which originally used t=1 for the first 30 moves only). But exploration is good!

2) There is a cfg_softmax_temp that acts as an operator on the Network outputs. The main use is to allow some further tuning after the best network has been established. It also interacts with the UCT parameter.
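As a rough illustration of what a softmax temperature on network outputs does, here is a minimal Python sketch. It is not the project's code: the function name, the example logits, and the choice to divide the raw outputs by the temperature before the softmax are all assumptions made for illustration.

```python
import math

def softmax_with_temperature(logits, temp=1.0):
    """Convert raw policy outputs to probabilities, scaling by 1/temp first.

    Assumption: the temperature divides the raw outputs before the softmax.
    temp < 1 sharpens the distribution, temp > 1 flattens it.
    """
    if temp <= 0:
        raise ValueError("temp must be positive")
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw policy outputs for three candidate moves.
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temp=0.5)
flat = softmax_with_temperature(logits, temp=2.0)
```

In an AlphaZero-style search the policy priors feed the exploration term, so flattening or sharpening them changes how widely the search spreads its visits. That is one plausible reading of the interaction with the UCT parameter mentioned above.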

will-iam commented 6 years ago

Thank you, I mixed up the two parameters. Now, referring to the first one: "For the first 30 moves of each game, the temperature is set to τ = 1; this selects moves proportionally to their visit count in MCTS, and ensures a diverse set of positions are encountered. For the remainder of the game, an infinitesimal temperature is used, τ→0". From this I understood that, deep in the search, the temperature should decay. Sorry for being a beginner: what do you mean when you say that it ensures divergence? And why would it be reasonable?

gcp commented 6 years ago

> I understood that deep in the search, the temperature should decay.

This parameter has nothing to do with the search or search depth. It is applied to the final search output. (It is also a constant 1 in AZ, rather than variable as in AlphaGo Zero.)

> what do you mean when you say that it ensures divergence? and why would it be reasonable?

The idea is that generating more self-play games only helps if the games are different. In AlphaGo Zero, there was additional randomness from rotating the board randomly, which is not present in chess. If you are only interested in playing different games (rather than also exploring moves the current network considers weaker), it is reasonable to randomize only early on. At a certain point, the game will have diverged already.
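The two regimes discussed here can be sketched as a small temperature schedule. This is only an illustration: the function name, the `always_explore` flag, and the 0-based move counter are hypothetical, and the cutoff of 30 is taken from the AlphaGo Zero quote earlier in the thread.

```python
def self_play_temperature(move_number, cutoff=30, always_explore=False):
    """Return the temperature to use when picking the move at `move_number`.

    move_number is 0-based (an assumption of this sketch).
    always_explore=True models the AZ setting: temperature 1 for the whole game.
    Otherwise temperature is 1 for the first `cutoff` moves and 0 afterwards,
    where 0 stands for the tau -> 0 limit (the caller should pick the
    most-visited move rather than literally divide by zero).
    """
    if always_explore:
        return 1.0
    return 1.0 if move_number < cutoff else 0.0
```

With the default cutoff, randomness is used only to make games diverge early. With `always_explore=True`, weaker-looking moves keep being sampled throughout the game.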

will-iam commented 6 years ago

Thanks a lot!

jkiliani commented 6 years ago

The infinitesimal temperature τ→0 refers to a formula in the AlphaGo Zero paper, which sets the move probability (before normalization) to N^(1/τ), where N is the move's visit count. For τ=1, the move probability is proportional to the visit count; for τ→0, selection is greedy, i.e. the move with the highest visit count is always selected. The notation τ→0 denotes a limit, since setting τ=0 would mean dividing by zero in the exponent.
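A minimal Python sketch of this formula follows. The function names and the `greedy_below` threshold are assumptions for illustration, not the project's implementation. Counts are divided by the maximum before exponentiation, which leaves the normalized result unchanged but avoids overflow for small τ.

```python
import random

def move_probabilities(visit_counts, tau, greedy_below=1e-3):
    """Probabilities proportional to N^(1/tau), normalized to sum to 1.

    For tau below `greedy_below` (a hypothetical threshold standing in for
    tau -> 0), all probability goes to the most-visited move(s).
    """
    best = max(visit_counts)
    if best <= 0:
        raise ValueError("need at least one positive visit count")
    if tau < greedy_below:
        weights = [1.0 if n == best else 0.0 for n in visit_counts]
    else:
        # Dividing by `best` keeps every base in [0, 1], so no overflow.
        weights = [(n / best) ** (1.0 / tau) for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]

def select_move(visit_counts, tau, rng=random):
    """Sample a move index according to the temperature-adjusted probabilities."""
    probs = move_probabilities(visit_counts, tau)
    return rng.choices(range(len(visit_counts)), weights=probs, k=1)[0]

# Hypothetical visit counts for three root moves.
visits = [60, 30, 10]
proportional = move_probabilities(visits, tau=1.0)  # close to [0.6, 0.3, 0.1]
greedy = move_probabilities(visits, tau=0.0)        # all mass on the first move
```

Intermediate values such as τ=0.5 square the visit counts before normalizing, so the distribution is sharper than proportional but still not fully greedy.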

will-iam commented 6 years ago

In the current self-play implementation, every chosen move is the best one, right? How do we ensure divergence, then? Is Dirichlet noise enough to keep us from producing the same game over and over? Maybe there is another random element in the search, but I don't see it.

jkiliani commented 6 years ago

In Leela Zero, additional randomisation is provided by the application of a random symmetry (rotation/reflection) to the board before network eval. That is harder to do in chess, but may be possible, see https://github.com/glinscott/leela-chess/issues/25. If not, a temperature larger than 0 will provide some degree of randomness. Alpha Zero actually uses τ=1 for self play, so there's plenty of divergence there.

will-iam commented 6 years ago

Thanks, and #28 answers my question too.