Avoid temperature moves with only one (or two) visits

gonzalezjo commented 6 years ago

If this is considered a valid issue, I'm willing to write a PR.

Leela Zero does not play "temperature moves" that have only one visit. Doing the same in lc0 should keep some of the more egregious moves out of the training game data.
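A minimal sketch of the proposal in Python. It assumes a plain dict of per-move visit counts and the usual rule of sampling moves in proportion to N^(1/T). `pick_temperature_move` and `min_visits` are hypothetical names, not lc0's actual code. Setting `min_visits=2` rejects one-visit moves, and `min_visits=3` would also reject two-visit moves, as the title suggests.

```python
import random

def pick_temperature_move(visits, temperature=1.0, min_visits=2, rng=random):
    """Sample a move with probability proportional to N^(1/T),
    skipping moves whose visit count is below min_visits.

    visits: dict mapping move -> visit count (hypothetical representation).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only moves the search examined more than the threshold allows.
    eligible = {m: n for m, n in visits.items() if n >= min_visits}
    if not eligible:
        # Fall back to the most-visited move if the filter removes everything.
        return max(visits, key=visits.get)
    moves = list(eligible)
    weights = [eligible[m] ** (1.0 / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

# Example: "g2g4" has a single visit, so it can never be sampled.
visits = {"e2e4": 500, "d2d4": 300, "g2g4": 1}
print(pick_temperature_move(visits, rng=random.Random(0)))
```

The hard threshold is the simplest form of the idea. It leaves the relative odds of the surviving moves untouched.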

killerducky commented 6 years ago

Some earlier related discussion (the OP of that issue suggests the opposite of what you propose): https://github.com/glinscott/leela-chess/issues/559

gonzalezjo commented 6 years ago

That’s... interesting. Definitely makes me rethink things. I’m firmly on the side of increasing exploration at almost all costs, but I also believe that temperature can have some negative side effects. I wish we had more data; it’s hard for me to hold both beliefs.

My prior reasoning for this comes from this issue, where gcp discusses the impact of temperature on Leela Chess: https://github.com/gcp/leela-zero/issues/1355

I can definitely point at positions where Leela Chess is sometimes doing badly because of high temperature, i.e. where it plays to drawn endings from a better position, only because the resulting positions make it much more likely for the worse player to blunder. Something similar happens with giving a very high evaluation to making some threats around the king, and that sometimes causes lost games.

And also:

The reasoning is that by selecting moves with 1 visit, you are actually neglecting the result of the tree search.

You commented on it, which was cool! So now I’m wondering what insights you might have on this, given that you’ve probably had it on your mind for way longer than I have.

We know that the Leelas need exploration. We also know that they benefit from having higher-quality games to work with. Both are likely to help raise the learning ceiling, but I guess it’s a tradeoff.

Edit: And of course, since this comes down to exploration, I wonder to what degree Videodr0me’s concerns would be alleviated by substantially increasing CPUCT to flatten the policy.

jjoshua2 commented 6 years ago

I think it is necessary for noise alone to be able to cause a move to be considered and played in training occasionally. But it also should not normally happen in the average endgame; otherwise Leela will expect that you can win merely by not blundering. Yes, there is some chance of that, but it is a very small chance against even a mid-level engine.

(Sent by email in reply to J. Gonzalez's comment of Sun, Aug 26, 2018, 1:07 PM.)

jjoshua2 commented 6 years ago

So the way I would balance this is to use enough visits that noise alone gives a move two visits only something like 1/1000 of the time, which would mean a small chance of a blunder over 50 moves, and to not allow moves with one or zero visits. This is what Leela Zero (Go) did.
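To make the arithmetic concrete, here is a quick Python check. It assumes each move is an independent trial with a 1/1000 chance that noise alone pushes a bad move to two visits. Over 50 moves, the chance of at least one such move is 1 - 0.999^50, roughly 4.9%. The function name is illustrative only.

```python
def chance_of_any_noise_move(per_move_prob, num_moves):
    """Probability that at least one of num_moves independent moves
    is a noise-only pick, given a fixed per-move probability."""
    return 1.0 - (1.0 - per_move_prob) ** num_moves

p = chance_of_any_noise_move(1 / 1000, 50)
print(f"{p:.3f}")  # 0.049
```

The independence assumption is a simplification, since noise in consecutive positions of one game is not necessarily independent in practice.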

(Sent by email as a follow-up to his own comment of Sun, Aug 26, 2018, 1:24 PM.)

gonzalezjo commented 6 years ago

Fair, and I agree.

Do you think that’s worth the decrease in game rate? Are there any estimates of the game-generation speedup that an adaptive resign rate can provide?

The increase in cpuct to 5 or more should also help by amplifying the impact of noise.
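A minimal Python sketch of why a larger cpuct should amplify noise. It assumes the AlphaZero-style PUCT rule, score = Q + cpuct * P * sqrt(N_parent) / (1 + N), where Dirichlet noise enters only through the prior P. lc0's exact selection formula may differ in details. The move names and numbers below are made up for illustration.

```python
import math

def puct_score(q, prior, parent_visits, child_visits, cpuct):
    """AlphaZero-style PUCT: value term plus a prior-weighted exploration bonus."""
    return q + cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, parent_visits, cpuct):
    """Return the name of the child with the highest PUCT score.

    children: dict name -> (q, prior, visits); illustrative numbers only.
    """
    def score(name):
        q, prior, visits = children[name]
        return puct_score(q, prior, parent_visits, visits, cpuct)
    return max(children, key=score)

children = {
    "A": (0.3, 0.6, 80),   # strong, well-explored move
    "B": (-0.2, 0.1, 2),   # weak move whose prior is assumed to be inflated by noise
}
print(select_child(children, 100, cpuct=1.0))  # A
print(select_child(children, 100, cpuct=5.0))  # B
```

Because cpuct multiplies only the prior term, raising it shifts selection toward whatever the noised prior favors, relative to the search's value estimates.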

dubslow commented 6 years ago

I'm in favor of excluding one-visit moves from temperature selection. First Play Urgency (FPU) has been a problem to deal with, and excluding one-visit moves in effect gives us a good way to ignore FPU's problems where positional variety is concerned.

oscardssmith commented 5 years ago

Should be closed. Temp offset handles this.
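A minimal Python sketch of how a temperature visit offset can cover this case. This is an assumption about the mechanism, not lc0's code: a fixed offset is added to every visit count before applying temperature, and any move whose adjusted count is not positive gets zero probability. With `offset=-1.0`, one-visit moves drop out; `offset=-2.0` would also drop two-visit moves. The function name is hypothetical.

```python
def offset_temperature_probs(visits, temperature=1.0, offset=-1.0):
    """Move probabilities proportional to (N + offset)^(1/T), clamped at zero.

    visits: dict mapping move -> visit count (hypothetical representation).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Shift every count by the offset; moves pushed to zero or below are excluded.
    adjusted = {m: max(n + offset, 0.0) for m, n in visits.items()}
    weights = {m: a ** (1.0 / temperature) for m, a in adjusted.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("offset removed every move")
    return {m: w / total for m, w in weights.items()}

probs = offset_temperature_probs({"e2e4": 11, "d2d4": 9, "g2g4": 1})
print({m: round(p, 3) for m, p in probs.items()})
# {'e2e4': 0.556, 'd2d4': 0.444, 'g2g4': 0.0}
```

Unlike a hard minimum-visit filter, an offset also shrinks the weight of low-visit moves that survive, so it acts as a softer version of the same idea.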

LeelaChessZero / lc0

Avoid temperature moves with only one (or two) visits #304