CuriosAI / sai

SAI: a fork of Leela Zero with variable komi.
GNU General Public License v3.0

What effect are the Lambda and Mu? #140

Open CGLemon opened 3 years ago

CGLemon commented 3 years ago

SAI Team: First, thank you for your research, for providing a free and strong Go engine, and for giving us a different idea of how to implement one.

Since the 2019 SAI paper, "SAI: a Sensible Artificial Intelligence that plays with handicap and targets high scores in 9x9 Go", SAI has changed considerably. One main difference is the addition of Lambda and Mu to the sigmoid bonus. I do not understand why you did that. What is the core idea behind MCTS with Lambda and Mu? Is it a significant advancement?

In addition, do you plan to publish another paper? I am interested in the details of each improvement, such as Average FPU, the KLE network, or adapting SAI to large handicaps and high komi. I can understand the basic methods by following the code, but I cannot really understand the core ideas and their effects. It would be helpful to me.

Many thanks! -- Hung Zhe, Lin

Vandertic commented 3 years ago

Thank you @CGLemon for your interest and your help. You can find a short explanation of the parameters lambda and mu in Section 2.4 Parametric family of value functions of the 9x9 paper, but I will try to give the idea here.

The sigmoid is useful in itself as it allows SAI to play with variable komi.

The parameters λ (lambda) and μ (mu) additionally make it possible to modify SAI's behaviour: when nonzero, they activate what we call the SAI agent.

The SAI agent uses the sigmoid to look also at komi values different from the real one. In particular, it looks at the sigmoid's winrate at komi values that would make the current position more balanced between the players. How far it looks depends on lambda and mu. These are two real numbers between 0 and 1, and SAI expects them to satisfy 0≤λ≤μ≤1. They determine the two endpoints of an interval [x_λ,x_μ] of komi values. The endpoint x_λ equals the real komi if λ=0, and likewise x_μ equals the real komi if μ=0. Otherwise each endpoint shifts towards the "perfect balance komi", which it reaches when its parameter is 1. SAI MCTS works like LZ MCTS, but the winrate it uses in the UCT formula is the average winrate over the interval [x_λ,x_μ] of komi values, not (necessarily) the winrate at the real komi.
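A minimal Python sketch of the averaging just described, not taken from the SAI code. It assumes a logistic sigmoid in the bonus points x given to the player to move, with hypothetical parameter names `alpha` (estimated score lead) and `beta` (steepness). It also assumes that x_λ and x_μ move linearly from the real komi to the balance point. The explanation above only fixes the two end cases (parameter 0 and parameter 1), so the interpolation rule is purely illustrative.

```python
import math


def winrate(x, alpha, beta):
    """Logistic winrate of the player to move when given x bonus points (assumed form)."""
    t = beta * (alpha + x)
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def softplus(t):
    """Numerically stable log(1 + exp(t)), the antiderivative of the logistic."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def endpoint(param, x_real, alpha):
    """Assumed rule: param=0 gives the real komi, param=1 the balance point."""
    x_balance = -alpha  # bonus at which the assumed sigmoid equals 0.5
    return (1.0 - param) * x_real + param * x_balance


def agent_winrate(lam, mu, x_real, alpha, beta):
    """Average winrate over the komi interval [x_lambda, x_mu]."""
    if not 0.0 <= lam <= mu <= 1.0:
        raise ValueError("expected 0 <= lambda <= mu <= 1")
    a = endpoint(lam, x_real, alpha)
    b = endpoint(mu, x_real, alpha)
    if math.isclose(a, b):
        return winrate(a, alpha, beta)  # degenerate interval: a single komi value
    integral = (softplus(beta * (alpha + b)) - softplus(beta * (alpha + a))) / beta
    return integral / (b - a)


if __name__ == "__main__":
    # Player to move is about 20 points ahead at the real komi (x_real = 0).
    print(agent_winrate(0.0, 0.0, 0.0, 20.0, 0.5))  # agent off: winrate at real komi
    print(agent_winrate(0.0, 1.0, 0.0, 20.0, 0.5))  # average from real to balance komi
    print(agent_winrate(1.0, 1.0, 0.0, 20.0, 0.5))  # only the balance komi
```

With λ=μ=0 the function reduces to the ordinary winrate at the real komi, which matches the statement that the agent is active only when the parameters are nonzero.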

So, the purpose of the SAI agent is to let SAI virtually consider different komi values, in particular when it is ahead or behind.

When LZ or AGZ are very much ahead, most move choices will have a flat 0.99 winrate, and so it becomes difficult to distinguish between moves that lose points and moves that do not. The SAI agent helps with that, because these moves will have very different winrates in the range of komi values the agent considers.
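To make the "flat winrate" point concrete, here is a small numerical sketch in Python. All numbers are invented for illustration: a logistic sigmoid with steepness 0.5, a move A that keeps a 20-point lead, a move B that drops the lead to 15 points, and a hypothetical komi range that lies between halfway to the balance point and the balance point itself. None of this is taken from the SAI code.

```python
import math


def winrate(x, lead, beta):
    """Assumed logistic winrate when the player to move gets x bonus points."""
    return 1.0 / (1.0 + math.exp(-beta * (lead + x)))


def mean_winrate(x_lo, x_hi, lead, beta, samples=1000):
    """Midpoint-rule average of the winrate over the komi range [x_lo, x_hi]."""
    step = (x_hi - x_lo) / samples
    total = sum(winrate(x_lo + (i + 0.5) * step, lead, beta) for i in range(samples))
    return total / samples


BETA = 0.5                   # hypothetical sigmoid steepness
REAL_BONUS = 0.0             # bonus corresponding to the real komi
KOMI_RANGE = (-20.0, -10.0)  # hypothetical range the agent looks at

LEAD_AFTER_A = 20.0  # move A keeps the full lead
LEAD_AFTER_B = 15.0  # move B gives away 5 points

# Winrate at the real komi only: both moves look almost identical.
plain_gap = winrate(REAL_BONUS, LEAD_AFTER_A, BETA) - winrate(REAL_BONUS, LEAD_AFTER_B, BETA)

# Winrate averaged over the komi range: the point loss becomes clearly visible.
agent_gap = (mean_winrate(*KOMI_RANGE, LEAD_AFTER_A, BETA)
             - mean_winrate(*KOMI_RANGE, LEAD_AFTER_B, BETA))

print(f"gap at real komi:     {plain_gap:.4f}")
print(f"gap in agent's range: {agent_gap:.4f}")
```

Under these assumptions the two moves differ by well under one percentage point at the real komi, while the averaged winrates differ by tens of percentage points, so a search driven by the averaged value has a much clearer signal for preferring move A.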

When LZ or AGZ are very much behind (consider handicap games), most move choices will have a flat 0.01 winrate, and in the same way the agent may help to recover points, at least against a human or weak engine.

parton69 commented 3 years ago

Thank you very much @CGLemon for the message.

> In addition, do you plan to publish another paper? I am interested in the details of each improvement, such as Average FPU, the KLE network, or adapting SAI to large handicaps and high komi. I can understand the basic methods by following the code, but I cannot really understand the core ideas and their effects. It would be helpful to me.

Yes, we plan to publish a paper with the results on the 19x19 board, including details on the improvements made since the 9x9 paper. Moreover, in order to show that the SAI framework is general enough to be applied to games other than Go, we started an Othello-SAI project. The results are very promising: AlphaZero-like software appears to make suboptimal moves in the endgame of Othello, just like in Go, and these suboptimal moves are not played by SAI!