Difference between "immediate reward" and heuristic evaluation function?

ascane / lets-go

In this project, we study the immediate versus delayed reward for the game Go.



Status: open; opened by nczempin 7 years ago

nczempin commented 7 years ago

As I understand it, one reason Go was harder for computers to master than chess (apart from the obvious branching-factor issue) is that a good evaluation function is much harder to write for Go than for chess: Go has nothing comparable to the rough material values of the chess pieces, which already give a chess evaluation much of its power.

Monte Carlo tree search (MCTS) was one of the key drivers of progress in computer Go, so cutting back on it in order to return to an evaluation function seems like a step backwards.

What am I missing/misunderstanding?

ascane commented 7 years ago

Thanks for your interest in our project and for sharing your thoughts.

I agree that it's hard to come up with a good evaluation function for Go, but please note that our algorithm is a variant of MCTS. The evaluation function we defined is used only to prune the search tree: positions that are less likely to lead to a win are not considered. By doing so, we increase the win rate while using the same number of MCTS iterations each time we choose a move to play.
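To make the distinction concrete, here is a minimal sketch of "heuristic used only for pruning" inside an otherwise standard MCTS loop (UCB1 selection, random playouts, backpropagation). It is an illustration under stated assumptions, not the lets-go implementation: the toy game `NimState` stands in for Go, and `immediate_reward`, `keep`, `Node`, and `mcts` are hypothetical names. The heuristic only decides which `keep` moves a node may ever expand; the values that drive move selection still come from random playouts.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class NimState:
    """Hypothetical toy game standing in for Go: each turn removes 1-3 stones,
    and whoever takes the last stone wins."""
    stones: int
    player: int  # player to move (0 or 1)

    def moves(self) -> List[int]:
        return [k for k in (1, 2, 3) if k <= self.stones]

    def play(self, k: int) -> "NimState":
        return NimState(self.stones - k, 1 - self.player)

    def winner(self) -> Optional[int]:
        # With no stones left, the player who just moved took the last one.
        return 1 - self.player if self.stones == 0 else None


def immediate_reward(state: NimState, move: int) -> float:
    """Hypothetical heuristic: score a move by the position it leaves behind."""
    return 1.0 if (state.stones - move) % 4 == 0 else 0.0


class Node:
    def __init__(self, state, heuristic: Callable, keep: int, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children: List["Node"] = []
        # Pruning: rank legal moves by the heuristic and keep only the top `keep`.
        ranked = sorted(state.moves(), key=lambda m: heuristic(state, m), reverse=True)
        self.untried = ranked[:keep]
        self.visits = 0
        self.wins = 0.0  # wins for the player who made `move` into this node


def ucb1(child: Node, parent_visits: int, c: float = 1.4) -> float:
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(root_state, heuristic, iterations=2000, keep=2, seed=0):
    rng = random.Random(seed)
    root = Node(root_state, heuristic, keep)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded, non-terminal nodes.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one child, drawn only from the pruned move list.
        if node.untried:
            move = node.untried.pop(0)
            child = Node(node.state.play(move), heuristic, keep, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: unpruned random playout to the end of the game.
        state = node.state
        while state.winner() is None:
            state = state.play(rng.choice(state.moves()))
        winner = state.winner()
        # 4. Backpropagation: credit the player who moved into each node.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.state.player:
                node.wins += 1
            node = node.parent
    # Play the most visited root move.
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, `mcts(NimState(5, 0), immediate_reward)` searches only two of the three legal moves at each node. The iteration budget stays the same, but it is spent on fewer branches, which is the trade-off described in the reply: the search gets deeper on promising moves at the risk of pruning away a good one if the heuristic is poor.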