google-deepmind / mctx

Monte Carlo tree search in JAX
Apache License 2.0

Question about Go experiments #52

Closed: sotetsuk closed this issue 1 year ago

sotetsuk commented 1 year ago

Hi! Mctx and the original paper "Policy improvement by planning with Gumbel" are both amazing 👍. I would like to try to reproduce the results myself, and I have a question about the paper.

In the Go experiments, evaluation was performed against Pachi. Pachi supports the japanese|chinese|aga|new_zealand|simplified_ing rules, but it does not seem to support the Tromp-Taylor rules, which are the most common rules in computer Go.

So, I have two questions:

(1) Which rules were used for scoring Go games during training?
(2) When evaluating against Pachi, which rules were specified for play?
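For readers unfamiliar with the Tromp-Taylor rules mentioned above, here is a minimal Python sketch of their area scoring: a player's score is the number of points occupied by their stones plus the number of empty points that reach only their color. This is illustrative only. The board encoding ('B', 'W', '.'), the function names `tromp_taylor_score` and `black_margin`, and the caller-supplied `komi` parameter are hypothetical choices for this sketch, not part of Mctx, Pachi, or the paper's code.

```python
from typing import List, Tuple


def tromp_taylor_score(board: List[str]) -> Tuple[int, int]:
    """Return (black_points, white_points) using Tromp-Taylor area scoring.

    board: equal-length strings with 'B' (black), 'W' (white), '.' (empty).
    Every stone still on the board counts; there is no dead-stone removal.
    """
    rows, cols = len(board), len(board[0])
    black = sum(row.count("B") for row in board)
    white = sum(row.count("W") for row in board)
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != "." or (r, c) in seen:
                continue
            # Flood-fill this empty region, recording which colors it touches.
            stack = [(r, c)]
            seen.add((r, c))
            size = 0
            reaches = set()
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        cell = board[ny][nx]
                        if cell == ".":
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                        else:
                            reaches.add(cell)
            # An empty region is scored only if it reaches exactly one color.
            if reaches == {"B"}:
                black += size
            elif reaches == {"W"}:
                white += size
    return black, white


def black_margin(board: List[str], komi: float) -> float:
    """Black's score minus White's score minus komi (assumed compensation)."""
    b, w = tromp_taylor_score(board)
    return b - w - komi


if __name__ == "__main__":
    example = [
        "..BW.",
        "..BW.",
        "..BW.",
    ]
    print(tromp_taylor_score(example))        # (9, 6)
    print(black_margin(example, komi=7.5))    # -4.5
```

Because every stone left on the board counts, a game scored this way has to be played out until dead stones are actually captured; that is one practical difference from rulesets in which players agree on dead stones before counting.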

fidlej commented 1 year ago

The original AlphaGo used Chinese rules. Chinese rules were also used for Pachi.

Note that Pachi is weak. We also evaluated against different versions of AlphaZero (e.g., with different numbers of simulations).