google-deepmind / mctx

Monte Carlo tree search in JAX
Apache License 2.0

qtransform_by_min_max example #49

Closed wbrenton closed 1 year ago

wbrenton commented 1 year ago

Is there an example of using qtransform_by_min_max? I'm unsure how to use it without re-implementing mctx._src.policies.muzero_policy entirely.

fidlej commented 1 year ago

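For Go, with reward=-1 for a loss and reward=1 for a win, you can pass the following as the qtransform keyword argument of mctx.muzero_policy, so no re-implementation is needed: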

qtransform=functools.partial(mctx.qtransform_by_min_max, min_value=-1.0, max_value=1.0)

I recommend using gumbel_muzero_policy with the default qtransform. It works well on many environments.