wbrenton closed 1 year ago
For Go, with reward=-1 for a loss and reward=1 for a win, you can pass the following as the qtransform keyword argument of the policy:
qtransform=functools.partial(mctx.qtransform_by_min_max, min_value=-1.0, max_value=1.0)
I recommend using gumbel_muzero_policy with the default qtransform. It works well on many environments.
Is there an example of using qtransform_by_min_max? I'm unsure how to use it without entirely re-implementing the mctx._src.policies.muzero_policy.