PFCM / 482-project

uct #10

Closed PFCM closed 8 years ago

PFCM commented 8 years ago

Figure out when to stop using the tree policy and start the rollout.

In Silver's 2011 paper a rollout starts every time a node is added to the tree, and later iterations just keep searching from there. This seems like it should be better than what happens at the moment: the tree keeps growing towards the bottom, so the estimates should keep getting more accurate, and it may indeed be essential for convergence.
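For reference, a minimal sketch of that scheme as I read it: descend with the tree policy until reaching a node that still has untried moves, add exactly one child, roll out from that child, then back up. Everything here is hypothetical and not taken from this repository: the toy game (Nim: take 1 to 3 stones, taking the last stone wins), the names `Node`, `ucb_child`, `rollout`, `iterate`, `search`, the exploration constant, and the choice of flipping a reward in [0, 1] as `1 - reward` at each backup step.

```python
import math
import random


class Node:
    """Search-tree node for toy Nim (hypothetical stand-in for the real game)."""

    def __init__(self, stones, parent=None):
        self.stones = stones
        self.parent = parent
        self.children = {}  # move -> Node
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.value = 0.0  # total reward for the player who moved INTO this node


def ucb_child(node, c=1.4):
    # Every child has visits >= 1 because it is rolled out when it is created.
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, rng):
    """Uniformly random playout; 1.0 if the player to move at `stones` wins."""
    turn = 0  # 0 = the player to move at the start of the rollout
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1.0 if turn == 0 else 0.0
        turn ^= 1
    return 0.0  # no stones left: the player to move has already lost


def iterate(root, rng):
    node = root
    # Tree policy: only while the node is fully expanded and non-terminal.
    while not node.untried and node.children:
        node = ucb_child(node)
    # Expansion: add one node, then stop using the tree policy.
    if node.untried:
        move = node.untried.pop(rng.randrange(len(node.untried)))
        child = Node(node.stones - move, parent=node)
        node.children[move] = child
        node = child
    # Rollout result, seen by the player who moved into `node`.
    reward = 1.0 - rollout(node.stones, rng)
    # Backup: flip the reward at every step towards the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        reward = 1.0 - reward
        node = node.parent


def search(stones, iterations, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        iterate(root, rng)
    return root
```

Usage: `root = search(5, 3000)` and then pick the most visited entry of `root.children`. Because each iteration adds at most one node, the point where the tree policy hands over to the rollout moves one ply deeper along well-visited lines as the search goes on.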

PFCM commented 8 years ago

Also note that the tree policy currently does not account for the fact that rewards are inverted at each step during the backup. I think this is what should be happening: that way UCB can still take the max at each stage, and it should eventually become some kind of minimax.
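A rough way to see why taking the max at every level can end up as minimax, under assumptions that are mine rather than checked against the code: rewards lie in [0, 1], inversion means r becomes 1 - r, Q(s, a) is the mean backed-up reward of move a for the player to move at s, N counts visits, s_a is the state reached by a, and c is the exploration constant.

```latex
% Tree policy: the same max at every node, whoever is to move
a^*(s) = \arg\max_a \left[ Q(s,a) + c \sqrt{\frac{\ln N(s)}{N(s,a)}} \right]

% If the estimates converge and the exploration term vanishes,
% the value for the player to move satisfies the negamax recursion
V(s) = \max_a \bigl( 1 - V(s_a) \bigr)

% Unrolling two plies recovers ordinary minimax,
% since 1 - \max_x (1 - x) = \min_x x
V(s) = \max_a \Bigl( 1 - \max_{a'} \bigl( 1 - V(s_{aa'}) \bigr) \Bigr)
     = \max_a \min_{a'} V(s_{aa'})
```

If rewards are instead in {-1, +1} and inverted by negation, the same argument goes through with -V in place of 1 - V.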

PFCM commented 8 years ago

Actually, starting to think that it might be in the reward calculation and hence specific to Hex. Need to double-check by playing some Go.