MCTS is always the wrong choice for 0/100 puzzles (those whose only achievable scores are 0 and 100). If we ever see a 100-scoring solution, we take it immediately. Otherwise every score is 0, so every node in the tree has score 0 - yet we still pay to select through those nodes and to back-propagate the 0.
So, by doing MCTS, we suffer the selection & back-propagation costs for nothing. On a recent test with C4, we gained a 10x raw speed improvement by switching to depth-first minimax with alpha-beta pruning. (No select + back-propagate step; no tree built in memory.) Obviously the alpha-beta part doesn't apply to puzzles - with a single player there are no opposing scores to prune against - but when we lack anything better to do, depth-first search beats MCTS.
When I say "lack anything better to do", we'd obviously want to keep the special rollout policies that we have, reusing them inside the depth-first search (e.g. to order moves), as in the sketch below.
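A minimal sketch of the idea. All names here are hypothetical stand-ins for whatever the real engine provides (`is_terminal`, `score`, `legal_moves`, `apply_move`, and a `policy.weight` move-ordering hook): a plain depth-first search that returns as soon as it finds a 100-scoring line, with the rollout policy reused to search promising moves first.

```python
def depth_first_solve(state, policy, path=()):
    """Depth-first search for a 100-scoring line in a 0/100 puzzle.

    Returns the winning move sequence, or None if this subtree
    contains no 100-scoring terminal. No tree is kept in memory;
    the only search state is the recursion stack.
    """
    if state.is_terminal():
        return path if state.score() == 100 else None

    # Reuse the rollout policy as a move-ordering heuristic:
    # higher-weighted moves are searched first.
    for move in sorted(state.legal_moves(), key=policy.weight, reverse=True):
        result = depth_first_solve(state.apply_move(move), policy, path + (move,))
        if result is not None:
            return result  # first 100-scoring solution found - take it
    return None
```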
Need to make sure that we're still evaluating latches (so that subtrees where a 100 score is provably unreachable get pruned).
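Assuming the latch analysis exposes something like a `rules_out_100(state)` predicate (hypothetical name), the check slots into the search as one extra cut-off at the top of the recursion:

```python
def depth_first_solve_with_latches(state, policy, latches, path=()):
    """As depth_first_solve, but prunes subtrees where latch analysis
    proves that a score of 100 can no longer be reached."""
    if latches.rules_out_100(state):
        return None  # a latched proposition makes the 100 goal unreachable
    if state.is_terminal():
        return path if state.score() == 100 else None
    for move in sorted(state.legal_moves(), key=policy.weight, reverse=True):
        result = depth_first_solve_with_latches(
            state.apply_move(move), policy, latches, path + (move,))
        if result is not None:
            return result
    return None
```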
The 10x raw speed improvement was measured on a version that didn't build the tree at all. But without a tree we lose the benefit of transpositions - which may well prove more significant than the raw speed gain. A potential half-way house is to store just the hash of each state searched so far. If the current state's hash is already in the set, assume there's no 100-scoring result below it and prune. (This is unsound in the face of hash collisions: a collision could cause us to skip the only winning line.)
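A sketch of that half-way house, assuming the engine exposes a 64-bit Zobrist-style `state.hash()` (hypothetical name). Note where the deliberate unsoundness lives:

```python
def depth_first_solve_hashed(state, policy, seen_hashes, path=()):
    """Depth-first search that remembers only the hashes of states
    already searched. A matching hash is *assumed* to mean "no 100
    below here" - wrong if two distinct states collide."""
    h = state.hash()
    if h in seen_hashes:
        return None  # (probably) searched already with no 100 found
    if state.is_terminal():
        return path if state.score() == 100 else None
    for move in sorted(state.legal_moves(), key=policy.weight, reverse=True):
        result = depth_first_solve_hashed(
            state.apply_move(move), policy, seen_hashes, path + (move,))
        if result is not None:
            return result
    # Record the hash only once the subtree is fully searched and
    # known (modulo collisions) to contain no 100-scoring terminal.
    seen_hashes.add(h)
    return None
```

Here `seen_hashes` is an ordinary set of ints, so the memory cost is roughly 8 bytes per searched state plus set overhead - far cheaper than storing full states or tree nodes.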