Right now the AZ implementation solves 3x3, 5x5, 7x7, 9x9 aaaaaand then stalls out short of perfect play on 11x11. Why? Presumably there's some bottleneck in the system somewhere.
Untested Hypotheses
Resid var anomaly: the resid var in the 11x11 runs is way higher (35%) than in the 9x9 runs (10%). This suggests either that the value net is having trouble learning the true values, or that the policy is in such flux that the value net can't keep up.
This'll be easier to investigate when I've decoupled the value and policy nets. Then I can run supervised-learning tests on the trained nets.
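For reference, the resid var metric here is just 1 minus the explained variance of the value head's predictions against the search-derived value targets. A minimal sketch (array names are illustrative, not from my codebase):

```python
import numpy as np

def residual_variance(predicted_values, value_targets):
    """Fraction of variance in the value targets that the value head
    fails to explain: Var(target - prediction) / Var(target)."""
    residuals = value_targets - predicted_values
    return np.var(residuals) / np.var(value_targets)

# A head that always predicts the mean scores 1.0 (explains nothing);
# a perfect head scores 0.0.
targets = np.array([1.0, -1.0, 1.0, -1.0])
print(residual_variance(np.full(4, targets.mean()), targets))  # 1.0
print(residual_variance(targets, targets))                     # 0.0
```

So 35% resid var on 11x11 means roughly a third of the variance in the value targets is going unexplained, versus a tenth on 9x9.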
LR size: I tested this while changing the architecture at the same time, which doesn't really count as a test.
Fast nontransitivities: tested at 15 min and slower, but could be happening on a faster timescale.
Tested Hypotheses
Bad architecture: I know from the victory experiments that FC nets don't scale well to larger boards, which makes this a likely suspect. But my AZ runs with convnets on 11x11 have stalled out at the same level as the FC nets, suggesting it isn't architecture.
One lingering uncertainty here is whether the victory failures are the architecture's fault or just the initialization's (as gwern suggested in the EAI discord). I'm using rezero here, but I should do some more exhaustive experiments, interpolating from FC to convnet.
MCTS miscalibration: this was the problem last time, when it worked up to 7x7 but then stalled on 9x9. But the mctselo tests suggest it's not a problem now: checking how various MCTS settings compete against a policy taken from late in the *forked-moves FC 11x11 run, my current setting of c_puct = 1/16 and 64 nodes is still a substantial uplift.
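For concreteness, those two settings plug into the standard AlphaZero-style PUCT selection rule: c_puct scales the exploration bonus, and the node count caps how many times this rule fires per search. A sketch (dict-based children are illustrative, not my actual tree representation):

```python
import math

def puct_select(children, c_puct=1 / 16):
    """Pick the child maximizing Q + U, where
    U = c_puct * P * sqrt(total_visits) / (1 + N).
    Each child carries prior P, visit count N, and total value W."""
    total_visits = sum(ch["N"] for ch in children)

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0  # mean backed-up value
        u = c_puct * ch["P"] * math.sqrt(total_visits) / (1 + ch["N"])
        return q + u

    return max(children, key=score)

# With c_puct as small as 1/16, the exploration term U shrinks quickly
# as visits accumulate, so selection leans heavily on the Q values.
kids = [{"P": 0.5, "N": 10, "W": 6.0}, {"P": 0.5, "N": 10, "W": 4.0}]
best = puct_select(kids)  # the child with higher mean value wins
```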
LR size: the *quiet-mats convnet run above used 3e-4, compared to the usual 1e-2, and the plateau's the same. I did change the architecture from FC to convnet at the same time though, so this isn't conclusive. I already suspect that convnets are much more sensitive to LR than FC nets are.
Nontransitivities: checked this for *forked-moves at the 15-min and 1-h scales. Could still be present at faster scales.