Sorry to bother you! I am working on an AlphaZero implementation similar to yours, also for the Connect4 board game. Training went smoothly at first, but after 70+ iterations the loss stopped decreasing. I manually lowered the learning rate from 1e-3 to 1e-5, but the loss still gradually increases. Then I came across your blog post about your implementation and found it very similar to mine. Have you ever run into this in your experiments? Hopefully you can offer me some advice :)