The selfplay function plays a single game using the network and MCTS for move selection, then saves the boards and moves of the main line to a numpy npz file. These files will be loaded in batches (e.g. of 10) for training.
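For reference, a minimal sketch of how a game's main line could be written to and read back from npz files. The array names `boards`/`moves` and the helper names are assumptions for illustration, not the actual implementation:

```python
import numpy as np

def save_game(boards, moves, path):
    # boards: list of board arrays, moves: list of move indices (one per board).
    # The key names "boards" and "moves" are assumed here, not the repo's actual keys.
    np.savez(path, boards=np.asarray(boards), moves=np.asarray(moves))

def load_batch(paths):
    # Concatenate the main lines of several games (e.g. 10 files) into one training batch.
    boards, moves = [], []
    for p in paths:
        data = np.load(p)
        boards.append(data['boards'])
        moves.append(data['moves'])
    return np.concatenate(boards), np.concatenate(moves)
```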
I will first generate a few games using a fixed move order, since these games are more consistent and achieve better scores. Technically this breaks the zero-ness of the network, but I expect it to save some time.
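As an illustration only, a fixed move order policy could look like the sketch below; the specific priority and the move encoding (0=left, 1=up, 2=right, 3=down) as well as the `is_valid_move` helper are assumptions, not the project's actual code:

```python
def fixed_order_move(board, is_valid_move, order=(0, 1, 2, 3)):
    # Try moves in a fixed priority and return the first legal one.
    # The encoding 0=left, 1=up, 2=right, 3=down is assumed for this sketch.
    for move in order:
        if is_valid_move(board, move):
            return move
    return None  # no legal move: the game is over
```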
Data augmentation was used in the previous Keras version. It increases the number of samples 8-fold (the four rotations times two reflections of the board) and mitigates class imbalance among the move labels. It also makes the network rotationally invariant, so it can play games with large tiles stuck in any corner. However, it makes training slower, because the network takes longer to learn a directional bias. If I train on initial games generated with a fixed move order, then I cannot use rotational augmentation.
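A minimal sketch of the 8-fold augmentation, assuming the board is a 2D numpy array and moves are encoded 0=left, 1=up, 2=right, 3=down (that encoding is an assumption for this example):

```python
import numpy as np

def augment(board, move):
    # Yield the 8 symmetric (board, move) pairs: 4 rotations x optional left-right flip.
    # Move encoding 0=left, 1=up, 2=right, 3=down is assumed for this sketch.
    samples = []
    b, m = board, move
    for _ in range(4):
        samples.append((b.copy(), m))
        samples.append((np.fliplr(b).copy(), (2 - m) % 4))  # mirror swaps left and right
        b = np.rot90(b)   # rotate the board 90 degrees counterclockwise
        m = (m + 3) % 4   # the move direction rotates with the board
    return samples
```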
The first ConvNet model was trained on these 10 games for 100 epochs. Model comparison shows stronger play than a random player. See 67b8c5bad42bfc7b534bd2de9bc9a632a6823bc5 for details.
Ideally, training should sample from a set of many games. This becomes feasible once the game-generating process (batch MCTS) is accelerated (see #10).
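A sampler along these lines could draw random positions from a pool of saved self-play games; the directory layout and array names below follow the assumed npz format from the earlier sketch:

```python
import numpy as np
from glob import glob

def sample_training_batch(game_dir, batch_size=256, rng=None):
    # Pool positions from all saved self-play games, then draw a random batch.
    # The directory layout and array names are assumptions for this sketch.
    rng = rng or np.random.default_rng()
    boards, moves = [], []
    for f in glob(f'{game_dir}/*.npz'):
        data = np.load(f)
        boards.append(data['boards'])
        moves.append(data['moves'])
    boards = np.concatenate(boards)
    moves = np.concatenate(moves)
    idx = rng.choice(len(moves), size=min(batch_size, len(moves)), replace=False)
    return boards[idx], moves[idx]
```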