The selfplay function plays a single game using the network and MCTS for move selection, then saves the boards and moves of the main line to a numpy npz file. These files will be loaded in batches (e.g. of 10) for training.
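For reference, a minimal sketch of how a game's main line could be written to and read back from npz files. The array names `boards`/`moves` and the helper names are assumptions for illustration, not the actual implementation:

```python
import numpy as np

def save_game(boards, moves, path):
    # boards: list of board arrays, moves: list of move indices (one per board).
    # The key names "boards" and "moves" are assumed here, not the repo's actual keys.
    np.savez(path, boards=np.asarray(boards), moves=np.asarray(moves))

def load_batch(paths):
    # Concatenate the main lines of several games (e.g. 10 files) into one training batch.
    boards, moves = [], []
    for p in paths:
        data = np.load(p)
        boards.append(data['boards'])
        moves.append(data['moves'])
    return np.concatenate(boards), np.concatenate(moves)
```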
I will first generate a few games using a fixed move order, since these games are more consistent and achieve better scores. Technically this breaks the zero-ness of the network, but I expect it to save some time.
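As an illustration only, a fixed move order policy could look like the sketch below; the specific priority and the move encoding (0=left, 1=up, 2=right, 3=down) as well as the `is_valid_move` helper are assumptions, not the project's actual code:

```python
def fixed_order_move(board, is_valid_move, order=(0, 1, 2, 3)):
    # Try moves in a fixed priority and return the first legal one.
    # The encoding 0=left, 1=up, 2=right, 3=down is assumed for this sketch.
    for move in order:
        if is_valid_move(board, move):
            return move
    return None  # no legal move: the game is over
```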
Data augmentation was used in the previous Keras version. It increases the number of samples 8-fold (the four rotations times two reflections of the board) and mitigates class imbalance among the move labels. It also makes the network rotationally invariant, so it can play games with large tiles stuck in any corner. However, it makes training slower, because the network takes longer to learn a directional bias. If I train on initial games generated with a fixed move order, then I cannot use rotational augmentation.
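A minimal sketch of the 8-fold augmentation, assuming the board is a 2D numpy array and moves are encoded 0=left, 1=up, 2=right, 3=down (that encoding is an assumption for this example):

```python
import numpy as np

def augment(board, move):
    # Yield the 8 symmetric (board, move) pairs: 4 rotations x optional left-right flip.
    # Move encoding 0=left, 1=up, 2=right, 3=down is assumed for this sketch.
    samples = []
    b, m = board, move
    for _ in range(4):
        samples.append((b.copy(), m))
        samples.append((np.fliplr(b).copy(), (2 - m) % 4))  # mirror swaps left and right
        b = np.rot90(b)   # rotate the board 90 degrees counterclockwise
        m = (m + 3) % 4   # the move direction rotates with the board
    return samples
```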
The first ConvNet model was trained on these 10 games for 100 epochs. Model comparison shows stronger play than a random player. See 67b8c5bad42bfc7b534bd2de9bc9a632a6823bc5 for details.
Ideally, training should sample from a set of many games. This becomes feasible once the game-generating process (batch MCTS) is accelerated (see #10).
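A sampler along these lines could draw random positions from a pool of saved self-play games; the directory layout and array names below follow the assumed npz format from the earlier sketch:

```python
import numpy as np
from glob import glob

def sample_training_batch(game_dir, batch_size=256, rng=None):
    # Pool positions from all saved self-play games, then draw a random batch.
    # The directory layout and array names are assumptions for this sketch.
    rng = rng or np.random.default_rng()
    boards, moves = [], []
    for f in glob(f'{game_dir}/*.npz'):
        data = np.load(f)
        boards.append(data['boards'])
        moves.append(data['moves'])
    boards = np.concatenate(boards)
    moves = np.concatenate(moves)
    idx = rng.choice(len(moves), size=min(batch_size, len(moves)), replace=False)
    return boards[idx], moves[idx]
```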