danzel commented 6 years ago

I've been playing with this, but haven't had much luck getting it to work. Thoughts follow.

Training data is garbage.

We are teaching the AI how bad its currently known bad moves are, which probably isn't helpful. Maybe we can build the training data a bit differently: do a giant tree of all placements from the first piece and keep the best. Something like what we do now, but increasing the number we keep at each level? This sounds sort of stupid... Maybe just keep more per level?
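A minimal sketch of the "keep more per level" idea, written as a beam search whose width grows with depth. Everything here is an assumption for illustration: `expand`, `score`, `base_width` and `growth` are hypothetical names, not anything from the existing code, and the toy state is just a tuple of numbers rather than a real board.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

State = TypeVar("State")
Piece = TypeVar("Piece")


def widening_beam_search(
    start: State,
    pieces: Sequence[Piece],
    expand: Callable[[State, Piece], Iterable[State]],
    score: Callable[[State], float],
    base_width: int = 2,
    growth: int = 2,
) -> List[List[State]]:
    """Return the states kept at each level, best first.

    expand(state, piece) is a hypothetical callback yielding every board
    reachable by placing `piece`; score(state) is higher-is-better.
    The number of states kept at level n is base_width * growth ** n.
    """
    beam = [start]
    kept_per_level: List[List[State]] = []
    for level, piece in enumerate(pieces):
        # Every placement of this piece on every board we kept so far.
        candidates = [nxt for state in beam for nxt in expand(state, piece)]
        if not candidates:
            break  # nothing fits any more
        width = base_width * growth ** level
        candidates.sort(key=score, reverse=True)
        beam = candidates[:width]
        kept_per_level.append(beam)
    return kept_per_level


# Toy usage: a "state" is a tuple of numbers and the score is their sum.
levels = widening_beam_search(
    start=(),
    pieces=[1, 1, 1],
    expand=lambda state, piece: [state + (piece + k,) for k in range(4)],
    score=sum,
)
```

The boards kept at every level, not just the final winners, could then be used as training examples, so the network also sees decent-but-not-best placements.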

Network design

Try MaxPooling2D.
Try inverting the used/empty values. This changed things when I tested it, not totally sure why!
More or fewer convolutional layers? (AlphaGo was 17ish iirc, so I think we may need more.)
Dropout: do we want it? Do we really care about overfitting?
What sort of activations? I think AlphaGo used something other than relu? Maybe I'm remembering wrong.
AGZ used residual layers instead of plain conv2d, see "Residual connection on a convolution layer" at https://keras.io/getting-started/functional-api-guide/

Other thoughts

I want the AI to place the first piece in a corner (I think), but it doesn't seem to want to. Is this because of the training data, because it's not actually a good move, because the network layout sucks, or something else?

Maybe instead of a class for each coverage value (which could vary a lot based on what pieces you get) we group them into buckets: 0-10% full, 11-20%, ..., 91-100% full. This may help learning? We probably need a non-linear scale: coarser buckets in the 0-50% range, finer ones in the 50-100% range.
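A rough sketch of the non-linear bucketing, assuming a 9x9 (81 cell) board. The bucket edges below are made-up values chosen only to show coarse classes at low coverage and fine classes at high coverage; `BUCKET_UPPER_EDGES` and `coverage_class` are hypothetical names.

```python
from bisect import bisect_left

BOARD_CELLS = 81  # assumption: 9x9 board

# Hypothetical upper edges (percent full, inclusive) of each class.
# Two wide classes cover 0-50%, seven narrower ones cover 50-100%.
BUCKET_UPPER_EDGES = [25, 50, 60, 70, 80, 85, 90, 95, 100]


def coverage_class(filled_cells: int, total_cells: int = BOARD_CELLS) -> int:
    """Map a filled-cell count to a class index in 0..len(edges)-1."""
    if not 0 <= filled_cells <= total_cells:
        raise ValueError("filled_cells out of range")
    percent = 100.0 * filled_cells / total_cells
    # Index of the first edge that is >= percent.
    return bisect_left(BUCKET_UPPER_EDGES, percent)
```

With these edges the network would have 9 output classes instead of one per possible coverage value.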

danzel commented 6 years ago

Did some reading on (Deep) Q-learning and it sounds like something worth trying for generating training data. Run the NN to place pieces, randomly choose one of the placements, and play out all of the alternatives to it. If any of the alternatives was better, teach the NN that it is better and that the move it used is worse. (Not an exact implementation of Q-learning, but probably close enough.)
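The relabelling step described above could be sketched roughly as follows. This is not real Q-learning and not the project's code: `decisions`, `playout` and the 1.0/0.0 targets are assumptions made up for the example.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

Move = Hashable
Sample = Tuple[int, Move, float]  # (step index, move, training target)


def relabel_one_decision(
    decisions: Sequence[Tuple[Move, Sequence[Move]]],
    playout: Callable[[int, Move], float],
    rng: random.Random,
) -> List[Sample]:
    """Pick one step of a finished game and compare it with its alternatives.

    decisions[i] is (move the NN used, other legal moves) at step i.
    playout(step, move) is a hypothetical callback that plays the game to
    the end from that move and returns a higher-is-better final score.
    """
    step = rng.randrange(len(decisions))
    used_move, alternatives = decisions[step]
    used_score = playout(step, used_move)

    samples: List[Sample] = []
    for alt in alternatives:
        if playout(step, alt) > used_score:
            samples.append((step, alt, 1.0))  # alternative was better
    if samples:
        samples.append((step, used_move, 0.0))  # so the used move is worse
    return samples


# Toy usage with a fixed score table standing in for real playouts.
toy_scores = {"a": 5.0, "b": 7.0, "c": 3.0}
toy_samples = relabel_one_decision(
    decisions=[("a", ["b", "c"])],
    playout=lambda step, move: toy_scores[move],
    rng=random.Random(0),
)
```

If no alternative beats the used move, the function returns an empty list, so the NN is only corrected when a playout actually found something better.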

danzel commented 6 years ago

Right now we only pass in the target board (with the piece placed) for consideration. We should pass in both the current and the proposed board, otherwise the NN doesn't know which piece was placed.
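As a sketch of what the two-board input might look like, assuming boards are 2D grids of 0 (empty) and 1 (used): stack them as two channels, and note that their difference is exactly the footprint of the placed piece. `encode_placement` is a hypothetical helper, not an existing function.

```python
from typing import List, Tuple

Board = List[List[int]]  # assumption: 0 = empty, 1 = used


def encode_placement(current: Board, proposed: Board) -> Tuple[List[Board], Board]:
    """Return ([current, proposed] as two channels, mask of newly used cells)."""
    same_shape = len(current) == len(proposed) and all(
        len(c_row) == len(p_row) for c_row, p_row in zip(current, proposed)
    )
    if not same_shape:
        raise ValueError("boards must have the same shape")

    # Channel 0 is the board before the move, channel 1 the board after it.
    stacked = [[row[:] for row in current], [row[:] for row in proposed]]
    # 1 where the new piece landed, 0 everywhere else.
    placed = [
        [p - c for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(current, proposed)
    ]
    return stacked, placed


before = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
after = [[1, 1, 1], [0, 0, 1], [0, 0, 0]]
channels, piece_mask = encode_placement(before, after)
```

The nested lists use a channels-first layout; they would need converting (and possibly transposing to channels-last) before being fed to a conv2d layer.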

danzel / PatchworkSim

Neural Network based placer #45
