Mathspy / tic-tac-toe-NN

Simplest form of Feed Forward Neural Network in TensorFlow for playing Tic-Tac-Toe
https://mathspy.github.io/tic-tac-toe-NN/
MIT License

UltTic!

Explanation:

This is a very simple experiment whose purpose is to create a Neural Network capable of playing TicTacToe perfectly. Contrary to the usual method, where a neural network is evolved using reinforcement learning, this one is a very simple supervised-learning feedforward dense network, trained on a data set generated with the well-known minimax algorithm. You can read more about the purpose and reasoning behind this project below.
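The minimax labeling can be sketched roughly like this (a minimal, unoptimized version; the board encoding and function names here are illustrative, not the ones used in perm_gen.py):

```python
# Board: list of 9 cells; 1 = current player's mark, -1 = opponent's, 0 = empty.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 1 or -1 if that player completed a line, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def minimax(board, player):
    """Best achievable score for `player`: 1 = win, 0 = tie, -1 = loss."""
    w = winner(board)
    if w != 0:
        return w * player       # +1 if `player` already won, -1 if lost
    if 0 not in board:
        return 0                # full board with no winner: tie
    scores = []
    for i in range(9):
        if board[i] == 0:
            board[i] = player
            scores.append(-minimax(board, -player))  # opponent's best = our worst
            board[i] = 0
    return max(scores)

def best_move(board, player):
    """Index of a move with the highest minimax score for `player`."""
    best, best_score = None, -2
    for i in range(9):
        if board[i] == 0:
            board[i] = player
            score = -minimax(board, -player)
            board[i] = 0
            if score > best_score:
                best, best_score = i, score
    return best
```

Running `best_move` over every reachable position is one way to produce perfect-play move labels for supervised training.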

How to use, contribute and repository structure

Usage and contribution

Python:

To install and use the Python code (generator or keras_model) you will need Pipenv. Install the dependencies with:

$ pipenv install

You can run the tests using

$ pipenv run test

Example of running keras_model.py:

$ pipenv run python keras_model.py

Model:

If you just want to use the models in your own project, you can copy them from the model folder. Like everything in the repository they are licensed under MIT, so feel free to use them in your own projects!
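Whatever the model outputs, you still need to translate its scores into a legal move. A minimal sketch, assuming the network takes a 9-value board and returns one score per cell (the encoding here is an assumption, not taken from keras_model.py):

```python
import numpy as np

def pick_move(board, scores):
    """Pick the index of the highest-scoring *legal* move.

    board  -- 9 cells, 0 = empty (occupied cells are masked out)
    scores -- 9 move scores, e.g. the network's output row
    """
    scores = np.asarray(scores, dtype=float).copy()
    scores[np.asarray(board) != 0] = -np.inf   # never play an occupied cell
    return int(np.argmax(scores))

# With the real model you would obtain `scores` roughly like this
# (assuming the input/output shapes above):
#   model = tf.keras.models.load_model("model/UltTic.h5")
#   scores = model.predict(np.array([board]))[0]
board  = [1, 0, 0, 0, -1, 0, 0, 0, 0]
scores = [0.9, 0.1, 0.2, 0.3, 0.8, 0.1, 0.2, 0.1, 0.4]
move = pick_move(board, scores)   # cells 0 and 4 are taken, so this is 8
```

Masking occupied cells is important: a nearly-perfect network can still assign its top score to an illegal square.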

Site:

If you wish to contribute to the website all you have to do is run

$ yarn install
$ yarn run start

Feel free to open a PR or an issue about anything (yes, even questions are welcome! I am also still learning and probably got a lot of this wrong...)

Repository structure

└── data
    ├── perm_gen.py
    ├── test_perm_gen.py
    ├── train_inputs.pickle
    └── train_labels.pickle

The data folder contains the data set used to train the network and its generator, perm_gen.py, as well as test cases in test_perm_gen.py that ensure the generator works as intended.
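The .pickle files are plain Python pickles. A hypothetical miniature just to show the round trip (the real files were produced by perm_gen.py and their exact shapes may differ):

```python
import pickle

# Illustrative stand-ins for the real data: each input a 9-cell board,
# each label a minimax-optimal move index (shapes are an assumption).
inputs = [[0] * 9, [1, -1, 0, 0, 0, 0, 0, 0, 0]]
labels = [4, 4]

with open("train_inputs_demo.pickle", "wb") as f:
    pickle.dump(inputs, f)

# Loading works the same way for the real data/train_inputs.pickle:
with open("train_inputs_demo.pickle", "rb") as f:
    loaded_inputs = pickle.load(f)
```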

└── example
    └── tictactoe.py

The example folder contains a mostly identical copy of the TicTacToe implementation shared here; special thanks to both Billy Rebecchi and Horst Jens for writing and improving it. The file itself currently has no purpose, but it could easily be updated to use the trained network. It was originally added in the initial phases of development, while I was still figuring out how to approach the project; in the end I only used its isGameOver function, slightly fixed (the original didn't detect ties).
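The tie fix amounts to one extra check. A sketch of the repaired logic (the board encoding and return values here are hypothetical; the repository's tictactoe.py may differ):

```python
def is_game_over(board):
    """Return 'X' or 'O' if that player won, 'tie' on a full board, else None.

    board is a 9-element list of 'X', 'O' or ' ' (illustrative encoding).
    """
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for a, b, c in lines:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    if ' ' not in board:
        return 'tie'   # the fix: a full board with no winner is a tie
    return None
```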

└── model
    ├── js
    |   ├── model.json
    |   ├── group1-shared1of1
    |   └── group2-shared1of1
    └── UltTic.h5

Contains the trained model generated by keras_model.py and its JavaScript equivalent generated using tensorflowjs.
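The JavaScript files can be regenerated from the Keras .h5 with the tensorflowjs converter (assuming the tensorflowjs pip package is installed):

```shell
$ pip install tensorflowjs
$ tensorflowjs_converter --input_format keras model/UltTic.h5 model/js
```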

└── site

The site directory contains a React web app where you can try your luck against the network. It's mostly a straight translation of one of my very first ReactNative projects, a mobile TicTacToe (and yes, it used minimax, which was rather slow and often hung the app at the beginning of the game).

Logic behind network design

At first I attempted teaching the model using only 8 neurons in the hidden layer (3 for winning horizontally, 3 for winning vertically and 2 for winning diagonally). That gave mostly poor results (a maximum accuracy of ~83%).

I understood before starting that I was trying to make each neuron detect a “feature” of the given data, but I was unsure whether that would take 8 or 16 neurons, 16 being 8 winning “features” plus 8 losing “features”. Since the network didn't improve all that much with 16 neurons, I took more time to think and realized: if the neuron for diagonal 1 “fired”, how would the network detect in which of the diagonal's 3 cells it should play? The output layer has no access to the input layer! (There is definitely an architecture out there that connects every layer to all the previous ones; I still need to look into it.)

That's when it hit me: use 48 neurons! For each winning/losing method there are 3 cells, so each neuron detects an outcome (winning/losing), a cell number (1-3) and a feature (where each feature is a diagonal or a line). That's 2 * 3 * 8 = 48. Perfect.
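A network of that shape can be sketched in Keras like this (layer sizes from the text above; the activations, loss and optimizer are assumptions, see keras_model.py for the real configuration):

```python
import tensorflow as tf

# 9 input cells -> 48 hidden "feature" neurons (2 outcomes x 3 cells x 8 lines)
# -> 9 outputs, one score per board cell.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(9,)),
    tf.keras.layers.Dense(48, activation="relu"),
    tf.keras.layers.Dense(9, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```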

It was able to reach 90%+ accuracy after only a few seconds of training, to the pride of its creator :heart:

Purpose and reasoning

As mentioned in the explanation, the methodology used for training this Neural Network is rather different from that of conventional TicTacToe-solving networks, and that is because:

Finally, the reasoning behind the lack of a test data set is that I believe a network with a limited set of input combinations (which are rare, and usually exist only for trivial problems that we had solved before NNs anyway) should simply be trained to “perfection” over all the data, over as many iterations as possible, and then be used as a mathematical map, because that's what such networks are most useful for in those situations. (Another trivial case that doesn't need a test set is solving XOR with a neural network.)

Special thanks:

To all the people who contributed to teaching me about Neural Networks:

:heart: