AlphaZero-like AI solution for playing Ultimate Tic-Tac-Toe in the browser.
This project is a loose adaptation of the original AlphaZero published by DeepMind. It follows key ideas behind AlphaZero, such as generating training data from self-play and using a single neural network to guide the Monte-Carlo Tree Search (MCTS) algorithm. However, the actual implementation of these ideas differs because of limited computing power, specifically:
You can play Ultimate Tic-Tac-Toe with the AI on the official project website: https://uttt.ai.
The project overview in chronological order:
The following evaluation procedure was designed and run to assess the overall AI performance in Ultimate Tic-Tac-Toe.
Overall AI performance is assessed by playing tournaments between AIs. There are 3 AIs:
And 3 tournaments:
Each AI is represented in two versions: quick and strong (to get a more nuanced comparison).
| AI | version | num simulations | inference time |
|---|---|---|---|
| MCTS | 1M (quick) | 1,000,000 | 5.0s |
| MCTS | 10M (strong) | 10,000,000 | 51.4s |
| NMCTS | 1k (quick) | 1,000 | 4.4s |
| NMCTS | 10k (strong) | 10,000 | 17.8s |
Inference times were measured on Intel i7-10700K and NVIDIA GeForce RTX 2080 Ti for single-threaded C++ implementations of MCTS and NMCTS (GPU is used by NMCTS to run Policy-Value Network inference).
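As a rough reading of the table above, dividing the number of simulations by the inference time gives an approximate throughput for each configuration. The sketch below only redoes that arithmetic with the figures from the table; `CONFIGS`, `THROUGHPUT` and `simulations_per_second` are illustrative names and are not part of the project.

```python
# Figures copied from the table above:
# (AI, version, num simulations, inference time in seconds).
CONFIGS = [
    ("MCTS", "1M (quick)", 1_000_000, 5.0),
    ("MCTS", "10M (strong)", 10_000_000, 51.4),
    ("NMCTS", "1k (quick)", 1_000, 4.4),
    ("NMCTS", "10k (strong)", 10_000, 17.8),
]


def simulations_per_second(num_simulations: int, inference_time_s: float) -> float:
    """Average throughput implied by a simulation count and a wall-clock time."""
    return num_simulations / inference_time_s


# Hypothetical helper mapping: (AI, version) -> simulations per second.
THROUGHPUT = {
    (ai, version): simulations_per_second(n, t) for ai, version, n, t in CONFIGS
}

if __name__ == "__main__":
    for (ai, version), rate in THROUGHPUT.items():
        print(f"{ai:5s} {version:12s} ~{rate:,.0f} simulations/s")
```

On these numbers, plain MCTS runs at roughly 200,000 simulations per second, while NMCTS runs at a few hundred simulations per second, which is consistent with NMCTS relying on Policy-Value Network inference.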
Each tournament consists of 4 matches (all combinations of quick/strong vs quick/strong):
Each match consists of 100 games initialized from 50 unique positions; each position is played twice (the AIs swap sides for the second playthrough).
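The tournament and match structure described above can be sketched as follows. This is only an illustration under stated assumptions: `match_pairings`, `build_match_schedule`, the `Game` tuple and the integer position ids are hypothetical names that stand in for the project's actual evaluation code and positions.

```python
from itertools import product
from typing import List, NamedTuple, Tuple

VERSIONS = ("quick", "strong")


class Game(NamedTuple):
    position_id: int  # placeholder index of an initial evaluation position
    x_player: str     # AI playing X in this game
    o_player: str     # AI playing O in this game


def match_pairings() -> List[Tuple[str, str]]:
    """All quick/strong vs quick/strong combinations: 4 matches per tournament."""
    return list(product(VERSIONS, VERSIONS))


def build_match_schedule(ai_a: str, ai_b: str, num_positions: int = 50) -> List[Game]:
    """Play every position twice, with the two AIs swapping sides."""
    schedule = []
    for position_id in range(num_positions):
        schedule.append(Game(position_id, ai_a, ai_b))  # first playthrough
        schedule.append(Game(position_id, ai_b, ai_a))  # sides swapped
    return schedule


if __name__ == "__main__":
    for version_a, version_b in match_pairings():
        games = build_match_schedule(f"A ({version_a})", f"B ({version_b})")
        print(version_a, "vs", version_b, "->", len(games), "games")
```

With 4 matches of 100 games each, one tournament amounts to 400 games. Swapping sides on every position means neither AI benefits from systematically playing the more favorable side.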
Initial evaluation positions are defined here: utttpy/selfplay/evaluation_uttt_states.py.
Takeaways:
NMCTS2 is deployed at https://uttt.ai.
There are 2 datasets:
- stage1-mcts: 8 million evaluated positions generated by the Monte-Carlo Tree Search self-play and used to train the Policy-Value Network from scratch.
- stage2-nmcts: 8 million evaluated positions generated by the Neural Monte-Carlo Tree Search self-play and used to retrain the Policy-Value Network.

Both datasets are available for download here.
Read more about datasets here: datasets/README.md.
Training artifacts are available for download here.
This project was developed using:
This project is licensed under the Apache License 2.0.