ar-nowaczynski / utttai

AlphaZero-like AI solution for playing Ultimate Tic-Tac-Toe in the browser
https://uttt.ai
Apache License 2.0

uttt.ai

AlphaZero-like AI solution for playing Ultimate Tic-Tac-Toe in the browser.

[Figure: uttt.ai preview]

Introduction

This project is a loose adaptation of the original AlphaZero published by DeepMind. It follows the key ideas behind AlphaZero, such as generating training data from self-play and using a single neural network to guide the Monte-Carlo Tree Search (MCTS) algorithm. However, the actual implementation of these ideas differs because of limits on the available computing power, specifically:

You can play Ultimate Tic-Tac-Toe with the AI on the official project website: https://uttt.ai.

Overview

The project overview in chronological order:

[Figure: overview]

Differences from the original AlphaZero

Evaluation

The following evaluation procedure was designed and run to assess the overall AI performance in Ultimate Tic-Tac-Toe.

Evaluation setup

Overall AI performance is assessed by playing tournaments between AIs. There are 3 AIs:

  1. MCTS - Monte-Carlo Tree Search with random playouts
  2. NMCTS1 - (Neural) Monte-Carlo Tree Search with Policy-Value Network guidance after training on the stage1-mcts dataset
  3. NMCTS2 - (Neural) Monte-Carlo Tree Search with Policy-Value Network guidance after retraining on the stage2-nmcts dataset
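The difference between plain MCTS and network-guided MCTS shows up most clearly in how a child node is scored during tree descent. The sketch below is an illustration only: this README does not state which selection formulas the project uses, so it assumes the textbook UCB1 rule for random-playout MCTS and the PUCT rule described in the AlphaZero paper for the network-guided variant. The function names, parameters, and constants (`c`, `c_puct`) are hypothetical.

```python
import math


def ucb1_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1: average playout result plus an exploration bonus.

    Assumed here for plain MCTS; the project's actual rule may differ.
    """
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child_value_sum / child_visits
    return mean + c * math.sqrt(math.log(parent_visits) / child_visits)


def puct_score(child_value_sum, child_visits, parent_visits, prior, c_puct=1.5):
    """PUCT as in the AlphaZero paper: mean value plus a prior-weighted bonus.

    `prior` stands for the policy probability a Policy-Value Network
    assigns to the move leading to this child (hypothetical input).
    """
    mean = child_value_sum / child_visits if child_visits else 0.0
    return mean + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


if __name__ == "__main__":
    # Two unvisited moves under a parent with 4 visits: the higher prior wins.
    print(puct_score(0.0, 0, 4, prior=0.5))
    print(puct_score(0.0, 0, 4, prior=0.1))
```

Under these assumptions, the policy prior steers the search toward promising moves before any visit statistics exist, which is one way a guided search can make do with far fewer simulations than a search based on random playouts.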

And 3 tournaments:

  1. NMCTS1 vs MCTS
  2. NMCTS2 vs MCTS
  3. NMCTS2 vs NMCTS1

Each AI is represented in two versions, quick and strong, to allow a more nuanced comparison.

AI version            num simulations    inference time
MCTS 1M (quick)       1,000,000          5.0s
MCTS 10M (strong)     10,000,000         51.4s
NMCTS 1k (quick)      1,000              4.4s
NMCTS 10k (strong)    10,000             17.8s

Inference times were measured on an Intel i7-10700K and an NVIDIA GeForce RTX 2080 Ti for single-threaded C++ implementations of MCTS and NMCTS (the GPU is used by NMCTS to run Policy-Value Network inference).
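To put the table in perspective, the simulation throughput of each configuration can be derived by dividing the number of simulations by the inference time. The snippet below is a small worked calculation using only the figures from the table; the variable and function names are illustrative.

```python
# (label, number of simulations, inference time in seconds), copied from the table
ROWS = [
    ("MCTS 1M (quick)", 1_000_000, 5.0),
    ("MCTS 10M (strong)", 10_000_000, 51.4),
    ("NMCTS 1k (quick)", 1_000, 4.4),
    ("NMCTS 10k (strong)", 10_000, 17.8),
]


def simulations_per_second(num_simulations, seconds):
    """Average throughput implied by one table row."""
    return num_simulations / seconds


throughput = {
    label: simulations_per_second(num_simulations, seconds)
    for label, num_simulations, seconds in ROWS
}

if __name__ == "__main__":
    for label, rate in throughput.items():
        print(f"{label:<20} {rate:>12,.0f} simulations/s")
```

By this measure MCTS runs at roughly 195,000 to 200,000 simulations per second, while NMCTS runs at roughly 227 to 562, so each network-guided simulation costs several hundred times more wall-clock time than a random playout on this hardware.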

Each tournament consists of 4 matches (all combinations of quick/strong vs quick/strong):

[Figure: evaluation tournament template]

Each match consists of 100 games initialized from 50 unique positions; each position is played twice (the AIs swap sides for the second playthrough).
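The tournament structure described above can be enumerated in a few lines. This is a sketch of the schedule only, not the project's evaluation code: the function names and the tuple layout are hypothetical, and positions are represented by plain indices rather than actual game states.

```python
from itertools import product

TOURNAMENTS = [("NMCTS1", "MCTS"), ("NMCTS2", "MCTS"), ("NMCTS2", "NMCTS1")]
VERSIONS = ["quick", "strong"]
NUM_POSITIONS = 50  # unique initial positions per match


def tournament_matches(ai_a, ai_b):
    """Yield the 4 matches of a tournament: every quick/strong pairing."""
    for version_a, version_b in product(VERSIONS, repeat=2):
        yield f"{ai_a} ({version_a})", f"{ai_b} ({version_b})"


def match_games(ai_a, ai_b, num_positions=NUM_POSITIONS):
    """Yield (position_index, first_player, second_player) for one match.

    Each position is played twice, with the AIs swapping sides.
    """
    for position in range(num_positions):
        yield position, ai_a, ai_b
        yield position, ai_b, ai_a


schedule = [
    (a, b, list(match_games(a, b)))
    for ai_a, ai_b in TOURNAMENTS
    for a, b in tournament_matches(ai_a, ai_b)
]
total_games = sum(len(games) for _, _, games in schedule)

if __name__ == "__main__":
    print(len(schedule), "matches,", total_games, "games")
```

With 3 tournaments of 4 matches each and 100 games per match, the full evaluation comes to 12 matches and 1,200 games.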

Initial evaluation positions are defined here: utttpy/selfplay/evaluation_uttt_states.py.

Evaluation results

[Figure: evaluation results]

Evaluation results aggregated

[Figure: evaluation results aggregated]

Takeaways:

NMCTS2 is deployed on https://uttt.ai.

Datasets

There are 2 datasets: stage1-mcts and stage2-nmcts.

Both datasets are available for download here.

Read more about datasets here: datasets/README.md.

Training

Training artifacts are available for download here.

Requirements

This project was developed using:

License

This project is licensed under the Apache License 2.0.