This repo contains implementations of a few Othello AI agents and a UI to play against them. Currently, Minimax and MCTS agents are available.
The front end of the game is implemented in JS, the server in Flask (Python), and the AI and some of the game environment logic in C.
To use the different AIs, the compiled DLL files (Minimax, MCTS, Game Environment) need to be placed in the repo directory.
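The repo's actual binding code isn't shown here, but calling such a DLL from Python is typically done with `ctypes`; the file name and exported function below are purely hypothetical, not the repo's API:

```python
import ctypes

# Hypothetical sketch: load a compiled game-environment DLL and declare
# the signature of one exported function. The file name and function
# name are assumptions for illustration only.
lib = ctypes.CDLL('./game_env.dll')

# Suppose the DLL exposed: uint64_t legal_moves(uint64_t black, uint64_t white)
lib.legal_moves.argtypes = [ctypes.c_uint64, ctypes.c_uint64]
lib.legal_moves.restype = ctypes.c_uint64

# Bitboards for the standard Othello starting position
moves = lib.legal_moves(0x0000000810000000, 0x0000001008000000)
```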
Run `python game_server.py` to start the UI.
The game environment is available in multiple flavors in the file `game_env.py`:

- `StateEnv`: uses a raw 2D matrix representation of the game
- `StateEnvBitBoard`: uses bitboards to represent the game, along with bit operations for the game logic (see the sketch after this list)
- `StateEnvBitBoardC`: same implementation as `StateEnvBitBoard`, but written in C for faster runtime
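To make the bitboard idea concrete, here is a minimal standalone sketch (not the repo's code) of packing an 8×8 Othello board into two 64-bit integers, one per color:

```python
# Illustrative bitboard sketch, not the repo's implementation:
# each color's pieces live in one 64-bit integer, bit i = square i.
def square(row, col, board_size=8):
    return row * board_size + col

# Standard Othello starting position
black = (1 << square(3, 4)) | (1 << square(4, 3))
white = (1 << square(3, 3)) | (1 << square(4, 4))

def coin_count(bitboard):
    # Population count = number of coins of that color on the board
    return bin(bitboard).count('1')

print(coin_count(black), coin_count(white))  # 2 2
```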
Similarly, the AI players are also available in multiple flavors in `players.py`:

- `MiniMaxPlayer`: implements Minimax tree search with alpha-beta pruning to improve runtime; uses the difference in the number of coins as the board heuristic (a rough sketch follows this list)
- `MiniMaxPlayerC`: same implementation as above, but in C
- `MCTSPlayer`: implements Monte Carlo Tree Search, using random rollouts to decide the next best move
- `MCTSPlayerC`: same implementation as above, but in C
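As a rough illustration of what the `MiniMaxPlayer` entry describes, here is a generic alpha-beta minimax sketch that scores leaves by coin difference; the `state` interface (`is_terminal`, `coin_difference`, `legal_moves`, `play`) is assumed for illustration and is not the repo's actual API:

```python
def minimax(state, depth, alpha, beta, maximizing):
    """Alpha-beta minimax sketch over a hypothetical immutable game state.
    The leaf heuristic is the coin difference, as described above."""
    if depth == 0 or state.is_terminal():
        return state.coin_difference()  # my coins minus opponent's coins
    if maximizing:
        best = float('-inf')
        for move in state.legal_moves():
            best = max(best, minimax(state.play(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:  # prune: the opponent will never allow this branch
                break
        return best
    else:
        best = float('inf')
        for move in state.legal_moves():
            best = min(best, minimax(state.play(move), depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```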
MCTS: comparison of Python and C at 100 simulations per move

Game Environment Type | Player Type | Speed
---|---|---
Python | Python | 0.16 it/s
C | Python | 0.33 it/s
C | C | 16 it/s
Minimax: comparison of Python and C at a max depth of 3

Game Environment Type | Player Type | Speed
---|---|---
Python | Python | 2 it/s
C | Python | 5 it/s
C | C | 130 it/s
C | C (max depth 4) | 43 it/s
For the UI, the max depth of Minimax is capped at 9, and the number of MCTS simulations is capped at 50,000, to keep AI move generation close to real time.
A DQN version of the AI is also implemented in `players.py` as `DeepQLearningAgent`. To train this agent, run `python training_dqn.py`. The agent trains against a random player for a set number of episodes, using a convolutional neural network and Deep Q-Learning.
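For orientation, here is a heavily simplified skeleton of the kind of loop such a training script might run; the environment, agent, and opponent objects and all of their method names are assumptions for illustration, not the repo's actual API:

```python
import random
from collections import deque

# Illustrative DQN training skeleton, not the repo's training_dqn.py.
# Only the overall loop structure (episodes vs. a random opponent,
# replay buffer, epsilon-greedy action choice) is the point.
def train(env, agent, opponent, n_episodes=10_000, epsilon=0.1, batch_size=32):
    replay_buffer = deque(maxlen=50_000)      # transition store (assumed size)
    for _ in range(n_episodes):               # the "set number of episodes"
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy over the CNN's Q-values (assumed agent method)
            if random.random() < epsilon:
                action = random.choice(env.legal_moves(state))
            else:
                action = agent.best_action(state)
            next_state, reward, done = env.step(action)
            replay_buffer.append((state, action, reward, next_state, done))
            if not done:                       # the random opponent replies
                next_state, _, done = env.step(opponent.move(next_state))
            state = next_state
        # one gradient update on a sampled minibatch per episode (assumed)
        agent.fit(random.sample(replay_buffer, min(batch_size, len(replay_buffer))))
```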
The `Game` class also provides the facility to record gameplay between two players using `matplotlib` and FFmpeg (git-2019-12-01-637742b), for example:
```python
from game_env import Game
from players import MCTSPlayerC, MiniMaxPlayerC, RandomPlayer

board_size = 8
# initialize the players and the game
p1 = MCTSPlayerC(board_size=board_size, n_sim=50000)
p2 = MiniMaxPlayerC(board_size=board_size, depth=9)
g = Game(player1=p1, player2=p2, board_size=board_size)
# play and record the gameplay as a video
g.play()
g.record_gameplay(path='images/gameplay_mcts_minimax.mp4')
```
Monte Carlo Tree Search (MCTS) is a stochastic method that uses simulations to determine the next best move to play. It is divided into four steps, listed below; a compact code sketch combining them follows the list. Whenever we are given the task of choosing a move, we initiate a fresh instance of MCTS.

The basic idea of MCTS is to build a tree of game states by randomly choosing moves for either side. In this process we may not be able to explore all the nodes, since we are limited by the total number of simulations, but for the nodes just after the root node (the current state for which we have to choose a move) we expect to get a good estimate of the approximate win probabilities.
1. Selection: select a node in the tree that is neither a leaf node nor fully explored.
2. Expansion: select one of this node's unexplored children (if all the children had already been explored, we would still be in the selection step). This child node represents one of the available legal moves; we play this move on the current state and get the updated board state variables.
3. Simulation: starting from this new node added in the expansion step, play till the end of the game by choosing moves at random (if the node is terminal, this step is skipped). This constitutes one full run of the game starting from the new node.
4. Backpropagation: once the run has ended, update the statistics (such as the visit and win counts) of every node that participated in it, starting from the node added in the expansion step and going all the way back up to the root node.
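Putting the four steps together, here is a compact, self-contained sketch of MCTS. The UCB1 rule used during selection is a standard choice but an assumption here, since the text above does not specify how explored children are ranked; to keep the example runnable, it plays a trivial subtraction game rather than Othello.

```python
import math, random

# Toy game: a state is (stones_left, player_to_move); a move removes
# 1-3 stones, and taking the last stone wins.
def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2, 3) if m <= stones]

def play(state, move):
    stones, player = state
    return (stones - move, 1 - player)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                 # move -> Node
        self.untried = legal_moves(state)  # moves not yet expanded
        self.wins, self.visits = 0, 0

    def ucb1_child(self, c=1.4):
        # Exploitation (win rate) plus exploration bonus
        return max(self.children.values(),
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(root_state, n_sim=1000):
    root = Node(root_state)
    for _ in range(n_sim):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal
        while not node.untried and node.children:
            node = node.ucb1_child()
        # 2. Expansion: add one unexplored child (skipped at terminal nodes)
        if node.untried:
            move = node.untried.pop()
            child = Node(play(node.state, move), parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: random rollout from the new node to the end of the game
        state = node.state
        while legal_moves(state):
            state = play(state, random.choice(legal_moves(state)))
        loser = state[1]  # the player left with no stones to take loses
        # 4. Backpropagation: update visit/win counts back up to the root
        while node:
            node.visits += 1
            # credit a win to the node if the player who moved into it won
            if node.parent and node.parent.state[1] != loser:
                node.wins += 1
            node = node.parent
    # answer with the most-visited move from the root
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts((10, 0)))  # suggested first move in the toy game
```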