Zeta36 / connect4-alpha-zero

Connect4 reinforcement learning by AlphaGo Zero methods.
MIT License

About

This project is based on two main resources:

1. DeepMind's October 19th, 2017 publication: Mastering the Game of Go without Human Knowledge.
2. The great Reversi development of the DeepMind ideas that @mokemokechicken did in his repo: https://github.com/mokemokechicken/reversi-alpha-zero

Environment

Modules

Reinforcement Learning

This AlphaGo Zero implementation consists of three workers: self, opt and eval.
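The division of labour can be sketched as a simple dispatcher, mirroring how `run.py` routes a worker name to its job. This is an illustrative sketch, not the project's actual API; the role descriptions are summaries:

```python
def start_worker(name):
    """Route a worker name to its role, as `run.py self|opt|eval` does.

    The role strings are plain-English summaries for illustration only.
    """
    workers = {
        "self": "self-play: generate games with the BestModel",
        "opt": "optimize: train a next-generation model on self-play data",
        "eval": "evaluate: play the challenger against the BestModel",
    }
    if name not in workers:
        raise ValueError("unknown worker: " + name)
    return workers[name]
```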

Evaluation

For evaluation, you can play Connect4 against the BestModel.

Data

If you want to train the model from the beginning, delete the above directories.

How to use

Setup

install libraries

pip install -r requirements.txt

If you want to use a GPU, also install:

pip install tensorflow-gpu

set environment variables

Create a .env file containing:

KERAS_BACKEND=tensorflow

Basic Usages

For training model, execute Self-Play, Trainer and Evaluator.

Self-Play

python src/connect4_zero/run.py self

When executed, Self-Play will start using the BestModel. If the BestModel does not exist, a new model with random weights will be created and will become the BestModel.
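That fallback behaviour can be sketched as follows. The weight path and file contents here are placeholders, not the project's real storage layout:

```python
import os

def load_or_create_best_model(weight_path):
    """Reuse BestModel weights if they exist on disk; otherwise save a
    randomly initialised model, which then becomes the BestModel.

    Returns "loaded" or "created" so callers can tell which path ran.
    """
    if os.path.exists(weight_path):
        return "loaded"            # existing BestModel is reused
    with open(weight_path, "w") as f:
        f.write("random-init")     # placeholder for real weight saving
    return "created"               # the new random model is now BestModel
```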

options

Trainer

python src/connect4_zero/run.py opt

When executed, training will start. The base model is loaded from the latest saved next-generation model; if none exists, the BestModel is used. The trained model is saved every 2000 steps (mini-batches) after each epoch.
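The checkpoint cadence works out as in this toy illustration of the every-2000-steps rule (not the trainer's actual loop):

```python
def checkpoint_steps(total_steps, save_every=2000):
    """Return the mini-batch step numbers at which the model is saved."""
    return [step for step in range(1, total_steps + 1)
            if step % save_every == 0]
```

So a 5000-step run would save the model at steps 2000 and 4000.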

options

Evaluator

python src/connect4_zero/run.py eval

When executed, evaluation will start. It pits the BestModel against the latest next-generation model over about 200 games. If the next-generation model wins, it becomes the new BestModel.
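The promotion decision reduces to comparing the challenger's win rate against a threshold. AlphaGo Zero used a 55% cutoff; whether this project uses the same value is an assumption in the sketch below:

```python
def should_promote(wins, games, win_rate_threshold=0.55):
    """True if the next-generation model should replace the BestModel.

    The 0.55 threshold is an assumption borrowed from AlphaGo Zero,
    not necessarily this project's configured value.
    """
    if games == 0:
        return False
    return wins / games >= win_rate_threshold
```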

options

Play Game

python src/connect4_zero/run.py play_gui

When executed, an ordinary Connect4 board will be displayed in ASCII and you can play against the BestModel.
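An ASCII rendering of the 6x7 Connect4 board might look like the sketch below; the exact characters the project prints may differ:

```python
def render_board(grid):
    """Render a 6-row x 7-column Connect4 grid as ASCII.

    Cells: '.' empty, 'X' first player, 'O' second player.
    """
    rows = ["|" + "".join(row) + "|" for row in grid]
    rows.append("+" + "-" * 7 + "+")
    rows.append(" 0123456")  # column indices for entering a move
    return "\n".join(rows)

# An empty board: 6 rows of 7 empty cells.
empty = [["."] * 7 for _ in range(6)]
```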

Tips and Memo

GPU Memory

Usually a lack of memory causes warnings rather than errors. If an error occurs, try changing per_process_gpu_memory_fraction in src/worker/{evaluate.py,optimize.py,self_play.py}:

tf_util.set_session_config(per_process_gpu_memory_fraction=0.2)

A smaller batch_size will reduce the memory usage of opt. Try changing TrainerConfig#batch_size in NormalConfig.
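For reference, limiting per-process GPU memory in TensorFlow 1.x with Keras looks roughly like the config fragment below. This is a sketch of what a helper like tf_util.set_session_config presumably wraps, not the project's actual code:

```python
# Config fragment (TensorFlow 1.x + Keras): cap GPU memory per process.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.2)
)
K.set_session(tf.Session(config=config))
```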

Model Performance

The following table is records of the best models.

| Best model generation | Winning percentage vs. best model | Time spent (hours) | Note |
|---|---|---|---|
| 1 | - | - | |
| 2 | 100% | 1 | |
| 3 | 84.6% | 1 | |
| 4 | 78.6% | 2 | This model is good enough to avoid naive losing movements |
| 5 | 100% | 1 | The NN learns to always play in the center when it moves first |
| 6 | 100% | 4 | The model is now able to beat every online Connect4 game with a classic AI that I have found |