TDteach / AlphaZero_ChineseChess


About the result of self play training #1

Open NeymarL opened 6 years ago

NeymarL commented 6 years ago

Hello, I have implemented a Chinese chess AI based on this repo, but my training results are really bad. After supervised learning on 10K games, it begins self-play. However, the self-played model became worse and worse: after a few hours of training, the original SL model can easily beat the self-played model. So I want to ask: how did your self-play training perform? Thank you.

TDteach commented 6 years ago

Use the new-threads branch. I don't know which version you are using, and some versions have bugs, so keep it updated.


NeymarL commented 6 years ago

I used the master branch; I will look at the new-threads branch later. I'm really curious about your training performance and training time. Does the model play better and better?

NeymarL commented 6 years ago

I have noticed that one of the major differences between the master branch and the new-threads branch is the player_chess.py file. Why did you reimplement it in a new way — are there bugs in the previous implementation of the MCTS player? Thank you very much.

TDteach commented 6 years ago

The new-threads branch re-implements the multi-threading logic and makes it more efficient — about twice as fast as the original. The core of the MCTS in the new-threads branch is the same as in the original: the upper-confidence-bound selection rule (function: select_action_q_and_u). To judge whether performance keeps improving, watch the modification time of the "model_best_weight.h5" file: it is updated only when the new model beats the old one with at least a 55% win rate. If the modification time stops changing, your model has converged. That situation has not yet occurred on my machine, using the same code and training for about 100 hours.
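The selection rule named above (select_action_q_and_u) is the standard PUCT formula from AlphaZero. A minimal sketch of how such a function typically works — the dictionary layout and the `c_puct` value here are illustrative assumptions, not the repo's actual data structures:

```python
import math

def select_action_q_and_u(stats, c_puct=1.5):
    """Pick the child action maximizing Q + U (PUCT).

    stats: dict mapping action -> {'n': visit count,
                                   'w': total value,
                                   'p': prior probability from the net}.
    """
    total_n = sum(s['n'] for s in stats.values())
    best_action, best_score = None, -float('inf')
    for action, s in stats.items():
        # Q: mean value of this action so far (0 if unvisited)
        q = s['w'] / s['n'] if s['n'] > 0 else 0.0
        # U: exploration bonus, large for high-prior, rarely-visited actions
        u = c_puct * s['p'] * math.sqrt(total_n) / (1 + s['n'])
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action
```

Unvisited actions get Q = 0 but a large U term, which is what drives the search to explore moves the network considers promising.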

TDteach commented 6 years ago

The main bug in the master branch is the consumption of PIPE resources. If I use 4 processes, each with 10 threads, it consumes 40 pipes and thus needs 40 temp files on a Linux system. Another consequence is that so many pipes corrupt I/O and block execution, especially in the evaluation program (which consumes twice as many pipes).
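One common way around the one-pipe-per-thread problem is to funnel all prediction requests from the search threads through a single shared queue into one batching worker. A minimal sketch of that pattern (the function names, queue layout, and batch size are assumptions for illustration, not the repo's actual code):

```python
import queue
import threading

def prediction_worker(request_q, model_predict, batch_size=8):
    """Serve neural-net evaluations for many search threads.

    Each request is a (state, reply_q) pair placed on the single shared
    request_q. The worker drains up to batch_size requests, runs ONE
    batched forward pass, and returns each result via its reply queue —
    so N threads need one queue, not N OS pipes.
    """
    while True:
        items = [request_q.get()]            # block until a request arrives
        while len(items) < batch_size:       # opportunistically batch more
            try:
                items.append(request_q.get_nowait())
            except queue.Empty:
                break
        states = [state for state, _ in items]
        results = model_predict(states)      # one batched inference call
        for (_, reply_q), result in zip(items, results):
            reply_q.put(result)
```

A search thread would then submit `request_q.put((state, my_reply_q))` and block on `my_reply_q.get()`, which also improves GPU utilization because evaluations arrive in batches.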

NeymarL commented 6 years ago

Thank you very much! One more thing: how long does it take to produce one self-play move, with respect to simulation number, search threads, and hardware? My configuration is:

and under this configuration, each process takes ~10 sec to produce a move. I find it too slow to generate 440K games (as DeepMind did for AlphaZero).

TDteach commented 6 years ago

simulation num = 2000
search threads = 8
max processes = 10
GPU: TITAN X
max_game_length = 200
50 games in one file

Producing one file takes about 30 min. Right! Generating 440K games would, I guess, take about half a year if you only use one GPU. But DeepMind has rooms full of GPUs, so they need only a few days.
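The "half a year" estimate follows directly from the stated throughput. A quick back-of-the-envelope check, assuming 50 games per file and ~30 min per file on a single GPU:

```python
# Throughput implied by the figures above
games_per_hour = 50 / 0.5                 # 50 games per 30 min = 100 games/hour

# Time to match DeepMind's 440K self-play games on one GPU
hours_needed = 440_000 / games_per_hour   # 4400 hours
days_needed = hours_needed / 24           # ~183 days, i.e. roughly half a year
```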

Moreover, it took me a week to generate about 40K games using 2 GPUs. That is why the new-threads branch is needed.

handsomeeeee commented 6 years ago

How strong is the engine now? I wonder if you have heard of the Go AI project Leela Zero — the code was written by one developer, and the AI is trained collectively by volunteers on their home computers. It is now closing in on the strongest AIs other than AlphaGo! I suggest you adopt this approach too and promote it on the major sites. Support!!

TDteach commented 6 years ago

Thanks for the suggestion. For now, though, this repo is just for fun. So I will only consider its "next steps" once it matures.