Zeta36 / chess-alpha-zero

Chess reinforcement learning by AlphaGo Zero methods.
MIT License
2.12k stars 479 forks

Interpret game result #2

Open yhyu13 opened 6 years ago

yhyu13 commented 6 years ago

@Zeta36

I've played around a bit with the mini model. Each game takes quite a long time (is that normal? I ask because I saw a log message saying it was loading the normal model even though I typed --type mini):

2017-11-19 23:12:48,465@chess_zero.worker.self_play DEBUG # game 1 time=150.61661911010742 sec, turn=163:1n3kb1/8/8/pp4P1/2nN4/7r/5p2/2K5 b - - 1 82 - Winner:Winner.black - by resignation?:True
2017-11-19 23:14:49,662@chess_zero.worker.self_play DEBUG # game 2 time=121.1972291469574 sec, turn=120:1n4r1/1p4k1/pq2Rp2/1P3n1P/2BP4/P1P2bP1/3P4/2B1R1K1 w - - 9 61 - Winner:Winner.draw - by resignation?:False
2017-11-19 23:15:08,604@chess_zero.worker.self_play DEBUG # game 3 time=18.94098925590515 sec, turn=21:rn1qkbnr/4pp2/pp1p3p/3p2p1/3P1Pb1/8/PPPKPBPP/RN3B1R b kq - 1 11 - Winner:Winner.black - by resignation?:True
2017-11-19 23:16:24,316@chess_zero.worker.self_play DEBUG # game 4 time=75.7124376296997 sec, turn=81:4k2r/3p1q2/2pB3p/5p1R/r2n2P1/P7/3K1P2/8 b k - 1 41 - Winner:Winner.black - by resignation?:True

Also, my question is: how do I interpret the game result? Thanks!

Keep up the good work! I'm looking forward to seeing a visualization of the game results.

EDIT: I believe it's running the normal model even though I set it to be mini

Zeta36 commented 6 years ago

If you go to https://lichess.org/analysis/standard and paste the final position of a game (for example: rn1qkbnr/4pp2/pp1p3p/3p2p1/3P1Pb1/8/PPPKPBPP/RN3B1R b kq - 1 11) into the FEN input field of that page, you will see a nicely rendered board showing the state of that (self-play) game when it finished.

The game result tells us who the winner is and whether the game ended normally (checkmate, stalemate, etc.) or was cut off by resignation. Resignation occurs when one player has a large material advantage over the other (more than 13 points of difference in standard piece values, where a queen is worth 10, a rook 5.5, etc.).
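
As a rough illustration of that resignation rule (a minimal sketch only: the 13-point threshold and the queen/rook values come from this comment, while the bishop/knight/pawn values are assumptions):

PIECE_VALUES = {"q": 10.0, "r": 5.5, "b": 3.5, "n": 3.0, "p": 1.0}  # b/n/p values are assumptions

def material_balance(fen):
    """Positive means white is ahead in material, negative means black is ahead."""
    balance = 0.0
    for ch in fen.split()[0]:  # only the piece-placement part of the FEN
        value = PIECE_VALUES.get(ch.lower())
        if value is not None:
            balance += value if ch.isupper() else -value
    return balance

# Game 1's final position above gives roughly -14 with these values,
# i.e. black is ahead by more than 13 points, consistent with white resigning.
print(material_balance("1n3kb1/8/8/pp4P1/2nN4/7r/5p2/2K5 b - - 1 82"))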

EDIT: I believe it's running the normal model even though I set it to be mini

If you want to be sure about this, you can set a breakpoint in the self_play.py worker at line 32; the config object that was used is available there. Inspect the value of the "simulation_num_per_move" property: if it is 10, you are using the mini toy model; if it is 100, you are on the normal one.

Anyway, if you ran "python run.py self --type mini" you should be in the mini version. You can also debug the manager.py file and see which config it loads.
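
For example, something like this (just a sketch: the property name comes from this thread, and the exact path to it on the config object is an assumption):

import pdb

def inspect_config(config):
    pdb.set_trace()  # break here and explore the loaded config interactively
    n = config.play.simulation_num_per_move  # assumed location of the property
    print("simulation_num_per_move =", n)    # 10 -> mini config, 100 -> normal config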

I'm looking forward to seeing a visualization of the game results.

Thank you very much, @yhyu13. I know I need to get a good GPU, but I'm short on money right now.

Regards!!

yhyu13 commented 6 years ago

@Zeta36

Thanks for your detailed reply; the chess board visualization is beautiful. I profiled your algorithm and found the result astonishing:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   17.461   17.461 player_chess.py:82(search_moves)
        1    0.001    0.001   17.460   17.460 {method 'run_until_complete' of 'uvloop.loop.Loop' objects}
      302    0.001    0.000   16.942    0.056 player_chess.py:98(start_search_my_move)
  690/218    0.007    0.000   16.938    0.078 player_chess.py:106(search_my_move)
      233    0.113    0.000   16.667    0.072 player_chess.py:228(select_action_q_and_u)
      233    1.550    0.007   16.387    0.070 player_chess.py:232(<listcomp>)
  1894057    2.973    0.000    8.742    0.000 __init__.py:490(from_uci)
  1893824    0.698    0.000    6.097    0.000 __init__.py:3273(__contains__)
  1894057    0.973    0.000    5.405    0.000 __init__.py:1551(is_legal)
  4742482    4.768    0.000    4.768    0.000 {method 'index' of 'list' objects}
  1894057    1.805    0.000    4.216    0.000 __init__.py:1503(is_pseudo_legal)
  1958549    0.753    0.000    0.753    0.000 __init__.py:615(piece_type_at)
   158450    0.567    0.000    0.751    0.000 __init__.py:1256(generate_pseudo_legal_moves)
  1902996    0.655    0.000    0.655    0.000 __init__.py:425(__init__)
  1899679    0.650    0.000    0.650    0.000 __init__.py:449(__bool__)
       11    0.000    0.000    0.517    0.047 player_chess.py:173(prediction_worker)
        8    0.000    0.000    0.516    0.065 api_chess.py:9(predict)
        8    0.000    0.000    0.516    0.065 training.py:1879(predict_on_batch)
        8    0.000    0.000    0.515    0.064 tensorflow_backend.py:2338(__call__)
        8    0.000    0.000    0.515    0.064 session.py:781(run)
        8    0.000    0.000    0.515    0.064 session.py:1036(_run)
        8    0.000    0.000    0.512    0.064 session.py:1258(_do_run)
        8    0.000    0.000    0.512    0.064 session.py:1321(_do_call)
        8    0.000    0.000    0.512    0.064 session.py:1290(_run_fn)
        8    0.505    0.063    0.505    0.063 {built-in method _pywrap_tensorflow_internal.TF_Run}
  4743887    0.349    0.000    0.349    0.000 {built-in method builtins.len}
    14912    0.084    0.000    0.205    0.000 __init__.py:3078(generate_castling_moves)
  1004964    0.168    0.000    0.190    0.000 __init__.py:214(scan_reversed)
  1911511    0.171    0.000    0.171    0.000 __init__.py:1554(is_variant_end)
      233    0.001    0.000    0.161    0.001 chess_env.py:37(step)
      233    0.013    0.000    0.140    0.001 __init__.py:1759(can_claim_threefold_repetition)
      766    0.002    0.000    0.129    0.000 chess_env.py:135(replace_tags)
      766    0.002    0.000    0.125    0.000 __init__.py:2008(fen)
      766    0.003    0.000    0.123    0.000 __init__.py:2252(epd)
       99    0.121    0.001    0.121    0.001 {method 'dirichlet' of 'mtrand.RandomState' objects}
      666    0.001    0.000    0.114    0.000 player_chess.py:224(counter_key)
      766    0.032    0.000    0.096    0.000 __init__.py:719(board_fen)
    91774    0.083    0.000    0.083    0.000 __init__.py:1385(attacks_mask)
    20282    0.010    0.000    0.078    0.000 {built-in method builtins.any}
    14912    0.009    0.000    0.069    0.000 __init__.py:3068(_attacked_for_king)

The bottleneck is that you check whether each move is legal on the fly. That works for Reversi; I tried it in Go and it was already too expensive, not to mention with over 8000 move labels in chess.
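
For illustration, the expensive pattern behind those numbers is roughly the following (a sketch only; the three sample labels stand in for the ~8000-entry label list in the config):

import chess

board = chess.Board()
labels = ["a1b1", "e2e4", "g1f3"]  # stand-in for the full UCI label list

# One Move.from_uci() parse plus a legal-move scan per label, at every search node --
# exactly the from_uci / __contains__ / is_legal calls that dominate the profile above.
legal_flags = [chess.Move.from_uci(uci) in board.legal_moves for uci in labels]
print(legal_flags)  # [False, True, True] from the starting position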

In Go I used the strategy of having an illegal move result in leaf_v = -1. Does that make sense in your setup?

Zeta36 commented 6 years ago

@yhyu13, I really appreciate your interest in the project. Please don't hesitate to open a pull request for any change you want to make, and if you want to become a collaborator just tell me :).

Regards!

Zeta36 commented 6 years ago

By the way, @yhyu13: all those huge call counts, the biggest ones in your profile, are due to internal calls of the python-chess library. It seems it is very expensive to check the legal moves in a board state:

  1894057    2.973    0.000    8.742    0.000 __init__.py:490(from_uci)
  1893824    0.698    0.000    6.097    0.000 __init__.py:3273(__contains__)
  1894057    0.973    0.000    5.405    0.000 __init__.py:1551(is_legal)
  4742482    4.768    0.000    4.768    0.000 {method 'index' of 'list' objects}
  1894057    1.805    0.000    4.216    0.000 __init__.py:1503(is_pseudo_legal)
  1958549    0.753    0.000    0.753    0.000 __init__.py:615(piece_type_at)
   158450    0.567    0.000    0.751    0.000 __init__.py:1256(generate_pseudo_legal_moves)
  1902996    0.655    0.000    0.655    0.000 __init__.py:425(__init__)
  1899679    0.650    0.000    0.650    0.000 __init__.py:449(__bool__)

and it also seems expensive to check whether the game is over:

    14912    0.084    0.000    0.205    0.000 __init__.py:3078(generate_castling_moves)
  1004964    0.168    0.000    0.190    0.000 __init__.py:214(scan_reversed)
  1911511    0.171    0.000    0.171    0.000 __init__.py:1554(is_variant_end)

I think this is because the rules of chess are much more complex than those of Go or Reversi. In chess a game can end in many ways (checkmate, stalemate, etc.), and you have to check a lot of rules to determine the legal moves in a given board state.

I don't know whether python-chess is a poorly optimized library or whether this is an intrinsic computational cost of the chess rules.

Can't you try to train with that cost anyway? I mean, is it impossible to train the model this way?

yhyu13 commented 6 years ago

@Zeta36

EDIT: Please review the pull request.

It seems like the bottleneck is single-CPU performance. Given that generating data is about 15 times slower, I believe we would lose patience before the model actually learns anything. I will see if it's possible to work around the legal-move check. Done!

This part could be optimized as follows:

import chess
import numpy as np

def __init__(self, *args, **kwargs):
    ...
    # Build the Move -> label-index lookup once, outside the search loop.
    self.move_lookup = {chess.Move.from_uci(mov): i for i, mov in enumerate(self.config.labels)}

def select_action_q_and_u(self, *args, **kwargs):
    ...
    legal_moves = [self.move_lookup[move] for move in env.board.legal_moves]
    legal_labels = np.zeros(len(self.config.labels))
    logger.debug(legal_moves)
    legal_labels[legal_moves] = 1
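
Building the lookup dictionary once replaces the Move.from_uci() parsing and the list .index() scans that dominate the first profile with a single hash lookup per legal move.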

Here is the new profiling result (about 10x faster; 100 simulations in 1.357 s):

*** PROFILER RESULTS ***
expand_and_evaluate (src/chess_zero/agent/player_chess.py:157)
function called 100 times

         0 function calls in 0.000 seconds

   Ordered by: cumulative time, internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        0    0.000             0.000          profile:0(profiler)

*** PROFILER RESULTS ***
search_moves (src/chess_zero/agent/player_chess.py:84)
function called 1 times

         555528 function calls (554819 primitive calls) in 1.357 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 473 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.357    1.357 player_chess.py:84(search_moves)
        1    0.001    0.001    1.356    1.356 {method 'run_until_complete' of 'uvloop.loop.Loop' objects}
       11    0.000    0.000    0.856    0.078 player_chess.py:180(prediction_worker)
       10    0.000    0.000    0.855    0.086 api_chess.py:9(predict)
       10    0.000    0.000    0.855    0.086 training.py:1879(predict_on_batch)
       10    0.000    0.000    0.854    0.085 tensorflow_backend.py:2338(__call__)
       10    0.000    0.000    0.853    0.085 session.py:781(run)
       10    0.000    0.000    0.853    0.085 session.py:1036(_run)
       10    0.000    0.000    0.850    0.085 session.py:1258(_do_run)
       10    0.000    0.000    0.849    0.085 session.py:1321(_do_call)
       10    0.000    0.000    0.849    0.085 session.py:1290(_run_fn)
       10    0.815    0.082    0.815    0.082 {built-in method _pywrap_tensorflow_internal.TF_Run}
      333    0.001    0.000    0.498    0.001 player_chess.py:101(start_search_my_move)
  762/249    0.004    0.000    0.495    0.002 player_chess.py:109(search_my_move)
      224    0.019    0.000    0.229    0.001 player_chess.py:235(select_action_q_and_u)
      224    0.001    0.000    0.160    0.001 chess_env.py:37(step)
      224    0.012    0.000    0.139    0.001 __init__.py:1759(can_claim_threefold_repetition)
      748    0.002    0.000    0.129    0.000 chess_env.py:135(replace_tags)
      748    0.002    0.000    0.125    0.000 __init__.py:2008(fen)
      748    0.003    0.000    0.123    0.000 __init__.py:2252(epd)
       99    0.121    0.001    0.121    0.001 {method 'dirichlet' of 'mtrand.RandomState' objects}
      648    0.001    0.000    0.114    0.000 player_chess.py:231(counter_key)
      748    0.033    0.000    0.097    0.000 __init__.py:719(board_fen)
     5064    0.030    0.000    0.068    0.000 __init__.py:1802(push)
    10014    0.008    0.000    0.067    0.000 __init__.py:3034(generate_legal_moves)
    47872    0.029    0.000    0.051    0.000 __init__.py:607(piece_at)
    10344    0.020    0.000    0.048    0.000 __init__.py:1256(generate_pseudo_legal_moves)
      200    0.001    0.000    0.045    0.000 player_chess.py:157(expand_and_evaluate)
      224    0.005    0.000    0.043    0.000 player_chess.py:239(<listcomp>)
       10    0.000    0.000    0.034    0.003 session.py:1338(_extend_graph)
        1    0.033    0.033    0.033    0.033 {built-in method _pywrap_tensorflow_internal.TF_ExtendGraph}
      100    0.000    0.000    0.024    0.000 chess_env.py:125(black_and_white_plane)
      748    0.009    0.000    0.022    0.000 __init__.py:1971(castling_xfen)
    63288    0.022    0.000    0.022    0.000 __init__.py:615(piece_type_at)
     5631    0.003    0.000    0.020    0.000 {built-in method builtins.any}
    43166    0.013    0.000    0.015    0.000 __init__.py:214(scan_reversed)
     5101    0.006    0.000    0.015    0.000 __init__.py:3148(_transposition_key)
    10128    0.011    0.000    0.015    0.000 __init__.py:646(_remove_piece_at)
      224    0.000    0.000    0.013    0.000 __init__.py:2650(push_uci)
     4840    0.010    0.000    0.011    0.000 __init__.py:1918(pop)

Zeta36 commented 6 years ago

Thank you very much for your effort, @yhyu13. I really appreciate it.

I'm looking forward to seeing whether our approach can produce a good chess player (maybe not a master, but at least an amateur).

I'm going to add you as a collaborator so you can push anything you want without asking me for a pull request.

Regards!!

Zeta36 commented 6 years ago

Hello, @yhyu13 .

Today I finished a new version of the Reversi Zero project, this time adapted to the game Connect 4: https://github.com/Zeta36/connect4-alpha-zero

I'm really in love with @mokemokechicken's implementation. He built it (and DeepMind conceived it) in a way that lets me apply it easily to any new environment I can imagine.

Moreover, Connect 4 is an easier game, so I could train the model without a GPU. The results are amazing: the model learns to play well in only 3 generations, in a couple of hours (just with an Intel i5 CPU).

It's a pity I don't have a powerful enough machine to check whether the chess version can learn to play well.

yhyu13 commented 6 years ago

@Zeta36

Good work! It reminds me of another GitHub project I came across, called mini-alphaGo, for playing Connect 4. It looks far messier than your implementation. You've got to thank @mokemokechicken a lot; the software framework he/she put together is neat, with Keras as a model wrapper.

To my knowledge, backgammon, chess, Go, and draughts are the four biggest abstract strategy board games. The first three have all been "solved" in the sense that computer programs play better than any human player. I have not heard of anything comparable for draughts, though. It would be great if you could move on to that game after Connect 4. A quick search leads me to: https://github.com/codeofcarson/Checkers. Take a look if you are interested.

I will be away during Thanksgiving break, but I have set up a script that runs/restarts your chess zero automatically. I apologize for not setting up a server so that you can look at the results; that should be sorted out after I get back home.

Regards.

yhyu13 commented 6 years ago

@Zeta36

I've generated about 3.5 GB of game-play data over this Thanksgiving break, but I don't know where to find the evaluation log. I will upload the best model to Google Drive so you can give it a try. I hope we can get a decent chess algorithm.

https://drive.google.com/drive/folders/1KNTggmQhp4E4MqZiCPhFqvPfff6MrMYz?usp=sharing

The architecture is 256 filters and 7 layers; everything else remains the same.

Let me know if you have any trouble.

Regards!

Zeta36 commented 6 years ago

I played against your best model but the results were not good :(.

C:\Users\Samu\Anaconda3\python.exe "ML 2017/chess-alpha-zero-git/src/chess_zero/run.py" play_gui
2017-11-26 23:08:15,087@chess_zero.manager INFO # config type: normal
Using TensorFlow backend.
2017-11-26 23:08:46,180@chess_zero.agent.model_chess DEBUG # loading model from ML 2017\chess-alpha-zero-git\data\model\model_best_config.json
2017-11-26 23:08:52,563@chess_zero.agent.model_chess DEBUG # loaded model digest = 4aa6e5358d339f13f388d5e6eb00827bafd43d3073f248b634b041f6f8cc9513
2017-11-26 23:08:52,620@asyncio DEBUG # Using selector: SelectSelector
2017-11-26 23:09:04,007@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(2, 63), value move=(1, 72)
IA moves to: g2g3

r n b q k b n r
p p p p p p p p
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . P .
P P P P P P . P
R N B Q K B N R

Board fen = rnbqkbnr/pppppppp/8/8/8/6P1/PPPPPP1P/RNBQKBNR b KQkq - 0 1

Enter your movement in UCI format(a1a2, b2b6,...): e7e5
You move to: e7e5

r n b q k b n r
p p p p . p p p
. . . . . . . .
. . . . p . . .
. . . . . . . .
. . . . . . P .
P P P P P P . P
R N B Q K B N R

Board fen = rnbqkbnr/pppp1ppp/8/4p3/8/6P1/PPPPPP1P/RNBQKBNR w KQkq - 0 2
IA moves to: g1h3

r n b q k b n r
p p p p . p p p
. . . . . . . .
. . . . p . . .
. . . . . . . .
. . . . . . P N
P P P P P P . P
R N B Q K B . R

Board fen = rnbqkbnr/pppp1ppp/8/4p3/8/6PN/PPPPPP1P/RNBQKB1R b KQkq - 1 2

Enter your movement in UCI format(a1a2, b2b6,...): g8f6
You move to: g8f6

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . p . . .
. . . . . . . .
. . . . . . P N
P P P P P P . P
R N B Q K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/4p3/8/6PN/PPPPPP1P/RNBQKB1R w KQkq - 2 3
2017-11-26 23:10:14,659@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 200), value move=(0, 400)
2017-11-26 23:10:27,093@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 200), value move=(0, 8)
IA moves to: d2d4

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . p . . .
. . . P . . . .
. . . . . . P N
P P P . P P . P
R N B Q K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/4p3/3P4/6PN/PPP1PP1P/RNBQKB1R b KQkq - 0 3

Enter your movement in UCI format(a1a2, b2b6,...): e5d4
You move to: e5d4

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . . . . .
. . . p . . . .
. . . . . . P N
P P P . P P . P
R N B Q K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/8/3p4/6PN/PPP1PP1P/RNBQKB1R w KQkq - 0 4
2017-11-26 23:11:09,755@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(2, 192), value move=(0, 192)
2017-11-26 23:11:21,378@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(2, 192), value move=(1, 65)
IA moves to: d1d4

r n b q k b . r
p p p p . p p p
. . . . . n . .
. . . . . . . .
. . . Q . . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = rnbqkb1r/pppp1ppp/5n2/8/3Q4/6PN/PPP1PP1P/RNB1KB1R b KQkq - 0 4

Enter your movement in UCI format(a1a2, b2b6,...): b8c6
You move to: b8c6

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . Q . . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/3Q4/6PN/PPP1PP1P/RNB1KB1R w KQkq - 1 5
2017-11-26 23:12:14,997@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(6, 215), value move=(7, 216)
IA moves to: d4e4

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . . Q . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/4Q3/6PN/PPP1PP1P/RNB1KB1R b KQkq - 2 5

Enter your movement in UCI format(a1a2, b2b6,...): f6e4
You move to: f6e4

r . b q k b . r
p p p p . p p p
. . n . . . . .
. . . . . . . .
. . . . n . . .
. . . . . . P N
P P P . P P . P
R N B . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n5/8/4n3/6PN/PPP1PP1P/RNB1KB1R w KQkq - 0 6
IA moves to: c1e3

r . b q k b . r
p p p p . p p p
. . n . . . . .
. . . . . . . .
. . . . n . . .
. . . . B . P N
P P P . P P . P
R N . . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n5/8/4n3/4B1PN/PPP1PP1P/RN2KB1R b KQkq - 1 6

Enter your movement in UCI format(a1a2, b2b6,...): e4f6
You move to: e4f6

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . . . . . .
. . . . B . P N
P P P . P P . P
R N . . K B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/8/4B1PN/PPP1PP1P/RN2KB1R w KQkq - 2 7
2017-11-26 23:14:12,397@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(6, 269), value move=(1, 65)
2017-11-26 23:14:24,565@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 65), value move=(0, 72)
2017-11-26 23:14:35,581@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 65), value move=(2, 63)
2017-11-26 23:14:46,832@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 65), value move=(0, 255)
IA moves to: e1d1

r . b q k b . r
p p p p . p p p
. . n . . n . .
. . . . . . . .
. . . . . . . .
. . . . B . P N
P P P . P P . P
R N . K . B . R

Board fen = r1bqkb1r/pppp1ppp/2n2n2/8/8/4B1PN/PPP1PP1P/RN1K1B1R b kq - 3 7

Enter your movement in UCI format(a1a2, b2b6,...): d7d6
You move to: d7d6

r . b q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. . . . . . . .
. . . . B . P N
P P P . P P . P
R N . K . B . R

Board fen = r1bqkb1r/ppp2ppp/2np1n2/8/8/4B1PN/PPP1PP1P/RN1K1B1R w kq - 0 8
IA moves to: b1c3

r . b q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. . . . . . . .
. . N . B . P N
P P P . P P . P
R . . K . B . R

Board fen = r1bqkb1r/ppp2ppp/2np1n2/8/8/2N1B1PN/PPP1PP1P/R2K1B1R b kq - 1 8

Enter your movement in UCI format(a1a2, b2b6,...): c8h3
You move to: c8h3

r . . q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. . . . . . . .
. . N . B . P b
P P P . P P . P
R . . K . B . R

Board fen = r2qkb1r/ppp2ppp/2np1n2/8/8/2N1B1Pb/PPP1PP1P/R2K1B1R w kq - 0 9
2017-11-26 23:16:58,419@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 145), value move=(7, 270)
2017-11-26 23:17:10,792@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(1, 145), value move=(7, 0)
2017-11-26 23:17:21,747@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(7, 0), value move=(0, 191)
2017-11-26 23:17:32,833@chess_zero.agent.player_chess DEBUG # continue thinking: policy move=(0, 146), value move=(1, 72)
IA moves to: b2b4

r . . q k b . r
p p p . . p p p
. . n p . n . .
. . . . . . . .
. P . . . . . .
. . N . B . P b
P . P . P P . P
R . . K . B . R

Board fen = r2qkb1r/ppp2ppp/2np1n2/8/1P6/2N1B1Pb/P1P1PP1P/R2K1B1R b kq - 0 9

Enter your movement in UCI format(a1a2, b2b6,...):

You can check the result yourself by playing with the "play_gui" option.

Here you can see the last state of the board: https://www.chess.com/dynboard?fen=r2qkb1r/ppp2ppp/2np1n2/8/1P6/2N1B1Pb/P1P1PP1P/R2K1B1R%20b%20kq%20b3%200%209&board=green&piece=neo&size=3

The NN plays white. At first it looked more or less fine when the model took the pawn around the 3rd move, but then the NN lost its queen :( (although it did seem to be trying to escape).

I don't know whether this is a good enough result for the time you've spent on training so far. What do you think?

@yhyu13, what loss did the optimization worker show you? And how many times did the evaluator worker replace the best model after its tournament? I mean, you said you ran the self-play worker for a long time, but can you tell me what the other two workers (running simultaneously) showed in the console during all that time?

yhyu13 commented 6 years ago

@Zeta36

I just noticed that the self-play pipeline is manual. 😂 I am trying to train with opt mode, but 1) the total loss blows up to NaN even though neither the policy loss nor the value loss does; I assume the total loss includes weight decay, though I haven't found where it is explicitly declared here. 2) Is there no stopping criterion for training? here
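
For reference, a minimal Keras sketch (illustrative only, not this repository's verified training code) of a combined policy + value loss with L2 weight decay; Keras logs the policy and value losses separately, while the reported total "loss" also includes the regularization penalty, which is one place a NaN can hide:

from keras.layers import Dense, Input
from keras.models import Model
from keras.regularizers import l2

x = Input(shape=(64,))
h = Dense(128, activation="relu", kernel_regularizer=l2(1e-4))(x)  # weight-decay term
policy_out = Dense(8, activation="softmax", name="policy_out")(h)
value_out = Dense(1, activation="tanh", name="value_out")(h)

model = Model(x, [policy_out, value_out])
model.compile(optimizer="adam",
              loss={"policy_out": "categorical_crossentropy",
                    "value_out": "mean_squared_error"})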

I am not sure I understand how the self-play pipeline is supposed to run at the moment.

Regards

Zeta36 commented 6 years ago

@yhyu13, I'm afraid you did not follow the correct way to train the model :P.

You have to run the three workers at the same time: self, opt, and eval. It's easy: you just run the run.py script three times, for example in three different consoles (terminals), like this:

python run.py self
python run.py opt
python run.py eval

You will have to delete all the self-play data generated until now :( and start from scratch.

Self-play will start with a randomly initialized best model and generate games. After a while (a fixed number of games), the optimization process (opt) will start and eventually create a next_generation (ng) model, which the evaluator worker (eval) will detect. The evaluator then makes the best model and the ng model play against each other; if the ng model wins more than 55% of the games it becomes the new best model, and so on.

Indeed, the AlphaGo Zero idea is pretty similar to an evolutionary algorithm with selection by tournament.
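
Conceptually, the promotion rule works something like this (a toy sketch, not the project's eval worker; play_one_game is a stand-in for a real game between the two models):

import random

def play_one_game(challenger, defender):
    # Stand-in for a full game between the two models; returns 1 if the challenger wins.
    return 1 if random.random() < 0.5 else 0

def ng_should_become_best(best_model, ng_model, n_games=100, threshold=0.55):
    wins = sum(play_one_game(ng_model, best_model) for _ in range(n_games))
    return wins / n_games > threshold  # True -> ng_model replaces the best model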

Regards, friend.

yhyu13 commented 6 years ago

@Zeta36

I have to apologize for not noticing this earlier. Since my computer (a gaming laptop) is not a dedicated server, I can't promise that these programs will run without throwing errors (the most common being a CUDA core dump) and aborting in the middle. According to my schedule, I can run them for this week. If I get any good results, I will open a new issue and let you know immediately.

Regards

Zeta36 commented 6 years ago

Perfect!! I really thank you for your help :).

dklausa commented 6 years ago

@yhyu13

Just some input as an avid game player. Funny you should mention draughts as the one game among the big 4 not yet "solved". Actually, to the best of my knowledge it was the first of those in which humans became thoroughly outclassed by a program, namely Chinook. At least with Kasparov and Deep Blue it was close. Good backgammon players can still win often against eXtreme Gammon, due to the luck element. Perhaps the greatest distance between the best human and the best program in a prominent board game is in Othello, with the program WZebra. Also, none of them are technically solved, which implies that a complete game tree is documented, with proof of best move in every possible situation.