Zeta36 / chess-alpha-zero

Chess reinforcement learning by AlphaGo Zero methods.
MIT License

First "good" results #13

Open Zeta36 opened 6 years ago

Zeta36 commented 6 years ago

Using the new supervised learning step I created, I've been able to train a model to the point that it seems to be learning the openings of chess. It also seems to be starting to avoid naively losing pieces.

Here you can see an example of a game I played against this model (the AI plays black):

(animated GIF: partida1)

The model plays this way after only 5 epoch iterations of the 'opt' worker; the 'eval' worker replaced the best model 4 times (4 out of 5). At this moment the loss of the 'opt' worker is 5.1 (and it still seems to be converging well).

As I have no GPU, I had to evaluate ('eval') using only "self.simulation_num_per_move = 10" and only 10 files of play data for the 'opt' worker. I'm pretty sure that if anybody is able to run this on a good GPU with a more powerful configuration, the results after complete convergence would be really good.
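
For anyone who wants to reproduce this low-resource setup, the only changes are in the configuration; a minimal sketch of the kind of values I mean (only simulation_num_per_move and the 10 play-data files come from above, the class and attribute names are just illustrative):

```python
# Illustrative low-resource overrides for a machine without a GPU.
# Only simulation_num_per_move and the "10 files of play data" figure come from
# the comment above; the class and attribute names are assumptions about how
# the project's config might be organised.

class PlayConfig:
    simulation_num_per_move = 10   # MCTS simulations per move (much lower than a GPU setup)
    search_threads = 2             # illustrative: fewer parallel search threads

class TrainerConfig:
    max_play_data_files = 10       # illustrative name: train 'opt' on only 10 self-play files
```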

bame55 commented 6 years ago

Hi Zeta36!

I, too, wondered whether the techniques that created AlphaGo Zero could be applied to chess. To be honest I was sceptical, but decided to do an experiment anyway. Fortunately, before I started putting together my own implementation, I stumbled across yours.

My attempts to train from scratch a couple of days ago (no supervised learning) did not go very well, so I downloaded lots of games from KingBase (http://www.kingbase-chess.net/), extracted 15000 random games where the result was not a draw and one or both players had ELO >= 2400, and ran "sl" on them. After that I started "self", "opt", and "eval" and let the whole thing run for 24 hours. Eval changed the best model almost every iteration (7 times in all), although many "next generation" models were not evaluated because "opt" runs so much faster than "eval" on my machine (due to the GPU).
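
In case it's useful to anyone, the extraction step can be done with the python-chess package; here's a minimal sketch of the filtering I described (the input file name is a placeholder, and the real script I used may differ):

```python
import random
import chess.pgn

def filter_games(pgn_path, min_elo=2400, max_games=15000):
    """Collect decisive games where at least one player is rated >= min_elo."""
    selected = []
    with open(pgn_path, encoding="latin-1") as handle:
        while True:
            game = chess.pgn.read_game(handle)
            if game is None:
                break  # end of file
            headers = game.headers
            if headers.get("Result") == "1/2-1/2":
                continue  # skip draws
            try:
                white = int(headers.get("WhiteElo", 0))
                black = int(headers.get("BlackElo", 0))
            except ValueError:
                continue  # missing or malformed ratings
            if max(white, black) >= min_elo:
                selected.append(game)
    random.shuffle(selected)
    return selected[:max_games]

# "KingBase2017.pgn" is a placeholder file name.
games = filter_games("KingBase2017.pgn")
with open("filtered.pgn", "w") as out:
    for game in games:
        out.write(str(game) + "\n\n")
```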

In any case I checked the "opt" log this afternoon. Results at the time were: loss: 1.2003 - policy_out_loss: 0.4032 - value_out_loss: 0.1931

That seemed promising, so even though I'm not a great chess player I played a game vs. the best model to see how it fared.

Results (in pgn format) are attached. caz.20171205.01.pgn.txt

I would have created an animated gif, like you did, but had no idea what tool to use for that.

As you can see, the opening game was passable - some sort of English Opening variant I think. But at move 15 the Chess Alpha Zero model made what seemed like a questionable move to me (I probably would have done Nxe4). From that point on the game makes very little sense, with CAZ giving away pieces left and right.

Still, very interesting.

Zeta36 commented 6 years ago

Hello, @bame55.

I think we'll have to modify the planes of the input model following this great work: https://arxiv.org/pdf/1712.01815.pdf

Two planes seem not to be enough information. Also, I'm afraid the MCTS is too slow because of Python; it can't reach enough speed even with a good GPU.
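
To give an idea of what a richer input would look like, here is a minimal sketch that encodes a position into twelve piece planes (one per colour and piece type) with python-chess; the AlphaZero paper also stacks position history and auxiliary planes, which are omitted here:

```python
import numpy as np
import chess

def board_to_planes(board: chess.Board) -> np.ndarray:
    """Encode a position as a (12, 8, 8) array: one plane per (colour, piece type).

    This is only the piece-placement part of the AlphaZero-style input; the paper
    also stacks several past positions plus castling/repetition/move-count planes.
    """
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for square, piece in board.piece_map().items():
        rank, file = chess.square_rank(square), chess.square_file(square)
        offset = 0 if piece.color == chess.WHITE else 6
        planes[offset + piece.piece_type - 1, rank, file] = 1.0
    return planes

planes = board_to_planes(chess.Board())
print(planes.shape)  # (12, 8, 8)
```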

bame55 commented 6 years ago

I think you're right, two planes are probably not enough. If you think about the nature of Go vs. the nature of Chess, this makes intuitive sense.

As for MCTS: I don't think python is really the limiting factor since keras+tf carry most of the load, but if it is a contributing factor there's no reason portions of the python code couldn't be rewritten in C++ (or some other compiled language) and packaged up for use within python, right?

The biggest bottleneck I'm facing right now is game play speed in both evaluate.py and self_play.py. I believe this is partly because python's "chess" package does not perform well. It certainly wasn't designed for this sort of use.

Edit: The "chess" package is slow, but the game-play bottleneck is mostly model related. The only way to speed this up, I think, is to distribute the game-play load across multiple servers (as brianprichardson pointed out on another thread, with mention of LeelaZero and Fishtest).

prusswan commented 6 years ago

Related to workload distribution, is it possible for training results to be shared so that others can replicate the results?

Zeta36 commented 6 years ago

Yes, @prusswan.

You always have a copy of the best model on your local machine too. Just share your best model's configuration and weights with somebody and they will be able to run it and play against it.
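
For example, with Keras the architecture and weights can be written to files and reloaded on another machine; a minimal sketch (the helper names and file names here are only illustrative):

```python
from keras.models import Model, model_from_json

def save_model(model: Model, config_path: str, weight_path: str) -> None:
    """Export the architecture (JSON) and weights (HDF5) so they can be shared."""
    with open(config_path, "w") as f:
        f.write(model.to_json())
    model.save_weights(weight_path)

def load_model(config_path: str, weight_path: str) -> Model:
    """Rebuild the model from the shared files on any machine."""
    with open(config_path) as f:
        model = model_from_json(f.read())
    model.load_weights(weight_path)
    return model
```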

Regards.

Zeta36 commented 6 years ago

@bame55, this is the animated GIF of your game. Could you please share with us the weights of the model you trained to a loss near 1? Thank you!!

(animated GIF: pagina3)

bame55 commented 6 years ago

Sure. Where should I put the file? I tried ftping it to alpha-chess-zero.mygamesonline.org but can't write files there (ftp error is "500 Unable to service PORT commands").

Zeta36 commented 6 years ago

@bame55, push them into this project, in the folder "data/model/". Do a pull request and I'll accept it. And please tell us the model configuration that you used: cnn_filter_num, cnn_filter_size, etc.
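
To be concrete, I mean the values that define the network; a minimal sketch of such a configuration (only cnn_filter_num and cnn_filter_size are named above, the remaining fields and all of the values are illustrative):

```python
class ModelConfig:
    """Illustrative network configuration.

    cnn_filter_num and cnn_filter_size are the fields mentioned above; the other
    names and all of the values are examples, not the project's actual defaults.
    """
    cnn_filter_num = 256   # filters per convolutional layer
    cnn_filter_size = 3    # spatial size of each filter
    res_layer_num = 7      # number of residual blocks (illustrative)
    l2_reg = 1e-4          # weight decay (illustrative)
    value_fc_size = 256    # size of the value head's dense layer (illustrative)
```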

Thank you :).

bame55 commented 6 years ago

I won't be able to do the push until tonight (roughly 12 hours from now), but will as soon as I can.

Akababa commented 6 years ago

It looks like the model might be overfitting to the opening before it has a chance to learn basic chess principles like piece values. GM games might not be the best place to start training as they don't show the network what happens when you give away pieces.

Zeta36 commented 6 years ago

Yes, @Akababa. I think you are right. Maybe it'd be better to start training with low-ELO games and increase the ELO of the games used in the SL over time.
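
Something like an ELO band that widens as the SL generations advance; a tiny sketch of the idea (the helper name and numbers are just illustrative):

```python
def elo_band_for_generation(generation, start_max_elo=1200, step=200, ceiling=2700):
    """Illustrative schedule: which ELO range of games to use for SL at a given generation.

    Early generations train only on low-rated games; the upper bound grows over
    time so later generations also see strong games. All numbers are placeholders.
    """
    max_elo = min(start_max_elo + step * generation, ceiling)
    return 0, max_elo

for gen in range(4):
    print(gen, elo_band_for_generation(gen))  # (0, 1200), (0, 1400), (0, 1600), (0, 1800)
```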

Akababa commented 6 years ago

Yes, sort of like AlphaZero :) That way we can also see where it plateaus and iterate faster.

BTW can you comment on why you closed my PR? I can clean it up if necessary

Zeta36 commented 6 years ago

I closed it because I restored an old version of the project in master. I tried to introduce more input planes into the model but it seems I did something wrong. Your code was based on commits beyond the restored version.

Akababa commented 6 years ago

Are you referring to the "data format" issue I raised? I've already fixed all the stuff I mentioned (and a few other bugs), so after a bit more testing I hope I can merge: https://github.com/Akababa/chess-alpha-zero/blob/dimreduce/src/chess_zero/env/chess_env.py. It's diverged significantly by now, though.

Zeta36 commented 6 years ago

It looks promising, @Akababa. Anyway, before the pull request, remember to check your development with an SL training step as we did here. Your model has to converge to a loss near 1 and play at least as well as @bame55's. I have a friend, @benediamond, who is also trying to feed the model with the planes DeepMind describes in its paper. He should also do the same test with his implementation before trying self-play or anything else.

Zeta36 commented 6 years ago

By the way, regarding games of low ELO: from this site, http://ficsgames.org/download.html, we can download PGN files with games at all kinds of ELO ratings.

Akababa commented 6 years ago

Noted, thanks! I hope we can share test cases too. The training loss is questionable (in my opinion) because the model can just overfit the common openings, and we don't know if it's complex enough to fit the subtle positional ideas in those openings anyway. We need a common validation test set of beginner-level (~1000 ELO) positions to compare results: stuff like mate-in-one, and also measuring the naked policy loss against the set of legal moves. Thanks for the link, I'll try to compile and PR a sensible validation set.
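
For example, a mate-in-one check could look roughly like this; a minimal sketch where predict_policy is a placeholder for however the model ends up exposing move probabilities, and the two FENs are just illustrative positions:

```python
import chess

def solves_mate_in_one(predict_policy, fen):
    """Return True if the model's highest-probability legal move delivers mate.

    predict_policy is a placeholder: it should map a FEN string to a dict of
    {uci_move: probability} over legal moves.
    """
    board = chess.Board(fen)
    policy = predict_policy(fen)
    best = max(board.legal_moves, key=lambda m: policy.get(m.uci(), 0.0))
    board.push(best)
    return board.is_checkmate()

# Tiny illustrative validation set (FEN, with one mating move noted in a comment).
MATE_IN_ONE = [
    "6k1/5ppp/8/8/8/8/5PPP/3R2K1 w - - 0 1",                               # Rd8#
    "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 1",  # Qxf7#
]

def mate_in_one_accuracy(predict_policy):
    solved = sum(solves_mate_in_one(predict_policy, fen) for fen in MATE_IN_ONE)
    return solved / len(MATE_IN_ONE)
```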

Zeta36 commented 6 years ago

Today I've uploaded the best model data I've been able to train in a supervised learning way. I trained it using the "--type mini" configuration I pushed also today. This model was trained for 15 generations (with 8 changes in the best model) using 5 PGN files of human games with different ELO strengths (not just high ELO ratings). I downloaded the PGN files from here: http://ficsgames.org/download.html (selecting "Chess (all time controls, all ratings)").

The quality of the games is like this (the AI plays black):

(animated GIF: partida4)

(animated GIF: partida5)

The NN plays a great game until it fails and loses the queen. Black's position is really good for the first dozens of moves. I wonder if this simple and naive model (with just a two-plane input) is able to improve even more.

Right now the model continues converging slowly and the loss is now over 5. In the next few days I'll upload new results. After that I'll start training in a self-play way.