brianprichardson opened this issue 6 years ago
Hi, @brianprichardson.
1) I'm really glad to hear that!! I know you made some modifications to the code in your fork. Could you please apply those same changes to this repo so I can merge them? (Include the model parameters you are using: simulations per move, etc.)
2) I'd also love it if you could share the weights of your best model so far!!
3) Can you also tell me how many generations (changes of the best model) you have gotten already?
Thank you for your help and collaboration with this project!!
Now, how can we distribute it (at least self play and evaluation) like LeelaZero and Fishtest?
This is going to be really easy because of the way the project works. The only thing we have to do is use an external server (on the internet) from which we can all read/write the best model.
So the training pipeline will be exactly the same, except the best model is not going to be loaded/saved from/to our local machines but from/to a server. You'll even be able to keep the progress you've already made.
This is what I'm going to do: let me copy your current fork into a new branch here, and I'll develop the changes needed. Then I'll tell you so we can try to train in a distributed way. I'm sure @yhyu13 will also help us with this. He has a good GPU as well.
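Just to illustrate the idea, swapping the local load/save of the best model for HTTP calls to a shared server could look like this minimal sketch (the server URL, endpoints, and helper names are invented for illustration, not the actual implementation):

# Sketch: replace local load/save of the best model with HTTP GET/PUT against a shared server.
# The URL, endpoints, and file names below are invented for illustration.
import urllib.request

SERVER = "http://example.com/chess_zero"  # hypothetical shared server

def download_best_model(config_path, weight_path):
    # every worker starts by pulling the current best model from the server
    urllib.request.urlretrieve(SERVER + "/model_best_config.json", config_path)
    urllib.request.urlretrieve(SERVER + "/model_best_weight.h5", weight_path)

def upload_best_model(config_path, weight_path):
    # only the eval worker pushes a new best model, after the challenger wins the evaluation
    for name, path in [("model_best_config.json", config_path),
                       ("model_best_weight.h5", weight_path)]:
        with open(path, "rb") as f:
            req = urllib.request.Request(SERVER + "/" + name, data=f.read(), method="PUT")
            urllib.request.urlopen(req)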
I might have created a fork unintentionally, as I am still quite new to GitHub. I am running the 11/26 99f... commit. The only changes I made were to the GPU memory limits listed above.
I did experiment a bit with various GPU memory limits, as I could not get all three workers running at the same time. Even now with 24 GB there is some swapping. So I just created a bunch of self-play games, then ran opt for a while, and then eval. Once there was some data, I ran all three workers for about a day. There are 65 models in next_generation, but I think only 2 new best models.
Attaching a log file which might help you: main.log
PS: I have not done my homework and studied all of the code enough yet, and most of my hardware is currently running LeelaZero.
@Zeta36
It seems like you have partnered with @benediamond; that's great. I trained 4 generations of models off and on last week. Are you also doing distributed learning? I would like to know how, and what your plan is. XD
Hi again, @yhyu13 :).
Yes, we are working together on the project. I recently added a new step to the training pipeline: a supervised learning (SL) process. This SL step is based on PGN files (human chess games you can download from the internet), and it acts as a kind of pre-training before the self-play training step. DeepMind did something similar in its first version of AlphaGo in order to keep the policy from starting totally at random in the self-play process.
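Roughly speaking, the SL worker turns each human game from a PGN file into (position, played move, game result) training examples, something like this sketch using python-chess (the real worker code may differ in its details):

# Sketch: extract (position, played move, result) examples from a PGN file with python-chess.
import chess.pgn

def examples_from_pgn(path):
    examples = []
    with open(path) as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            result = game.headers.get("Result", "*")      # "1-0", "0-1" or "1/2-1/2"
            value = {"1-0": 1, "0-1": -1}.get(result, 0)  # game outcome from White's side
            board = game.board()
            for move in game.mainline_moves():
                # the played move is the policy target, the final result the value target
                examples.append((board.fen(), move.uci(), value))
                board.push(move)
    return examples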
So the idea would be to train first using the "sl" worker instead of the "self" one. I mean: you run three workers at the same time as before, but instead of using:
python src/chess_zero/run.py self --type distributed
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed
We will use until convergence:
python src/chess_zero/run.py sl --type distributed
python src/chess_zero/run.py opt --type distributed
python src/chess_zero/run.py eval --type distributed
After convergence you would stop the "sl" worker and start the "self" one (we'd need to use various PGN files with millions of moves and wait for the model to converge).
The '--type distributed' parameter is the only thing you need to add to work in a distributed way ;).
Our hope is that by pre-training the policy with SL, we will then be able to generate good play data immediately in a self-play manner. In the game of Go this step is not necessary because of the rules of the game, but in chess the rules are more complex, and it seems a totally random starting policy is unlikely to generate enough quality data, or it would require a huge amount of time (and maybe thousands of GPUs).
Regards.
Hi @yhyu13, glad to have you on board.
I have managed to train 1 generation so far on a separate fork I have created, in which I have tried certain modifications:
One of them is a one-hot encoding of the board input: instead of feeding each piece to the network as its ASCII value (e.g. 'Q' --> 81 via ord()), each piece type gets its own binary plane. This appears closer to general machine learning principles, though it's good that the original method is working.
@Zeta36 and @yhyu13, it appears I have veered off course a bit with these experiments, unfortunately, while you are already trying different things. I am very excited about joining the distributed effort, but first I would like to learn a bit more about whether the above methods can succeed. Once I have satisfied my curiosity, I will contribute my machine (NVIDIA GeForce GTX 1080) to the cause. Please let me know if you'd like to learn more about what I've tried.
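To make the difference concrete, the two encodings being compared look roughly like this (a sketch; the actual plane layout in my fork may differ):

# Sketch: ord()-based encoding vs. one-hot encoding of a single piece character.
import numpy as np

PIECES = "KQRBNPkqrbnp"  # 12 piece types: white upper-case, black lower-case

def ord_encoding(piece_char):
    # original style: one scalar per square, e.g. 'Q' -> 81
    return ord(piece_char)

def one_hot_encoding(piece_char):
    # alternative: one binary entry (or plane) per piece type instead of a single scalar
    vec = np.zeros(len(PIECES), dtype=np.float32)
    if piece_char in PIECES:
        vec[PIECES.index(piece_char)] = 1.0
    return vec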
@benediamond, did you get convergence in the optimization worker? I tried your one-hot version, but the model did not seem to converge (at least in the version I tried some days ago).
@Zeta36 Reaching a model checkpoint appears to take about 20 minutes. From total scratch I have losses
loss: 7.8623 - policy_out_loss: 7.1453 - value_out_loss: 0.3209
and by the time of the checkpoint I have
loss: 0.4104 - policy_out_loss: 0.0029 - value_out_loss: 0.0029
How does this compare to your experience?
It's strange that this doesn't work, but I will be happy to switch back if using ord() is better.
@benediamond, did you really reach a loss so near 0?? But are you constantly generating new play data with the self-play worker, or do you stop self-play and then start the optimization? It sounds a lot like an overfitting issue.
Moreover, are you constantly evaluating the best model with the eval worker? How many times has the best model been replaced by the evaluator worker?
I am generating new self-play data, but my optimizer works much faster. So I generate many models on the same batch of self-play data.
Yes, it sounds like overfitting. Do you think I should reduce the learning rate? How did your optimizer perform?
@benediamond, my optimizer performs much more slowly, but as you know I have no GPU, only a CPU.
@benediamond, have you tried playing against your best model? I mean you yourself playing, using the play_gui option. If your model is not overfitting, then it should play more or less fine.
@Zeta36 You mean I should play against it myself? I haven't tried, but I can. I don't expect it to be good, though. My evaluator has replaced the best model only once so far, but I only began training last night (I have been making various changes...).
It's really very strange that the evaluator only changed the best model once, @benediamond. It is not possible for you to reach a loss near 0 and yet have your next-generation model unable to beat the random one (the first best model is a random one). You've got to have some kind of bug in there.
@Zeta36 Yes. I think the more parameters, the more chance of overfitting, and I didn't realize this when I implemented the one-hot feature. I have now reduced the learning rate tenfold.
On the other hand, I only began training this present version last night. And it looks like I will soon get a 2nd model... I will keep testing.
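For context, "reducing the learning rate tenfold" just means recompiling with a smaller lr, roughly like this (the optimizer, losses, and values are illustrative, not the project's actual settings, and assume the standalone Keras API of that time):

# Sketch: recompile the model with a 10x smaller learning rate (values illustrative).
from keras.optimizers import SGD

def recompile_with_lower_lr(model, lr=0.001):
    # e.g. down from a starting lr of 0.01
    model.compile(optimizer=SGD(lr=lr, momentum=0.9),
                  loss=["categorical_crossentropy", "mean_squared_error"])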
I am also experimenting with another feature:
I've also done something rather pointless but perhaps useful:
In any case, @Zeta36, I am interested in exploring these features. I will soon implement / copy in your supervised learning feature. If the results look very strong, then perhaps with your permission I may push my fork into a sub-branch of your repository. Please let me know.
Of course you can :).
Let's keep working.
What do you mean by "convergence of the model"? ... Just so I know when I have to switch from "sl" to "self".
Hi friends, I'm not a good programmer. I invent chess variants and I'm looking for someone to help me create a program based on this project to play my chess variant, Musketeer Chess. The main objective is to see the progress of this self-learning program; the second objective, if it plays well enough, is to get an evaluation of the fairy chess pieces compared to the classic chess pieces. This is a job offer.
@Zeta36 I'm trying to understand this project, as AlphaZero is very interesting to me. Is the following correct as a high-level overview of the workers?
self_play.py plays matches of the current model against itself and saves them to a file location. Then optimize.py loads the matches, converts the moves to a suitable format, and trains the next_gen model. Next, evaluate.py plays matches between the best_model and the next_gen model. If next_gen wins at rate X, it replaces best_model.
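In other words, I imagine something roughly like this sketch (the helpers and attributes here are placeholders of mine, not the project's actual code):

# Sketch of the three-worker loop as I understand it; names are illustrative.

WIN_RATE_THRESHOLD = 0.55  # illustrative promotion threshold

def self_play_step(best_model, storage):
    # the best model plays itself; the finished game goes to shared storage (play_data files)
    game = best_model.play_against(best_model)
    storage.append(game)

def optimize_step(next_gen_model, storage):
    # convert saved games into (state, policy, value) tensors and train the next generation
    states, policies, values = storage.as_training_data()
    next_gen_model.fit(states, [policies, values])

def evaluate_step(best_model, next_gen_model):
    # play a match; if the challenger scores above the threshold it becomes the new best model
    win_rate = next_gen_model.match_against(best_model)
    return win_rate >= WIN_RATE_THRESHOLD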
Thanks, coughlnj
Progress is slow, but it does seem to be working!
Eval games take about 6 min (GPU mem set to 25%). Opt epochs take 29 sec (GPU mem 30%). Self-play games are highly variable (GPU mem 25%). Total GPU utilization is about 93% (memory 90+% with Firefox and the Nvidia and System monitors running).
Ubuntu 16.04 LTS, 4 GHz i920, 24 GB RAM, GTX 1070 8 GB.
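For anyone curious, capping a worker's GPU memory fraction in TensorFlow 1.x / Keras looks roughly like this (the project may configure it differently, e.g. via its config files):

# Sketch: cap a TensorFlow 1.x / Keras worker at ~25% of GPU memory.
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

tf_config = tf.ConfigProto()
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.25
set_session(tf.Session(config=tf_config))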
Thanks for sharing.