Open tychota opened 4 years ago
This question is really important. Presently we are having hardware problems with the second workstation doing the training, so we are behind with the parallel training of B and C structures. Anyway I hope to reach the recent generations soon.
When to switch network? We know when LZ switched (after LZ091 at 8628 Elo, we are currently below LZ058 at 6840 Elo), and we should have another 2000 Elo before a change is really needed. We will reasonably switch well before that, but before switching, we should think of other things to change/improve.
In the first generations, it was more than enough to train on 4 generations, or 16k games. But AGZ trained on 500k games; for LZ and AZ I am not sure, but the number 250k comes to mind. Now we are increasing to 16 generations, or 64k games. Is this enough to learn the subtle features we need to improve on? We can increase this number only if we are stalled or almost stalled, as otherwise there would be too much difference in level and style of play between so many different generations.
Maybe we should increase the number of games per generation, but I really think that it is equivalent and maybe better, to keep it around 4k and simply increase the number of generations for training, when stalled.
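To make the arithmetic above concrete, here is a minimal sketch of how the training window scales with the number of generations. It assumes a fixed 4k games per generation, as discussed above; the function names are hypothetical and not part of the SAI pipeline.

```python
import math

GAMES_PER_GENERATION = 4_000  # assumption: roughly 4k self-play games per generation


def window_games(generations: int, games_per_generation: int = GAMES_PER_GENERATION) -> int:
    """Total number of games in a training window spanning `generations` generations."""
    return generations * games_per_generation


def generations_for(target_games: int, games_per_generation: int = GAMES_PER_GENERATION) -> int:
    """Smallest number of generations whose window holds at least `target_games` games."""
    return math.ceil(target_games / games_per_generation)


if __name__ == "__main__":
    for gens in (4, 16):
        print(f"{gens} generations -> {window_games(gens)} games")
    for target in (250_000, 500_000):
        print(f"{target} games -> {generations_for(target)} generations")
```

Under this assumption, matching a 250k or 500k game window would take far more than 16 generations, which is why the window can only grow that far once progress has nearly stalled.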
All in all, maybe the problem is that, since the beginning, we have improved so much faster than LZ that we are underestimating the number of games needed to take the next step. We are looking for improvements every 4k games and we cannot find any, how strange!
Finally, I don't want to dodge your question. It is relevant and we must keep watch and be ready to switch network when the time comes. But please also keep an eye out for anything else we might want to improve or increase. Every opinion and comment is precious and listened to.
Thank you.
Thanks for the detailed answer. It is really a good explanation.
LZ has also stalled. http://www.yss-aya.com/cgos/19x19/standings.html says:
Game | Program Name | Rating | Games Played | Last Game |
---|---|---|---|---|
— | LZ_247_901e_p400 | 3198 | 758 | 2019-10-19 08:34:05 |
— | LZ_249_6ee2_p400 | 3170 | 1241 | 2019-11-11 18:49:29 |
— | LZ_248_e76d_p400 | 3155 | 200 | 2019-10-22 14:21:25 |
620949 | LZ_250_3d41_p400 | 3145 | 138 | 2019-11-13 23:53:44 |
And now there is nobody to increase the net size.
Here are the combined policies of corner moves from some nets with an empty board:
At least in this example, the policy is changing steadily. Would AZ's initializing to loss strategy fare any better?
@barrtgt I did not understand your last question. What do you mean by "initializing to loss strategy"?
Sorry, I'm not very sure about it, but Matthew Lai, one of the AZ devs, goes into more detail about it and other AZ-related things in this forum: http://talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=70&sid=8eb37b9c943011e51c0c3a88b427b745
It was also used in some of the later runs in Minigo with success. Perhaps @amj could chime in?
I get a broken link apparently
Really? Works for me. Here is a screenshot with one of his posts:
Oh, I see what you mean! That is the fpu (first play urgency) setting. We are following LZ here, with fpu initialized to something less than the best winrate around. You can try the AZ approach by using the command line option --fpu_zero (IIRC). It did not work very well for our 7x7 runs, so we are not currently using it for self-play.
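For readers unfamiliar with fpu, here is a rough sketch of where it enters PUCT move selection. It is an illustration only, not the SAI or LZ code. The names, the `c_puct` value and the fixed `reduction` of 0.25 are assumptions, and real engines may compute the reduction differently. Winrates are taken to be in [0, 1], so "initializing to loss" means an unvisited move starts at 0.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # policy prior for this move
    visits: int = 0         # number of simulations through this move
    value_sum: float = 0.0  # sum of winrates in [0, 1], from the parent's point of view


def fpu_value(parent_winrate: float, mode: str, reduction: float = 0.25) -> float:
    """Value assumed for a move that has never been visited (hypothetical helper)."""
    if mode == "loss":      # AZ-style: treat unvisited moves as lost
        return 0.0
    if mode == "reduced":   # LZ-style: somewhat below the current winrate
        return max(0.0, parent_winrate - reduction)
    raise ValueError(f"unknown fpu mode: {mode}")


def select_child(children, parent_winrate: float, mode: str, c_puct: float = 1.5) -> int:
    """Return the index of the child maximizing Q + U under the chosen fpu mode."""
    total_visits = sum(c.visits for c in children)
    fpu = fpu_value(parent_winrate, mode)

    def score(c: Child) -> float:
        # Visited moves use their mean winrate; unvisited moves fall back to fpu.
        q = c.value_sum / c.visits if c.visits > 0 else fpu
        u = c_puct * c.prior * math.sqrt(total_visits + 1) / (1 + c.visits)
        return q + u

    return max(range(len(children)), key=lambda i: score(children[i]))


if __name__ == "__main__":
    # One well-explored move and one unvisited, low-prior move.
    children = [Child(prior=0.95, visits=50, value_sum=30.0), Child(prior=0.05)]
    for mode in ("reduced", "loss"):
        print(mode, "->", select_child(children, parent_winrate=0.6, mode=mode))
```

In this toy position the two modes disagree about whether the unvisited low-prior move is worth a first visit. That is the general effect: initializing to loss makes the search narrower, while a reduced fpu lets it try more untested moves.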
Recently the Elo rating has started to stall.
While it is very early to say whether it has really stalled (the last 5 promotions are not statistically enough), it is hard as an external reader / participant to answer these two questions:
Indeed, the SAI pipeline (https://github.com/sai-dev/sai/wiki/Progress#sai-pipeline) does not document the network change process.
I recall having seen that not only 6x128 is trained but also bigger networks. Is that true?
I can't recall where I found that :)
(Edit: found here https://github.com/sai-dev/sai/issues/15, but this method of training a new net at the same time, rather than using net2net like Leela Zero, is an important divergence from Leela and could be documented in the wiki.)
I would expect something like: