CuriosAI / sai

SAI: a fork of Leela Zero with variable komi.
GNU General Public License v3.0

Document network size change when progression stalls #49

Open tychota opened 4 years ago

tychota commented 4 years ago

Recently the Elo rating has started to stall.

[image: Elo rating progression]

While it is very early to say whether it has really stalled (the last 5 promotions are statistically not enough), it is hard as an external reader / participant to answer these two questions:

Indeed, the SAI pipeline (https://github.com/sai-dev/sai/wiki/Progress#sai-pipeline) does not document the network change process.

I recall having seen that not only 6x128 is trained but also bigger networks. Is that true? I can't recall where I found that :) (Edit: found here https://github.com/sai-dev/sai/issues/15. This method of training bigger nets at the same time, instead of using net2net like Leela Zero, is an important divergence from Leela and could be documented in the wiki.)

I would expect something like:

The cycle is as follows.
1. `gen=0`, `current_net=random`, `n=1`;
2. `current_net` plays **2560** whole self-play games, with variable komi, distributed according to `current_net` [[evaluation|Evaluation of fair komi]];
3. `current_net` starts playing **[[branches|Branching games]]** of self-play games, from random positions of previous games;
4. when the game count reaches **3072** self-play games, **training starts**, based on the self-play games of the last `n` generations;
+ The following networks are trained:
+ - 6x128
+ - 10x128
+ - 15x192
- 5. during training, a variable number of **candidate networks** are generated (currently, 10 networks at 2000 training steps one from the other); 
+ 5. during training, a variable number of **candidate networks** are generated:
+ - if the "Elo/plays" slope from a linear interpolation of the last ten promotions is larger than 10 Elo gained per 40000 games, 10 candidate networks of the current size (6x128), 2000 training steps apart, are generated;
+ - if that slope is less than 10 Elo gained per 40000 games, 10 candidate networks of the current size (6x128) and 10 candidate networks of the next size up (10x128), each 2000 training steps apart, are generated (a sketch of this rule follows the list).
6. as soon as candidates are available, **promotion matches** are added between the new candidate networks and `current_net`. These matches can be identified because they are **50** games long;
7. when promotion matches end, the best candidate network is identified; denote it by `chosen_net`;
8. `current_net` finishes playing branches of self-play games until count reaches **3840**;
9. **reference matches** are added between several recent networks (the ones promoted at generations `gen-k`, with `k` in `{1, 2, 5, 8, 11}`) and `chosen_net`, to get a more precise evaluation of `chosen_net` Elo. These matches can be identified because they are **40** games long;
10. if `gen` is a multiple of 4, **panel matches** are added between the 16 networks in the [[panel|Panel of reference networks]] and `chosen_net`, again to get an even more precise evaluation of `chosen_net` Elo. These matches can be identified because they are **30** games long;
11. `gen++`, `current_net=chosen_net`, if [[reasonable|Generations for training]] then `n++`;
12. go to step 2;
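A minimal sketch of the rule proposed in step 5, assuming the promotion history is available as (cumulative self-play games, Elo) pairs: fit a line to the last ten promotions and compare its slope against the 10 Elo per 40000 games threshold. The function name, data layout, and use of `numpy.polyfit` are illustrative, not part of the actual pipeline.

```python
import numpy as np

def should_also_train_bigger_net(promotions, threshold_elo=10.0, window_games=40000):
    """promotions: list of (cumulative_selfplay_games, elo) pairs for the
    last ten promoted networks, oldest first."""
    games = np.array([g for g, _ in promotions], dtype=float)
    elo = np.array([e for _, e in promotions], dtype=float)
    slope, _ = np.polyfit(games, elo, 1)        # Elo gained per self-play game
    gain_per_window = slope * window_games      # Elo gained per 40000 games
    return gain_per_window < threshold_elo      # stalled -> also train the bigger net
```

If this returned `True`, the pipeline would train candidates of both the current size (6x128) and the next size up (10x128); otherwise only the current size.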
Vandertic commented 4 years ago

This question is really important. Presently we are having hardware problems with the second workstation doing the training, so we are behind with the parallel training of the B and C structures. Anyway, I hope to reach the recent generations soon.

When to switch network? We know when LZ switched (after LZ091, at 8628 Elo; we are currently below LZ058, at 6840 Elo), so we should have another 2000 Elo before a change is really needed. We will most likely switch well before that, but before switching we should think of other things to change or improve.

In the first generations, it was more than enough to train on 4 generations, or 16k games. But AGZ trained on 500k games; for LZ and AZ I am not sure, but the number 250k comes to mind. Now we are increasing to 16 generations, for 64k games. Is this enough to learn the subtle features we need to improve on? We can increase this number only if we are stalled or almost stalled, as otherwise there would be too much difference between the level and style of play of so many different generations.

Maybe we should increase the number of games per generation, but I really think it is equivalent, and maybe better, to keep it around 4k and simply increase the number of generations used for training when stalled.
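A hypothetical sketch of that policy, keeping games per generation fixed and widening the training window (in generations) only when progress stalls; the constants, the `stalled` flag, and the cap are illustrative, not SAI's actual pipeline code.

```python
GAMES_PER_GENERATION = 4000  # kept roughly constant in the SAI run

def next_training_window(n_generations, stalled, max_generations=64):
    """Return (generations, games) to train on for the next cycle."""
    if stalled and n_generations < max_generations:
        n_generations += 1  # widen the window only when progress stalls
    return n_generations, n_generations * GAMES_PER_GENERATION

# 4 generations -> 16k games (early run), 16 -> 64k games (current),
# versus the ~250k-500k games recalled above for AGZ/LZ/AZ.
```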

All in all, maybe the problem is that, since the beginning, we have improved so much faster than LZ that we are underestimating the number of games needed for the next step. We are looking for improvements every 4k games and cannot find any; how strange!

Finally, I don't want to dodge your question. It is relevant, and we must keep watch and be ready to switch networks when the time comes. But please also keep an eye out for something else we might want to improve or increase. Every opinion and comment is precious and listened to.

Thank you.

tychota commented 4 years ago

Thanks for the detailed answer. It is really a good explanation.

l1t1 commented 4 years ago

LZ is also stalling; http://www.yss-aya.com/cgos/19x19/standings.html says:

| Game | Program Name | Rating | Games Played | Last Game |
|---|---|---|---|---|
| | LZ_247_901e_p400 | 3198 | 758 | 2019-10-19 08:34:05 |
| | LZ_249_6ee2_p400 | 3170 | 1241 | 2019-11-11 18:49:29 |
| | LZ_248_e76d_p400 | 3155 | 200 | 2019-10-22 14:21:25 |
| 620949 | LZ_250_3d41_p400 | 3145 | 138 | 2019-11-13 23:53:44 |

Now there is nobody to increase the net size.

barrtgt commented 4 years ago

Here are the combined policies of corner moves from some nets with an empty board:

[image: policy_evolution]

At least in this example, the policy is changing steadily. Would AZ's initializing to loss strategy fare any better?
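A hypothetical sketch of how such a curve could be computed (not necessarily barrtgt's actual script): take each net's policy output on the empty board and sum the prior probability assigned to the corner regions; the 5x5 corner window is an assumption.

```python
import numpy as np

def corner_policy_mass(policy, corner_size=5):
    """policy: (19, 19) array of move priors for the empty board."""
    p = np.asarray(policy)
    s = corner_size
    corners = [p[:s, :s], p[:s, -s:], p[-s:, :s], p[-s:, -s:]]
    return float(sum(c.sum() for c in corners))
```

Evaluating this for each promoted net and plotting the values over generations would give a curve like the one in the image above.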

Vandertic commented 4 years ago

@barrtgt I did not understand your last question. What do you mean by "initializing to loss strategy"?

barrtgt commented 4 years ago

Sorry, I'm not very sure about it, but Matthew Lai, one of the AZ devs, goes into more detail about it and other AZ topics in this forum thread: http://talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=70&sid=8eb37b9c943011e51c0c3a88b427b745

It was also used in some of the later runs in Minigo with success. Perhaps @amj could chime in?

Vandertic commented 4 years ago

I get a broken link apparently

barrtgt commented 4 years ago

Really? Works for me. Here is a screenshot of one of his posts: [image: matt_lai_1]

Vandertic commented 4 years ago

Oh, I see what you mean! That is the FPU (first play urgency) setting. We are following LZ here, with FPU initialized to something less than the best winrate around. You can try the AZ approach by using the command line option --fpu_zero (IIRC). It did not work very well for our 7x7 runs, so we are not currently using it for self-play.
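For readers unfamiliar with the setting, here is a minimal sketch of where FPU enters PUCT child selection, contrasting an LZ/SAI-style reduction below the parent winrate with an AZ-style zero initialization. The function, data layout, and constants are illustrative assumptions, not SAI's actual C++ search code.

```python
import math

def puct_select(children, parent_visits, parent_value,
                c_puct=0.8, fpu_zero=False, fpu_reduction=0.25):
    """children: list of dicts with keys 'policy', 'visits', 'value_sum'."""
    # First-play urgency: the value used for a child that has no visits yet.
    if fpu_zero:
        fpu = 0.0                            # AZ-style: unvisited moves start as losses
    else:
        fpu = parent_value - fpu_reduction   # LZ/SAI-style: a bit below the parent winrate
    best, best_score = None, -math.inf
    for child in children:
        q = child["value_sum"] / child["visits"] if child["visits"] > 0 else fpu
        u = c_puct * child["policy"] * math.sqrt(parent_visits) / (1 + child["visits"])
        if q + u > best_score:
            best, best_score = child, q + u
    return best
```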