CuriosAI / sai

SAI: a fork of Leela Zero with variable komi.
GNU General Public License v3.0
105 stars 11 forks

suggest to reduce match number and increase self-play number #45

Open l1t1 opened 4 years ago

l1t1 commented 4 years ago

Currently, for the nets numbered 4n, SAI has played at least 50\*10 + 40\*5 + 30\*15 = 1150 match games. That is too many; if half of these games could be used for training instead, progress might be faster.
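A quick check of the match-game total quoted above, assuming the three terms are 50\*10, 40\*5, and 30\*15 (the interpretation of each term as games per match × number of matches is my assumption):

```python
# Check the match-game total: three batches of matches, assumed to be
# 10 matches of 50 games, 5 of 40, and 15 of 30.
terms = [50 * 10, 40 * 5, 30 * 15]
total = sum(terms)
print(total)  # 1150
```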

l1t1 commented 4 years ago

Leela Zero's ratio of self-play to match games is 18:1; SAI's ratio is about 5:1.
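Expressed as the share of all games going to promotion matches, the two ratios above differ by roughly a factor of three; a minimal sketch:

```python
def match_share(selfplay: int, match: int) -> float:
    """Fraction of all games that are match games, for a selfplay:match ratio."""
    return match / (selfplay + match)

print(round(match_share(18, 1), 3))  # Leela Zero: ~0.053
print(round(match_share(5, 1), 3))   # SAI: ~0.167
```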

trinetra75 commented 4 years ago

Hi, I'm one of the SAI developers...

This difference is due to our choice of higher promotion rate for the network tasked to play the self-play games.

The choice was due to our experience (especially on the 9x9 board) that having a large number of self-play games before a promotion is often not necessary and, instead, having a new promoted network more often is actually advantageous.

iopq commented 4 years ago

@trinetra75 having no promotion matches at all would be an even more effective use of resources; just auto-promote the networks.

kennyfs commented 4 years ago

> Leela Zero's ratio of self-play to match games is 18:1; SAI's ratio is about 5:1.

But if you compare the games played over 24 hours, both come out to about 1:5.

Vandertic commented 4 years ago

I believe we need to have reference matches, because we need to know when improvements are too slow, in order to decide when to scale up parameters (visits, network structure, possibly games per generation?).

Promotion matches are useful because, as you have seen, it often happens that a newly trained network represents a huge decrease in performance: without those matches we would often be choosing a network that loses every game against the previous one.

Moreover, we are using promotion matches to optimize the window of training data. Recently, training was done for 7 networks on the last 12 generations, for 4 networks on the last 8, and for 2 networks on the last 4. This makes the selection sensible both when strength improves abruptly and when it is almost stalled.
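The gating step described above might be sketched as follows; the names, the win-rate threshold, and the example numbers are illustrative assumptions, not SAI's actual code:

```python
# Hypothetical sketch: train several candidate networks on different
# windows of recent generations, then gate each against the current
# best network with a promotion match.
from dataclasses import dataclass

@dataclass
class Candidate:
    window_generations: int   # how many recent generations it was trained on
    win_rate: float           # result of its promotion match vs the current best

def select_promotion(candidates, threshold=0.55):
    """Promote the best-scoring candidate that beats the gating threshold."""
    passing = [c for c in candidates if c.win_rate >= threshold]
    return max(passing, key=lambda c: c.win_rate) if passing else None

# Windows mirroring the comment: last 12, 8, and 4 generations.
cands = [Candidate(12, 0.48), Candidate(8, 0.58), Candidate(4, 0.61)]
best = select_promotion(cands)
print(best.window_generations)  # 4
```

The promotion match doubles as window selection: whichever training window produced the strongest gated network wins.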

Vandertic commented 4 years ago

BTW, I am changing a little what I wrote above; we are now testing this configuration:

- 7 networks every 1000 steps on 16 generations
- 4 networks every 1000 steps on 12 generations
- 4 networks every 1000 steps on 8 generations

Nazgand commented 4 years ago

Is that based on the data from past best promotions? 'Tis nice to know how far back the training goes, because I do not have Internet access 24 hours a day and have just been hoping the offline self-play games were worth generating.

l1t1 commented 4 years ago

Did the number of self-play games increase to 5100 since SAI 97?

Vandertic commented 4 years ago

Yes, for two reasons:

1) since the drop in learning rate we increased the number of training steps, so the training takes longer, and
2) now that we are reaching a higher level of play, a larger number of games is probably useful. We train on at most 16 generations, and increasing from 3840 to 5120 games should give a typical 80k games per training window instead of 60k. (Notice that this number is still much higher for LZ and AZ.)
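The games-per-window figures above can be checked directly, assuming all 16 generations contribute the full per-generation game count:

```python
# Training-window size before and after the per-generation increase.
generations = 16
print(generations * 3840)  # 61440 (~60k games)
print(generations * 5120)  # 81920 (~80k games)
```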