CuriosAI / sai

SAI: a fork of Leela Zero with variable komi.
GNU General Public License v3.0

Just a suggestion: go to 40b or close... #148

Open y-ich opened 1 year ago

y-ich commented 1 year ago

Hi.

SAI 20b has made no rating progress for about a year. I think going to 40b or closing the project would be appropriate. Contributors' electricity is not free for them...

Vandertic commented 1 year ago

You are absolutely right.

We ran a number of experiments with different architectures, as we didn't find it interesting to simply increase the network size. Those experiments were inconclusive, and then other things came up, so we temporarily moved our time and resources to other research items.

I will talk with the team and decide if it is time to stop the project (maybe temporarily).

Thank you for your support.

Vandertic commented 1 year ago

We decided to stop self-plays for now. A big thank you message, with some summary of the results, will be posted on the server homepage soon.

For now, let me quickly thank all the people who contributed with great trust in the project, in particular Takashige8, to whom goes our love and gratitude, without forgetting the other awesome main contributors: rtx2070, leibniz, akira, kamou, 13575860, shengke, snorlax, qwerg, Kabu, ionicliquids, tkg1984, peter, mbc44, sukerliu1, saveon, mag_work, nutpen85 and Naphthalin. And a special thanks to Naphthalin for the help with the math.

We will be back when we have good news for the project.

miracond commented 1 year ago

I guess that retraining on the existing data would result in a stronger 20b network. Selecting networks according to match results means the training data does not follow a normal distribution; retraining on the whole dataset would fix that.

Vandertic commented 1 year ago

Indeed this is possible, but we tried once or twice without getting a good result.

You must understand that training on existing data has to be done blindly: you cannot match-test every generation to check that the hyper-parameters (learning rate, number of training steps, training window size) are neither too low nor too high. So it may fail many times before one finds the right recipe. Also, training from scratch takes about ten weeks on our hardware.
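To make the cost concrete, here is a minimal sketch of why such blind retraining is expensive; the function names and numbers are placeholders for illustration, not SAI's actual training pipeline. Every candidate recipe is a complete offline training run, and only the finished network is cheap enough to match-test, so a bad choice of learning rate, step count, or window size is discovered only weeks later.

```python
# Illustrative only: placeholder names and numbers, not SAI's actual training code.
from itertools import product

learning_rates = [0.01, 0.005, 0.001]
training_steps = [100_000, 200_000]
window_sizes = [250_000, 500_000]   # most recent self-play games kept in the window


def train_on_existing_data(lr, steps, window):
    """Stand-in for a full offline training run (weeks of wall-clock time)."""
    return {"lr": lr, "steps": steps, "window": window}   # would return a trained net


def match_test(net):
    """Stand-in for a long evaluation match against the current best network."""
    return 0.0   # would return an Elo estimate


# Hyper-parameters cannot be validated per generation: each combination is a
# complete, blind run, and only the final network gets match-tested.
for lr, steps, window in product(learning_rates, training_steps, window_sizes):
    net = train_on_existing_data(lr, steps, window)
    print(lr, steps, window, match_test(net))
```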

Recently we were performing several of these test trainings (on a limited window of existing data, and with a small 12b network to go faster), changing the parameters and the network structure many times. The experiments were promising but incomplete, and they had to be interrupted because we are currently devoting our time and resources to another scientific project.

We will be back, hopefully before winter, and hopefully with some good news.

Thank you to everybody for the interest in this project.

Deebster commented 1 year ago

@Vandertic Happy New Year (both western and lunar, now). Is there any update on this? Either way, updating the main site would be appreciated since this thread is not very visible (particularly for non-programmers).

kennyfs commented 1 year ago

@Deebster I am pessimistic about this project. KataGo has more potential.