glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaption of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Arena fight to find next net to promote #695

Open DaghN opened 6 years ago

DaghN commented 6 years ago

This idea is based on a test Aloril made, where he trained 5 new nets on the same games, but in random feed order. Then he tested the nets in his sheet, and it turned out they seemingly varied quite a bit in strength.

If this is true, the conclusion is that, even with exactly the same games, the strength of a trained net will be somewhat... random.

Therefore, I present the Arena mode for promotion:

1) Train about 4 new nets from a cycle of games, feeding the games in a different random order for each net. The more nets you train, the better the chance that one of them lands close to the maximum possible strength, but the more games it takes in total to identify that net. This is why I suggest around 4 nets: as with a die, if you roll it 4 times, your best roll should be "close" to six most of the time.

2) Let them fight it out in the Arena. Each pair of nets plays 1000 or 2000 games against one another (4 nets make 6 pairings, so 6000 or 12000 games in total), and we can identify either the strongest one, or one that is at least almost as strong as the strongest.

3) Use the Arena winner for the next training cycle.
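The game counts in step 2 follow from a round robin, where every pair of candidate nets plays one match. Here is a minimal sketch of that bookkeeping; the net names, the function names, and the "highest total score wins" rule are assumptions for illustration, not part of any existing tooling.

```python
from itertools import combinations


def arena_pairings(nets):
    """Return every unordered pairing of candidate nets (round robin)."""
    return list(combinations(nets, 2))


def total_games(num_nets, games_per_pairing):
    """Total games when each pairing plays a match of fixed length."""
    return num_nets * (num_nets - 1) // 2 * games_per_pairing


def arena_winner(scores):
    """Pick the net with the highest total score (wins + 0.5 * draws).

    Assumption: total score over all matches is the selection rule.
    """
    return max(scores, key=scores.get)


# Hypothetical names for the 4 nets trained from one cycle of games.
nets = ["net_a", "net_b", "net_c", "net_d"]

print(len(arena_pairings(nets)))  # 6
print(total_games(4, 1000))       # 6000
print(total_games(4, 2000))       # 12000
```

The pairing count grows quadratically with the number of nets (5 nets already need 10 matches), which is the cost side of training more candidates.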

If I am not mistaken somewhere in the initial assumption, this will give us an almost guaranteed Elo increase, and a significant one to boot, every single cycle.

Some simple toy model math:

1) We assume that a newly trained net gains between -9 and 11 Elo over the current net, in whole number steps, with each value taken as equally likely.

2) The current scheme, with no gating or Arena, would give us about 1 Elo per cycle.

3) Gating would give us an average close to 3 Elo per cycle. (About half of the nets would be promoted, and their average gain would be about 6.)

4) Arena would give us on average something like 6-8 Elo per cycle. (You can run a simple simulation, but it is not important, it's just a toy model.)
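For anyone who wants to check the toy numbers, here is a small exact calculation instead of a simulation. It assumes the gain is uniformly distributed over the whole numbers from -9 to 11, that gating promotes a net only when its gain is positive, and that the Arena always identifies the truly strongest candidate (real matches are noisy, so treat the Arena figure as an upper bound). The function names are made up for this sketch.

```python
from fractions import Fraction

# Toy assumption: 21 equally likely Elo gains, -9 through 11 inclusive.
ELO_VALUES = range(-9, 12)


def expected_no_gating():
    """Every new net is promoted, so the gain is the plain average."""
    return Fraction(sum(ELO_VALUES), len(ELO_VALUES))


def expected_gating():
    """A net is promoted only if its gain is positive; otherwise the gain is 0."""
    return Fraction(sum(v for v in ELO_VALUES if v > 0), len(ELO_VALUES))


def expected_arena(num_nets):
    """Expected gain of the best of num_nets independent draws.

    Assumes the Arena picks the strongest candidate without error.
    """
    n = len(ELO_VALUES)
    total = Fraction(0)
    for rank, value in enumerate(ELO_VALUES, start=1):
        # P(max == value) = P(all draws <= value) - P(all draws < value)
        p_max = Fraction(rank, n) ** num_nets - Fraction(rank - 1, n) ** num_nets
        total += value * p_max
    return total


print(float(expected_no_gating()))          # 1.0
print(round(float(expected_gating()), 2))   # 3.14
print(round(float(expected_arena(4)), 2))   # 7.28
```

Under these assumptions the three schemes come out at 1, about 3.1, and about 7.3 Elo per cycle, which matches the rough figures above. The calculation does not account for the extra games the Arena itself consumes.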

TL;DR: Arena is a very efficient way to squeeze the most Elo out of our games, and games are the bottleneck for us, so we should implement this Arena idea.

MarkBennet commented 6 years ago

This is an interesting idea but assumes there are no local maxima. It might be interesting if all results are negative to take the most negative. But this case would need some thought/testing and it is unclear to me what the best strategy would be.

DaghN commented 6 years ago

But on the other hand, who says we need to find the global maximum (which is extremely hard anyway)?

I suspect all the local maxima will be quite close in strength, since they may well essentially represent different orders to accumulate the same chess knowledge.

Also, who says we are anywhere close to the peaks? Most of the time will/should be spent at the red spot ascending here:

[image]

jjoshua2 commented 6 years ago

This is similar to how Leela Go works, except that they release nets at, say, 2k, 4k, 8k, and 16k steps, which are all similar in theory. Instead of testing them against each other in an arena, they test each one as it comes out to see whether it is better than the current best. So sometimes, say, the 2k net will fail and the 4k net will pass. The 8k net would have passed too, but it is not enough better than the 4k net for the test to show it, so it fails, and a few hours and thousands of games later the cycle repeats...