glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess ** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0
758 stars 298 forks

Different elo targets #109

Closed jjoshua2 closed 6 years ago

jjoshua2 commented 6 years ago

The bottom engine on CCRL 40/4 is only 276 Elo and sometimes loses to a random mover, but it requires Java. A good first target might be chessputer, an open-source UCI engine in C++ rated 765 Elo.

I don't know its Elo, but Alan Turing's historic chess program has been implemented as a ChessBase UCI engine, and it played against Kasparov (who beat it in 16 moves). It would be good publicity, and it can be set to different ply depths.

Robocide, open source C UCI engine, 1897 elo

Ruffian 2.1.0, rated 2609. It was the best free engine back when I used it years ago.

Crafty, famous, elo 2400-3000 depending on version.

Scorpio 2.7.9 was the weakest engine in the bottom TCEC 4th league around 2900 elo.

Gull 3, a strong open source program, now mid-level TCEC 1st league around 3200 elo.

Andsacs .93 open source mid-level TCEC Premier league 3300 elo with 4 CPU.

Komodo 9, winner of TCEC 8, now free, 3383 4 CPU.

Stockfish 9 top released engine, open source, 3560 elo with 4 CPU.

jjoshua2 commented 6 years ago

I tried to load lczero as a UCI engine to match it against some of these and couldn't get it to work, even when I hardcoded the -w weights file, since GUIs don't let you send command-line parameters. Maybe it takes too long to initialize, or it would work in another GUI?

glinscott commented 6 years ago

Which GUI are you using? The cutechess gui should let you pass command line parameters.

jjoshua2 commented 6 years ago

I was using the Infinity Chess GUI. I can try cutechess, I suppose. ChessBase also doesn't support command-line arguments. I think the arguments really should be UCI parameters, especially threads.

CMCanavessi commented 6 years ago

009 - Qualy League.txt

009 - Qualy League Ratings.txt
009 - Qualy League Ratings Head2head.txt

I will test Gen 5 in a couple of days.

glinscott commented 6 years ago

Once it starts getting stronger, agreed, UCI parameters make sense. Or if someone else sends a PR I'd be happy to merge it :).
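For reference, the standard UCI way to expose such settings is to declare them in the `uci` handshake and parse them from `setoption`. A minimal sketch of that flow (option names and defaults here are illustrative, not lczero's actual option set):

```python
# Sketch of exposing engine settings as UCI options instead of command-line
# flags. "Weights" and "Threads" are illustrative names, not lczero's real ones.
options = {"Weights": "weights.txt", "Threads": "1"}

def handle(line):
    if line == "uci":
        # Advertise the configurable options, then signal readiness.
        print("id name lczero-sketch")
        print("option name Weights type string default weights.txt")
        print("option name Threads type spin default 1 min 1 max 128")
        print("uciok")
    elif line.startswith("setoption name "):
        # UCI syntax: setoption name <name> value <value>
        rest = line[len("setoption name "):]
        name, _, value = rest.partition(" value ")
        options[name.strip()] = value.strip()
    elif line == "isready":
        print("readyok")
```

A GUI would then set the weights path and thread count through its normal engine-options dialog, with no command-line arguments needed.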

CMCanavessi commented 6 years ago

I would be happy with at least some Time Control management. I wanted to make some 30''+0.5'' tournaments but it was constantly losing on time, as it takes around 1.7 seconds to move in my box, using default 800 playouts.
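A fixed 800-playout budget ignores the clock entirely, which is why it flags at 30''+0.5''. The usual fix is to derive a per-move time budget from remaining time and increment; a rough sketch (the 30-move horizon and 0.95 increment factor are illustrative choices, not Leela's actual logic):

```python
def move_budget_ms(remaining_ms, increment_ms, moves_to_go=None):
    """Rough per-move budget: spread remaining time over an assumed horizon
    of moves, plus most of the increment. Horizon and increment factor are
    illustrative, not taken from any real engine."""
    horizon = moves_to_go if moves_to_go else 30
    budget = remaining_ms / horizon + 0.95 * increment_ms
    # Never budget more than the clock actually has (keep a small reserve).
    return min(budget, remaining_ms - 50)
```

At 30s + 0.5s with a fresh clock this allows roughly 1.5 s per move, which is close to the ~1.7 s that 800 playouts took on that box, so the playout count would need to shrink slightly under such a budget.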

jkiliani commented 6 years ago

I did a number of round-robin tournaments with Stockfish at different Skill Levels and a constant time control of 1 sec / 40 moves, as well as with (slightly) longer time controls, to get an idea of its scaling. Here are my results:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 sf                            861     136    1000   99.3%    0.6%
   2 sf10                          175      24    1000   73.2%    2.6%
   3 sf8                           114      22    1000   65.8%    2.4%
   4 sf5                           -90      22    1000   37.4%    1.9%
   5 sf3                          -258      27    1000   18.4%    1.5%
   6 sf1                          -481      46    1000    5.9%    0.4%

sf meaning Stockfish without Skill level setting, all engines 40/1

Rank Name                          Elo     +/-   Games   Score   Draws
   1 sf20                          531     114     200   95.5%    4.0%
   2 sf                            449      93     200   93.0%    4.0%
   3 sf17                          173      51     200   73.0%   10.0%
   4 sf16                          160      52     200   71.5%    5.0%
   5 sf19                          149      51     200   70.3%    7.5%
   6 sf18                          149      51     200   70.3%    7.5%
   7 sf15                          106      49     200   64.8%    7.5%
   8 sf13                           89      47     200   62.5%   10.0%
   9 sf14                           70      47     200   60.0%    9.0%
  10 sf12                           63      45     200   59.0%   14.0%
  11 sf11                           47      46     200   56.8%   10.5%
  12 sf9                            42      46     200   56.0%   11.0%
  13 sf10                           37      47     200   55.3%    8.5%
  14 sf8                           -12      46     200   48.3%    8.5%
  15 sf7                          -133      51     200   31.8%    3.5%
  16 sf6                          -166      53     200   27.8%    4.5%
  17 sf5                          -246      62     200   19.5%    1.0%
  18 sf4                          -279      65     200   16.8%    1.5%
  19 sf3                          -382      83     200   10.0%    1.0%
  20 sf2                          -470     109     200    6.3%    0.5%
  21 sf1                          -676     363     200    2.0%    0.0%

Ditto; it seems Skill Level=20 is equivalent to not setting a skill level?

Rank Name                          Elo     +/-   Games   Score   Draws
   1 sf9                           492      70     450   94.4%    1.8%
   2 sf8                           287      43     450   83.9%    3.8%
   3 sf7                           226      39     450   78.6%    2.4%
   4 sf6                           112      33     450   65.6%    4.0%
   5 sf5                            39      32     450   55.6%    3.1%
   6 sf4                           -31      32     450   45.6%    1.8%
   7 sf3                          -140      34     450   30.9%    2.7%
   8 sf2                          -241      40     450   20.0%    2.2%
   9 sf1                          -303      45     450   14.9%    1.3%
  10 sf0                          -369      52     450   10.7%    0.9%

I wasn't aware that 0 was a valid setting for Skill level until then, but tried it at this point. All games 1 sec / 40 moves.

Rank Name                          Elo     +/-   Games   Score   Draws
   1 sf_40/16                      350      53     200   88.3%   21.5%
   2 sf_40/8                       184      43     200   74.3%   30.5%
   3 sf_40/4                        12      42     200   51.7%   25.5%
   4 sf_40/2                      -151      44     200   29.5%   25.0%
   5 sf_40/1                      -470      84     200    6.3%    8.5%

And finally, a time scaling test that revealed very considerable scaling at such short time controls.

I just measured @Error323's supervised net kbb1-64x6-796000.txt against sf5, i.e.

./cutechess-cli -rounds 70 -tournament gauntlet -concurrency 2 -pgnout SF0.pgn \
 -engine name=lc_kbb1 cmd=lczero arg="--threads=1" arg="--weights=$WDR/kbb1-64x6-796000.txt" arg="--playouts=800" arg="--noponder" arg="--noise" tc=inf \
 -engine name=sf_lv5 cmd=stockfish_x86-64 option.Threads=1 option."Skill Level"=5 tc=40/1 \
 -each proto=uci

The result:

Score of lc_kbb1 vs sf_lv5: 39 - 29 - 2  [0.571] 70
Elo difference: 49.98 +/- 82.52

From the previous tests, sf5 should be roughly 450 Elo above sf0. So far, none of the reinforcement learning nets have scored any wins or draws against sf0, but I'm going to run another match tonight with gen6 to see whether that changes. I'll update when I have something.
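The quoted Elo difference follows directly from the match score under the standard logistic model (this reproduces cutechess-cli's point estimate, though not its error-bar calculation):

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model:
    diff = 400 * log10(score / (1 - score))."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400 * math.log10(score / (1 - score))

# 39 wins, 29 losses, 2 draws -> score 0.571, about +50 Elo,
# matching the 49.98 reported above.
```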

glinscott commented 6 years ago

@jkiliani very cool, thanks for the numbers. At some point, we can do a round-robin tournament and get a better idea of the overall ELO progression as well.

jjoshua2 commented 6 years ago

I think @CMCanavessi's tournament is more interesting, since it uses engines of similar strength. Crippling a strong engine doesn't make as much sense to me, but I am always excited to see both, so keep them coming! Testing against increasing SF skill levels is easier and more efficient than testing against many engines, though.

I can't wait to see how gen6 does. I think it will be about 600 Elo. I estimated gen4 at 530, between Acqua and NEG on CCRL 40/4, although seeing 412 on a scale with a random mover at 0 Elo was good too. The CCRL baseline is Brutus RND at 200 Elo.

CMCanavessi commented 6 years ago

Running a tournament right now with 24 engines under ~1000 elo, Leela Chess Zero Gen 6 is playing. Will update later with how it's doing. Should do somewhat better than Gen 4, but can it beat NEG and Acqua consistently now? Can it at least draw with Easy Peasy? We'll see...

jjoshua2 commented 6 years ago

@CMCanavessi you can reuse most of the engine-vs-engine games from prior tournaments and just rerun LCZ, right? Is that the strategy you are using?

CMCanavessi commented 6 years ago

Yes, you could do that, but I'm just running a completely new tournament with more rivals. It's looking good so far.

In the previous tournament, LCZ Gen 4 played 10 games vs. NEG and got 0 wins, 1 draw and 9 losses. In this tournament it has already played NEG once and won. Watching it play, it looks to have much better endgame understanding. We'll see how it stands when more games are played; so far it's at 50%.

jjoshua2 commented 6 years ago

Did some research, and it looks like NEG and Acqua both do just a 1-ply search with no lookahead, so it should be easy for a neural network plus any search to beat them once the net understands the very basics. It might be interesting if you could put an SF with skill=0 in your tournament. I think it might be a 1-ply search plus quiescence search? I think I saw that skill=1 was 3 ply.

Edit: it appears Skill Level caps maxDepth at level + 1, but also randomly picks among the top 4 moves as long as the pick is not a major blunder.
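The move-selection half of that behaviour can be sketched as follows (a paraphrase of the described idea, not Stockfish's actual code; the blunder margin is an illustrative stand-in for its real weighting):

```python
import random

def pick_skilled_move(scored_moves, blunder_margin=150):
    """scored_moves: (move, centipawn score) pairs, best first, as ranked by
    a search capped at depth skill_level + 1. Randomly picks among the top 4
    moves, excluding any worse than the best by more than blunder_margin.
    The 150 cp margin is illustrative, not Stockfish's actual threshold."""
    top = scored_moves[:4]
    best_score = top[0][1]
    candidates = [m for m, s in top if best_score - s <= blunder_margin]
    return random.choice(candidates)
```

So even at low skill levels the engine never plays a move it can see is a major blunder, which is why crippled Stockfish still punishes tactical mistakes.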

CMCanavessi commented 6 years ago

1st round robin out of 10 has been played and Gen 6 is showing better performance compared to Gen 4. Here are the current standings:

    Engine                         Score  BRXaUsIoTaHaMFNSYoZoPyDiPyAcN.LeEaRaLaCPPOEtEtTe    S-B
01: BRAMA 05/12/2004 x32           20.5/23 · 0 1 1 0 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1  209,25
02: Xadreco 5.83 x32               19.5/23 1 · 0 = 1 = 1 = = 1 1 = 1 1 1 1 1 1 1 1 1 1 1 1  198,50
03: Usurpator II x32               19.0/23 0 1 · 1 1 = 0 1 1 0 1 = 1 1 1 1 1 1 1 1 1 1 1 1  190,75
04: Iota 1.0 x32                   18.5/23 0 = 0 · = 1 = 1 1 1 1 1 1 1 1 1 1 = 1 1 1 = 1 1  185,50
05: Talvmenni 0.1 x32              18.0/23 1 0 0 = · = = = 1 = 1 1 1 1 = 1 1 1 1 1 1 1 1 1  173,75
06: Hanzo the Razor x32            17.5/23 = = = 0 = · = 1 1 = = = 1 1 = 1 1 1 1 1 1 1 1 1  168,50
07: MFChess 1.3 x32                17.0/23 0 0 1 = = = · = = = 1 1 1 1 1 1 = = 1 1 1 1 1 1  162,00
08: NSVChess 0.14 x32              15.0/23 0 = 0 0 = 0 = · = 1 = = = 1 1 1 1 1 1 1 = 1 1 1  130,25
09: Youk V1.05 x32                 15.0/23 0 = 0 0 0 0 = = · = = = 1 1 1 1 1 1 1 1 1 1 1 1  122,00
10: Zoe 0.1 x32                    14.0/23 0 0 1 0 = = = 0 = · 0 = = 0 1 1 1 1 1 1 1 1 1 1  119,50
11: Pyotr Amateur Edition v0.6 x32 13.5/23 0 0 0 0 0 = 0 = = 1 · = = 1 0 1 1 1 1 1 1 1 1 1  105,00
12: Dikabi v0.4209 x32             12.0/23 0 = = 0 0 = 0 = = = = · = = 1 = 1 = = 1 = 1 1 =  109,00
13: Pyotr Novice Edition v2.6 x32  11.5/23 0 0 0 0 0 0 0 = 0 = = = · 1 1 0 1 1 = 1 1 1 1 1   80,75
14: Acqua ver. 20160918 x32        10.0/23 0 0 0 0 0 0 0 0 0 1 0 = 0 · 0 1 1 1 = 1 1 1 1 1   63,00
15: N.E.G. 1.2 x32                 9.5/23  0 0 0 0 = = 0 0 0 0 1 0 0 1 · 0 = = 1 1 = 1 1 1   67,75
16: Leela Chess Zero Gen 6 x64     9.0/23  0 0 0 0 0 0 0 0 0 0 0 = 1 0 1 · 0 1 = 1 1 1 1 1   52,50
17: Easy Peasy 1.0 x32             8.5/23  0 0 0 0 0 0 = 0 0 0 0 0 0 0 = 1 · = 1 1 1 1 1 1   46,50
18: Ram 2.0 x32                    7.5/23  0 0 0 = 0 0 = 0 0 0 0 = 0 0 = 0 = · = = 1 1 1 1   48,25
19: LaMoSca v0.10 x32              5.0/23  0 0 0 0 0 0 0 0 0 0 0 = = = 0 = 0 = · = = = = =   32,75
20: CPP1 0.1038 x32                5.0/23  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = = · 1 1 1 1   16,75
21: POS v1.20 x32                  4.0/23  0 0 0 0 0 0 0 = 0 0 0 = 0 0 = 0 0 0 = 0 · = 1 =   25,00
22: EtherTrueRand 9.21 x64         3.0/23  0 0 0 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = 0 = · = 1   16,25
23: EtherealRandom (8.97) x64      2.0/23  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = 0 0 = · 1    5,50
24: Teki Random Mover x64          1.5/23  0 0 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 0 = 0 = 0 0 ·   10,50

276 of 2760 games played

And here's how the bottom of my rating list looks right now:

 156 MFChess 1.3 x32                        :  1001.0      23   13    8    2    74    35   661.7    23    23.0
 157 Hippocampe v0.4.2 x32                  :   982.0     150   98   18   34    71    12   652.1    15    15.0
 158 Youk V1.05 x32                         :   958.4      62   16    7   39    31    11  1204.8    45    42.0
 159 NSVChess 0.14 x32                      :   860.7     173   92   39   42    64    23   661.2    29    19.7
 160 Zoe 0.1 x32                            :   833.9      23   11    6    6    61    26   669.0    23    23.0
 161 Pyotr Amateur Edition v0.6 x32         :   814.4      23   11    5    7    59    22   669.8    23    23.0
 162 Easy Peasy 1.0 x32                     :   756.6     173   92   16   65    58     9   667.8    29    19.7
 163 Dikabi v0.4209 x32                     :   737.3      23    5   14    4    52    61   673.2    23    23.0
 164 Pyotr Novice Edition v2.6 x32          :   708.6      23    9    5    9    50    22   674.4    23    23.0
 165 N.E.G. 1.2 x32                         :   574.4     173   70   17   86    45    10   679.4    29    19.7
 166 Acqua ver. 20160918 x32                :   569.7     173   72   12   89    45     7   679.7    29    19.7
 167 Leela Chess Zero Gen 6 x64             :   568.9      23    8    2   13    39     9   680.5    23    23.0
 168 Leela Chess Zero Gen 4 x64             :   413.8     150   43   18   89    35    12   690.0    15    15.0
 169 Ram 2.0 x32                            :   413.6     173   44   29  100    34    17   689.6    29    19.7
 170 CPP1 0.1038 x32                        :   361.6     173   35   34  104    30    20   692.9    29    19.7
 171 LaMoSca v0.10 x32                      :   282.3     173    1   83   89    25    48   698.0    29    19.7
 172 POS v1.20 x32                          :   165.7     173   13   33  127    17    19   705.4    29    19.7
 173 EtherTrueRand 9.21 x64                 :    50.6     173    2   33  138    11    19   712.7    29    19.7
 174 EtherealRandom (8.97) x64              :    35.5      23    1    2   20     9     9   703.7    23    23.0
 175 Teki Random Mover x64                  :     0.0     173    0   29  144     8    17   715.9    29    19.7

Too early to quantify the gain, but Gen6 is clearly stronger than Gen4. We'll see tomorrow when a couple more rounds are played.

CMCanavessi commented 6 years ago

Leela Gen 6 has played 55 games now in the new tournament, and things look much better than for Gen 4. Here are the updated ratings from the bottom of my rating list:

 152 Usurpator II x32                       :  1019.6      55   40    5   10    77     9   653.7    23    22.5
 153 Talvmenni 0.1 x32                      :   998.7      55   34   16    5    76    29   649.2    23    22.5
 154 StrategicDeep 1.25 x32                 :   989.6      39    3    2   34    10     5  1501.8    23    22.1
 155 Hanzo the Razor x32                    :   981.9      55   30   24    1    76    44   626.8    23    22.5
 156 MFChess 1.3 x32                        :   954.1      55   31   17    7    72    31   653.0    23    22.5
 157 Hippocampe v0.4.2 x32                  :   933.4     150   98   18   34    71    12   618.0    15    15.0
 158 Youk V1.05 x32                         :   918.2      94   38   10   46    46    11   975.0    45    42.8
 159 Zoe 0.1 x32                            :   818.4      55   28   14   13    64    25   628.8    23    22.5
 160 NSVChess 0.14 x32                      :   800.5     205  103   54   48    63    26   626.3    29    22.7
 161 Pyotr Amateur Edition v0.6 x32         :   787.7      55   26   16   13    62    29   616.2    23    22.5
 162 Dikabi v0.4209 x32                     :   740.6      55   14   34    7    56    62   633.3    23    22.5
 163 Easy Peasy 1.0 x32                     :   683.5     205  102   22   81    55    11   636.9    29    23.1
 164 Pyotr Novice Edition v2.6 x32          :   613.6      55   19   11   25    45    20   654.0    23    22.5
 165 Leela Chess Zero Gen 6 x64             :   587.8      55   18   12   25    44    22   638.3    23    22.5
 166 N.E.G. 1.2 x32                         :   532.5     205   77   24  104    43    12   652.7    29    23.6
 167 Acqua ver. 20160918 x32                :   527.7     205   82   15  108    44     7   646.2    29    23.1
 168 Ram 2.0 x32                            :   391.8     205   50   38  117    34    19   650.8    29    22.5
 169 Leela Chess Zero Gen 4 x64             :   383.9     150   43   18   89    35    12   654.6    15    15.0
 170 CPP1 0.1038 x32                        :   331.7     205   39   43  123    30    21   651.9    29    22.9
 171 LaMoSca v0.10 x32                      :   271.2     205    2   99  104    25    48   658.1    29    22.7
 172 POS v1.20 x32                          :   153.2     205   15   39  151    17    19   674.6    29    23.4
 173 EtherealRandom (8.97) x64              :    65.7      55    2    8   45    11    15   656.3    23    22.5
 174 EtherTrueRand 9.21 x64                 :    40.1     205    2   40  163    11    20   677.8    29    23.2
 175 Teki Random Mover x64                  :     0.0     205    0   36  169     9    18   675.2    29    22.7

Now we can start to see that +200 Elo gain showing. After this tournament finishes (still many games to go) I'll probably wait for Gen 10 or so to make a new test.

jjoshua2 commented 6 years ago

Thanks! I would appreciate it if you could just run a gauntlet of Gen 7 against Easy Peasy, Pyotr and NEG. There is debate about how inflated the +200 Elo from self-play is, given that it doesn't seem to gain much yet against SF level 0.

Error323 commented 6 years ago

Gen 8 :)

CMCanavessi commented 6 years ago

Here are the final standings of the tournament with Gen 6

    Engine                         Score     BR   Io   Xa   Ta   Us   Ha   MF   Yo   Zo   Py   NS   Di   Py   Ea   Le   N.   Ac   Ra   CP   La   PO   Et   Et   Te    S-B
01: BRAMA 05/12/2004 x32           81.5/92 ···· 1==1 01=1 0==0 111= ==1= 1==1 1111 1111 11=1 111= 1=1= 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111  3370,2
02: Iota 1.0 x32                   76.0/92 0==0 ···· =101 ===0 0=01 1=== =1=1 1111 1=1= 1=11 1111 1111 1111 1111 11=1 1111 1111 =1=1 1111 1111 1111 =11= 1111 1111  3075,0
03: Xadreco 5.83 x32               75.5/92 10=0 =010 ···· 11=0 0101 ==0= 1111 =1=0 1111 1011 =11= ==1= 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111  2981,5
04: Talvmenni 0.1 x32              73.5/92 1==1 ===1 00=1 ···· 001= ==== ==0= 111= ==1= 1111 ===1 1=1= 1111 111= 1111 =111 1111 11=1 1111 1111 11=1 1111 1111 1111  2946,7
05: Usurpator II x32               71.5/92 000= 1=10 1010 110= ···· =0=0 0001 1110 0101 1110 1=11 ===1 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111  2725,5
06: Hanzo the Razor x32            70.0/92 ==0= 0=== ==1= ==== =1=1 ···· ===1 111= ==== =1== 1=1= ==== 1111 11== 1==1 =111 1111 11=1 1111 1111 1111 11=1 1111 1=11  2807,0
07: MFChess 1.3 x32                66.0/92 0==0 =0=0 0000 ==1= 1110 ===0 ···· =0== =1== 1=11 ==== 10== 11=1 =111 1=11 1111 1111 =1=1 1111 1111 1111 1111 1111 1111  2448,5
08: Youk V1.05 x32                 64.5/92 0000 0000 =0=1 000= 0001 000= =1== ···· =11= ===1 =11= ==1= 111= 11=1 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111  2251,7
09: Zoe 0.1 x32                    58.0/92 0000 0=0= 0000 ==0= 1010 ==== =0== =00= ···· 0110 0=== =01= =1== 11== 1=11 1111 0111 11=1 1111 1111 1111 1111 1111 1111  1986,2
10: Pyotr Amateur Edition v0.6 x32 54.5/92 00=0 0=00 0100 0000 0001 =0== 0=00 ===0 1001 ···· ==== ==== =1=0 11=1 1=11 01=1 1111 1=11 1111 111= 1111 1111 1111 1=11  1813,0
11: NSVChess 0.14 x32              52.0/92 000= 0000 =00= ===0 0=00 0=0= ==== =00= 1=== ==== ···· ==0= ===0 1=10 1==0 1111 1==1 1111 1=11 1==1 =11= 1111 1111 1111  1767,7
12: Dikabi v0.4209 x32             50.0/92 0=0= 0000 ==0= 0=0= ===0 ==== 01== ==0= =10= ==== ==1= ···· =111 1=11 ==== 11== =1=1 ==== 1=== =111 ==1= 1=== 1=== ===1  1981,7
13: Pyotr Novice Edition v2.6 x32  45.0/92 0000 0000 0000 0000 0000 0000 00=0 000= =0== =0=1 ===1 =000 ···· 11=1 0==1 1==1 1110 1=11 111= ==11 1111 1111 1111 1111  1264,7
14: Easy Peasy 1.0 x32             40.5/92 0000 0000 0000 000= 0000 00== =000 00=0 00== 00=0 0=01 0=00 00=0 ···· 1101 =1=1 0011 =111 1111 1=== 1111 1111 1111 1111  1096,2
15: Leela Chess Zero Gen 6 x64     40.0/92 0000 00=0 0000 0000 0000 0==0 0=00 0000 0=00 0=00 0==1 ==== 1==0 0010 ···· 10== 0011 1111 1110 ==11 1111 1111 1111 1111  1093,2
16: N.E.G. 1.2 x32                 34.5/92 0000 0000 0000 =000 0000 =000 0000 0000 0000 10=0 0000 00== 0==0 =0=0 01== ···· 1101 ==01 1=11 1=1= =111 1111 1111 1111  839,75
17: Acqua ver. 20160918 x32        34.5/92 0000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0==0 =0=0 0001 1100 1100 0010 ···· 11=1 1101 ==11 1111 1111 1111 1111  809,00
18: Ram 2.0 x32                    30.0/92 0000 =0=0 0000 00=0 0000 00=0 =0=0 0000 00=0 0=00 0000 ==== 0=00 =000 0000 ==10 00=0 ···· ==1= ==== 1111 1111 1111 1111  771,25
19: CPP1 0.1038 x32                22.5/92 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0=00 0=== 000= 0000 0001 0=00 0010 ==0= ···· ==== 1=11 111= 1=11 111=  454,25
20: LaMoSca v0.10 x32              20.0/92 0000 0000 0000 0000 0000 0000 0000 0000 0000 000= 0==0 =000 ==00 0=== ==00 0=0= ==00 ==== ==== ···· ==== ==== ==== ==1=  516,50
21: POS v1.20 x32                  15.0/92 0000 0000 0000 00=0 0000 0000 0000 0000 0000 0000 =00= ==0= 0000 0000 0000 =000 0000 0000 0=00 ==== ···· =1=1 11=1 ===1  319,50
22: EtherTrueRand 9.21 x64         10.5/92 0000 =00= 0000 0000 0000 00=0 0000 0000 0000 0000 0000 0=== 0000 0000 0000 0000 0000 0000 000= ==== =0=0 ···· =0== 1===  289,00
23: EtherealRandom (8.97) x64      9.5/92  0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0=== 0000 0000 0000 0000 0000 0000 0=00 ==== 00=0 =1== ···· 1===  182,50
24: Teki Random Mover x64          9.0/92  0000 0000 0000 0000 0000 0=00 0000 0000 0000 0=00 0000 ===0 0000 0000 0000 0000 0000 0000 000= ==0= ===0 0=== 0=== ····  231,00

1104 games played / Tournament finished

And here's how the rating list currently stands:

 152 Safrad 2.1.35.210 x32                  :  1007.4     208  112   23   73    59    11   825.7    35    27.9
 153 Usurpator II x32                       :   991.9      92   67    9   16    78    10   620.1    23    23.0
 154 Hanzo the Razor x32                    :   970.9      92   50   40    2    76    43   621.1    23    23.0
 155 StrategicDeep 1.25 x32                 :   918.9      58    4    3   51     9     5  1443.0    23    22.5
 156 MFChess 1.3 x32                        :   917.1      92   52   28   12    72    30   623.4    23    23.0
 157 Youk V1.05 x32                         :   916.8     150   61   21   68    48    14   938.2    45    43.4
 158 Hippocampe v0.4.2 x32                  :   901.1     150   98   18   34    71    12   596.8    15    15.0
 159 Zoe 0.1 x32                            :   814.9      92   45   26   21    63    28   627.8    23    23.0
 160 Pyotr Amateur Edition v0.6 x32         :   771.1      92   42   25   25    59    27   629.7    23    23.0
 161 NSVChess 0.14 x32                      :   761.3     242  114   69   59    61    29   615.3    29    25.1
 162 Dikabi v0.4209 x32                     :   714.6      92   21   58   13    54    63   632.2    23    23.0
 163 Easy Peasy 1.0 x32                     :   667.9     242  117   30   95    55    12   620.7    29    25.1
 164 Pyotr Novice Edition v2.6 x32          :   650.6      92   35   20   37    49    22   635.0    23    23.0
 165 Leela Chess Zero Gen 6 x64             :   584.4      92   31   18   43    43    20   637.9    23    23.0
 166 N.E.G. 1.2 x32                         :   511.9     242   89   29  124    43    12   629.7    29    25.1
 167 Acqua ver. 20160918 x32                :   506.4     242   94   17  131    42     7   630.0    29    25.1
 168 Ram 2.0 x32                            :   388.8     242   58   46  138    33    19   636.8    29    25.1
 169 Leela Chess Zero Gen 4 x64             :   369.4     150   43   18   89    35    12   632.3    15    15.0
 170 CPP1 0.1038 x32                        :   323.9     242   45   49  148    29    20   640.6    29    25.1
 171 LaMoSca v0.10 x32                      :   253.3     242    2  111  129    24    46   644.7    29    25.1
 172 POS v1.20 x32                          :   144.0     242   18   45  179    17    19   651.0    29    25.1
 173 EtherealRandom (8.97) x64              :    52.3      92    2   15   75    10    16   661.0    23    23.0
 174 EtherTrueRand 9.21 x64                 :    34.8     242    2   48  192    11    20   657.3    29    25.1
 175 Teki Random Mover x64                  :     0.0     242    0   44  198     9    18   659.3    29    25.1

You can see that Gen 6 is about 215 elo stronger than Gen 4.

I will test Gen 8 later today.

CMCanavessi commented 6 years ago

I have started a gauntlet, Leela Gen 8 vs all 23 engines that Gen 6 played against. 4 rounds, 92 games total. We'll see the real improvement in a couple of hours.

CMCanavessi commented 6 years ago

Leela just beat Pyotr Novice Edition in 16 moves... I'm absolutely impressed with Gen 8. It's playing MUCH better than Gen 6. It looks like it knows what it's doing now; it makes logical moves and plays with some kind of sense. It's difficult to explain.

It still has some trouble with endgames: it will shuffle and shuffle for many moves before mating, even with 5 queens vs a lone king, haha.

Error323 commented 6 years ago

Awesome! Could you perhaps post one of the interesting games as a gif here?

CMCanavessi commented 6 years ago

How do I do that? I can post the full PGN if needed.

Error323 commented 6 years ago

@kiudee has a nice tool. I think he uses lichess? I used this one http://www.apronus.com/chess/wbeditor.php

kiudee commented 6 years ago

I used the PGN editor on caissa.com for the animations.

killerducky commented 6 years ago

I'd prefer we find a solution that includes PGN files. If someone posts games that need debugging, we need the PGN to feed into lzchess. http://eidogo.com/ links are the standard for Go; is there nothing similar for chess, where you can post links to a game viewer that allows PGN downloads?

CMCanavessi commented 6 years ago

Just for comparison, Gen 6 got 40 points in 92 games in this gauntlet. Gen 8 already has 26.5 points in 46 games.

Estimated elo so far:

   # PLAYER                                 :  RATING  PLAYED    W    D    L   (%)  D(%)  OppAvg  OppN  OppDiv
 163 Usurpator II x32                       :  1040.1     105   73   11   21    75    10   690.9    34    30.0
 164 Safrad 2.1.35.210 x32                  :   998.4     208  112   23   73    59    11   817.4    35    27.9
 165 Hanzo the Razor x32                    :   986.1      94   51   41    2    76    44   633.4    24    23.8
 166 MFChess 1.3 x32                        :   932.7      94   53   29   12    72    31   635.6    24    23.8
 167 Youk V1.05 x32                         :   925.2     152   62   22   68    48    14   933.9    46    44.3
 168 StrategicDeep 1.25 x32                 :   907.6      58    4    3   51     9     5  1423.2    23    22.5
 169 Hippocampe v0.4.2 x32                  :   896.5     150   98   18   34    71    12   592.9    15    15.0
 170 Zoe 0.1 x32                            :   831.6      94   46   27   21    63    29   639.9    24    23.8
 171 Pyotr Amateur Edition v0.6 x32         :   782.1      94   42   27   25    59    29   642.0    24    23.8
 172 Leela Chess Zero Gen 8 x64             :   775.4      46   23    7   16    58    15   647.4    23    23.0
 173 NSVChess 0.14 x32                      :   765.2     244  115   69   60    61    28   617.5    30    25.6
 174 Dikabi v0.4209 x32                     :   732.6      94   22   59   13    55    63   644.2    24    23.8
 175 Easy Peasy 1.0 x32                     :   667.0     244  117   30   97    54    12   623.1    30    25.6
 176 Pyotr Novice Edition v2.6 x32          :   650.4      94   35   20   39    48    21   647.6    24    23.8
 177 Leela Chess Zero Gen 6 x64             :   591.6      92   31   18   43    43    20   647.4    23    23.0
 178 N.E.G. 1.2 x32                         :   511.6     244   89   29  126    42    12   632.0    30    25.6
 179 Acqua ver. 20160918 x32                :   506.2     244   94   17  133    42     7   632.3    30    25.6
 180 Ram 2.0 x32                            :   388.7     244   58   46  140    33    19   639.1    30    25.6
 181 Leela Chess Zero Gen 4 x64             :   369.1     150   43   18   89    35    12   628.0    15    15.0
 182 CPP1 0.1038 x32                        :   323.9     244   45   49  150    28    20   642.8    30    25.6
 183 LaMoSca v0.10 x32                      :   253.3     244    2  111  131    24    45   646.9    30    25.6
 184 POS v1.20 x32                          :   144.0     244   18   45  181    17    18   653.1    30    25.6
 185 EtherealRandom (8.97) x64              :    52.5      94    2   15   77    10    16   673.1    24    23.8
 186 EtherTrueRand 9.21 x64                 :    34.8     244    2   48  194    11    20   659.4    30    25.6
 187 Teki Random Mover x64                  :     0.0     244    0   44  200     9    18   661.4    30    25.6
CMCanavessi commented 6 years ago

At the end of round 3, Gen 8 already has 39.5 points, only 0.5 fewer than Gen 6 scored with a full extra round played. That's the kind of improvement we got :D

Last round starting now, will post results in a while.

CMCanavessi commented 6 years ago

Here's the finished gauntlet:

-----------------Leela Chess Zero Gen 8 x64-----------------
Leela Chess Zero Gen 8 x64 - Acqua ver. 20160918 x32        : 3,0/4 3-1-0 (1101)  75%  +191
Leela Chess Zero Gen 8 x64 - BRAMA 05/12/2004 x32           : 0,5/4 0-3-1 (000=)  13%  -330
Leela Chess Zero Gen 8 x64 - CPP1 0.1038 x32                : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Dikabi v0.4209 x32             : 1,5/4 1-2-1 (0=10)  38%   -85
Leela Chess Zero Gen 8 x64 - Easy Peasy 1.0 x32             : 3,0/4 3-1-0 (1110)  75%  +191
Leela Chess Zero Gen 8 x64 - EtherealRandom (8.97) x64      : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - EtherTrueRand 9.21 x64         : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Hanzo the Razor x32            : 1,0/4 0-2-2 (=00=)  25%  -191
Leela Chess Zero Gen 8 x64 - Iota 1.0 x32                   : 0,5/4 0-3-1 (00=0)  13%  -330
Leela Chess Zero Gen 8 x64 - LaMoSca v0.10 x32              : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - MFChess 1.3 x32                : 1,0/4 0-2-2 (0=0=)  25%  -191
Leela Chess Zero Gen 8 x64 - N.E.G. 1.2 x32                 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - NSVChess 0.14 x32              : 1,5/4 1-2-1 (01=0)  38%   -85
Leela Chess Zero Gen 8 x64 - POS v1.20 x32                  : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Pyotr Amateur Edition v0.6 x32 : 2,0/4 0-0-4 (====)  50%    ±0
Leela Chess Zero Gen 8 x64 - Pyotr Novice Edition v2.6 x32  : 3,5/4 3-0-1 (11=1)  88%  +346
Leela Chess Zero Gen 8 x64 - Ram 2.0 x32                    : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Talvmenni 0.1 x32              : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 8 x64 - Teki Random Mover x64          : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Usurpator II x32               : 0,5/4 0-3-1 (00=0)  13%  -330
Leela Chess Zero Gen 8 x64 - Xadreco 5.83 x32               : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 8 x64 - Youk V1.05 x32                 : 1,5/4 1-2-1 (=001)  38%   -85
Leela Chess Zero Gen 8 x64 - Zoe 0.1 x32                    : 2,0/4 1-1-2 (0==1)  50%    ±0

And just for comparison, here's Gen 6

-----------------Leela Chess Zero Gen 6 x64-----------------
Leela Chess Zero Gen 6 x64 - Acqua ver. 20160918 x32           : 2,0/4 2-2-0 (0011)  50%    ±0
Leela Chess Zero Gen 6 x64 - BRAMA 05/12/2004 x32              : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 6 x64 - CPP1 0.1038 x32                   : 3,0/4 3-1-0 (1110)  75%  +191
Leela Chess Zero Gen 6 x64 - Dikabi v0.4209 x32                : 2,0/4 0-0-4 (====)  50%    ±0
Leela Chess Zero Gen 6 x64 - Easy Peasy 1.0 x32                : 1,0/4 1-3-0 (0010)  25%  -191
Leela Chess Zero Gen 6 x64 - EtherealRandom (8.97) x64         : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - EtherTrueRand 9.21 x64            : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Hanzo the Razor x32               : 1,0/4 0-2-2 (0==0)  25%  -191
Leela Chess Zero Gen 6 x64 - Iota 1.0 x32                      : 0,5/4 0-3-1 (00=0)  13%  -330
Leela Chess Zero Gen 6 x64 - LaMoSca v0.10 x32                 : 3,0/4 2-0-2 (==11)  75%  +191
Leela Chess Zero Gen 6 x64 - MFChess 1.3 x32                   : 0,5/4 0-3-1 (0=00)  13%  -330
Leela Chess Zero Gen 6 x64 - N.E.G. 1.2 x32                    : 2,0/4 1-1-2 (10==)  50%    ±0
Leela Chess Zero Gen 6 x64 - NSVChess 0.14 x32                 : 2,0/4 1-1-2 (0==1)  50%    ±0
Leela Chess Zero Gen 6 x64 - POS v1.20 x32                     : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Pyotr Amateur Edition v0.6 x32    : 0,5/4 0-3-1 (0=00)  13%  -330
Leela Chess Zero Gen 6 x64 - Pyotr Novice Edition v2.6 x32     : 2,0/4 1-1-2 (1==0)  50%    ±0
Leela Chess Zero Gen 6 x64 - Ram 2.0 x32                       : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Talvmenni 0.1 x32                 : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 6 x64 - Teki Random Mover x64             : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Usurpator II x32                  : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 6 x64 - Xadreco 5.83 x32                  : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 6 x64 - Youk V1.05 x32                    : 0,0/4 0-4-0 (0000)   0% -1200
Leela Chess Zero Gen 6 x64 - Zoe 0.1 x32                       : 0,5/4 0-3-1 (0=00)  13%  -330

The improvement is pretty evident. Here's the rating list as of now:

 163 Usurpator II x32                       :  1038.3     107   74   12   21    75    11   692.7    34    30.1
 164 Safrad 2.1.35.210 x32                  :   999.9     208  112   23   73    59    11   818.4    35    27.9
 165 Hanzo the Razor x32                    :   985.9      96   52   42    2    76    44   636.7    24    24.0
 166 MFChess 1.3 x32                        :   934.2      96   54   30   12    72    31   638.8    24    24.0
 167 Youk V1.05 x32                         :   922.0     154   63   22   69    48    14   932.5    46    44.4
 168 StrategicDeep 1.25 x32                 :   908.1      58    4    3   51     9     5  1424.0    23    22.5
 169 Hippocampe v0.4.2 x32                  :   898.5     150   98   18   34    71    12   594.0    15    15.0
 170 Zoe 0.1 x32                            :   824.3      96   46   28   22    63    29   643.4    24    24.0
 171 Pyotr Amateur Edition v0.6 x32         :   782.6      96   42   29   25    59    30   645.2    24    24.0
 172 Leela Chess Zero Gen 8 x64             :   782.4      92   45   17   30    58    18   647.5    23    23.0
 173 NSVChess 0.14 x32                      :   769.6     246  116   70   60    61    28   619.4    30    25.9
 174 Dikabi v0.4209 x32                     :   734.8      96   23   59   14    55    61   647.2    24    24.0
 175 Easy Peasy 1.0 x32                     :   669.9     246  118   30   98    54    12   625.1    30    25.9
 176 Pyotr Novice Edition v2.6 x32          :   649.2      96   35   21   40    47    22   650.7    24    24.0
 177 Leela Chess Zero Gen 6 x64             :   592.0      92   31   18   43    43    20   647.5    23    23.0
 178 N.E.G. 1.2 x32                         :   510.7     246   89   29  128    42    12   634.1    30    25.9
 179 Acqua ver. 20160918 x32                :   510.7     246   95   17  134    42     7   634.1    30    25.9
 180 Ram 2.0 x32                            :   388.4     246   58   46  142    33    19   641.1    30    25.9
 181 Leela Chess Zero Gen 4 x64             :   369.9     150   43   18   89    35    12   629.2    15    15.0
 182 CPP1 0.1038 x32                        :   323.7     246   45   49  152    28    20   644.8    30    25.9
 183 LaMoSca v0.10 x32                      :   253.2     246    2  111  133    23    45   648.8    30    25.9
 184 POS v1.20 x32                          :   144.0     246   18   45  183    16    18   655.0    30    25.9
 185 EtherealRandom (8.97) x64              :    51.9      96    2   15   79    10    16   675.6    24    24.0
 186 EtherTrueRand 9.21 x64                 :    34.8     246    2   48  196    11    20   661.2    30    25.9
 187 Teki Random Mover x64                  :     0.0     246    0   44  202     9    18   663.2    30    25.9

+190 Elo from Gen 6 to Gen 8; I think that's pretty good.

jkiliani commented 6 years ago

Just finished the next match against Stockfish Level 0:

Score of lc_gen9 vs sf_lv0: 42 - 57 - 1  [0.425] 100
Elo difference: -52.51 +/- 69.39

This is an improvement of 59 Elo compared to gen8 (https://github.com/glinscott/leela-chess/issues/100#issuecomment-373554840) and 148 Elo compared to gen7 (https://github.com/glinscott/leela-chess/issues/100#issuecomment-372963434), using Stockfish Level 0 as a metric. So there is steady improvement, just at a rate below the self-play Elo, which is to be expected.
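As an aside, the Elo differences quoted in these match reports follow directly from the score fraction via the standard logistic Elo formula. A minimal sketch (this reproduces the numbers above, but it is not the actual cutechess-cli code):

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# 42 wins, 57 losses, 1 draw out of 100 games -> score 0.425
score = (42 + 0.5 * 1) / 100
print(round(elo_diff(score), 2))  # ≈ -52.51, matching the match report
```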

JackThomson2 commented 6 years ago

Interestingly, for me Leela Gen 9 has no problem beating Level 0 Stockfish. What settings are you using for that?

jkiliani commented 6 years ago

800 playouts, and Dirichlet noise. I know it can beat SF Level 0 with more playouts, but I want to keep the metric constant.

CMCanavessi commented 6 years ago

I think the tests we do should be without noise. Noise is good for self-play training, because it may lead to a new, better move the net can learn from, but for tournaments and Elo testing we should disable it IMHO; we want the strongest version of the engine playing those games.

Uriopass commented 6 years ago

Yes, but the Elo shown on the main page of http://lczero.org/ must use those 800 rollouts. So if we want to compare that graph with "real" Elo, we need the same conditions.

jkiliani commented 6 years ago

@CMCanavessi The problem with not using noise currently is determinism. Until LCZero applies random symmetries to every neural net evaluation, it will play deterministically unless you use either Dirichlet noise or temperature=1, i.e. proportional move selection. On some systems OpenCL errors remove the deterministic behaviour, but not on mine, since I use the CPU. You can easily test that Dirichlet noise affects playing strength only in a very minor way (it might not even do so at all yet, since policy priors are still weak), while temperature=1 vastly lowers playing strength.

So until we have a better way to ensure variance in games played (I also opened https://github.com/glinscott/leela-chess/issues/67 for this purpose), keeping Dirichlet noise on always is our best bet.
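For context, the Dirichlet noise being discussed is mixed into the root move priors before search. A minimal numpy sketch using AlphaZero's published parameters (epsilon=0.25, alpha=0.3 are assumptions here, not necessarily LCZero's exact values):

```python
import numpy as np

def noisy_root_priors(priors, epsilon=0.25, alpha=0.3, rng=None):
    """Mix Dirichlet noise into root policy priors: P' = (1-eps)*P + eps*Dir(alpha)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.dirichlet([alpha] * len(priors))
    return (1.0 - epsilon) * np.asarray(priors, dtype=float) + epsilon * noise

p = noisy_root_priors([0.5, 0.3, 0.2])
# still a valid probability distribution, but perturbed from search to search
```

This is why noise breaks determinism: the root priors differ every game, so search explores different lines.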

If you want to see for yourself, just do a cutechess-cli match of two LCZero nets against each other, without OpenCL. They will repeat the same two games over and over.
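A repetition check like the one described could be run with cutechess-cli roughly as follows (the `-w` weights argument follows the usage mentioned earlier in this thread; paths, names, and time control are placeholders):

```shell
# Hypothetical invocation: two identical lczero nets, CPU only, no noise.
# With no source of randomness, the games should repeat the same two lines.
cutechess-cli \
  -engine cmd=./lczero arg="-w" arg="weights.txt" name=lc_a \
  -engine cmd=./lczero arg="-w" arg="weights.txt" name=lc_b \
  -each proto=uci tc=40/60 \
  -games 10 -pgnout repeat_test.pgn
```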

tooweaktooslow commented 6 years ago

I ran some noise testing earlier today, and it doesn't seem to affect strength too much. (1k playouts)

Score of LeelaChess gen9 1k vs LeelaChess gen9 1k noise: 46 - 36 - 18 [0.555]
Elo difference: 34.86 +/- 62.50

100 of 100 games finished.
jkiliani commented 6 years ago

I had tested an engine with noise against itself without noise for a very early net and found no effect then either, so I repeated the experiment after seeing your results. Mine look very similar:

Score of lc_gen9 vs lc_gen9n: 49 - 38 - 13  [0.555] 100
Elo difference: 38.37 +/- 64.51

It's a pity there's no reliable way to enforce variation without weakening the engine... I could probably get away with not using noise against Stockfish, but any match between lczero nets would still require it. Maybe once symmetries are implemented, we can stop using Dirichlet noise for evaluation matches.

jkiliani commented 6 years ago

When matching the new net against Stockfish (Lv 0), I didn't find a regression but a very slight improvement compared to gen9:

Score of lc_gen10 vs sf_lv0: 44 - 56 - 0  [0.440] 100
Elo difference: -41.89 +/- 69.44

@Error323 What was the actual match result of gen10 vs gen9?

Error323 commented 6 years ago
Score of lc_gen10 vs lc_gen9: 43 - 53 - 4  [0.450] 100
Elo difference: -34.86 +/- 67.82
Finished match

The only difference is that V2 samples are in the mix. They have been verified thoroughly, BUT the move count now only goes up to 255, since it's stored as an unsigned 8-bit integer.
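For illustration, the V2 saturation behaviour amounts to clamping the move count before the narrowing cast (a sketch of the effect, not the actual training-data code):

```python
def encode_move_count(move_count: int) -> int:
    """Saturate the move count so it fits an unsigned 8-bit field."""
    return min(move_count, 255)

# every ply beyond 255 stores the same value
assert encode_move_count(300) == 255
assert encode_move_count(42) == 42
```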

CMCanavessi commented 6 years ago

I'm about to start the usual gauntlet that I run vs 23 other engines. Will inform results later.

jkiliani commented 6 years ago

Either way, if the next net is trained on gen8, gen9, and gen10 games, it will have a large sample of very similar-strength training data, which should allow it to generalise successfully.

So with V2, are games above 255 ply adjudicated as draws? Or do they simply keep a move count of 255 at every ply beyond that?

Error323 commented 6 years ago

They keep the same move count. I don't think the net should really use it as input... We have 8 history planes for threefold repetition and a 50-move counter input for the 50-move rule.

It's only producing noise now, and could be the reason for the drop in strength in the self-play match. Maybe we should just set it to 0 always?

jkiliani commented 6 years ago

I don't see a good reason why not... threefold and the 50-move counter should be enough. The only possible use I can think of for feeding move count to the net is recognising when games are truncated, but at 450 ply that happens far too rarely for these adjudicated draws to have any effect on training.

Can anyone else here think of a good reason why the training data needs to include move count?

CMCanavessi commented 6 years ago

Ok, so I didn't test Gen 9 and I'm comparing against Gen 8, but from what I'm seeing right now, Gen 10 is a definite (can't say "big" yet) improvement. It's already getting draws and wins against engines it never managed results against before. We'll see what the raw numbers say in a while.

CMCanavessi commented 6 years ago

Round 1 of 4 completed:

Gen 8 got 12.5 points out of 23 (11-3-9 WDL).
Gen 10 got 14 points out of 23, with 2 wins vs engines it hadn't beaten before (13-2-8 WDL).

Calculated rating so far:

 189 Leela Chess Zero Gen 10 x64            :   789.5      23   13    2    8    61     9   652.0    23    23.0
 190 Leela Chess Zero Gen 8 x64             :   787.6      92   45   17   30    58    18   652.0    23    23.0
jkiliani commented 6 years ago

Finally, f393628a becomes the first net to beat Stockfish Level 0 with 800 playouts and noise, and it does so by a significant margin:

Score of lc_f393628a vs sf_lv0: 71 - 28 - 1  [0.715] 100
Elo difference: 159.78 +/- 76.76

The Elo difference to the match with gen9 (5c8d14d5) is actually larger than what the direct match by @Error323 yielded. I think there is a good chance that this net would also do very well in @CMCanavessi's tournament, and it looks like at least tentative evidence that a 200k chunk window works well.

I am planning to continue these matches with Stockfish Level 0 until LCZero manages an 85-90% winrate, and then switch to Level 5 as the reference. I think Level 5 will be a good choice, since I earlier tested (https://github.com/glinscott/leela-chess/issues/109#issuecomment-372701765) that the supervised net kbb1-64x6-796000.txt is roughly comparable to SF Level 5-6.

Also, we should be consistent in how we refer to networks: do we keep calling them genxx by the order in which they were promoted to best network, or use their hash? The latter allows easier reference to candidate nets that were never promoted, but genxx is more intuitive in a way.

kiudee commented 6 years ago

@jkiliani Let us switch to hashes. The problem is that for newcomers the generation is written nowhere on the website, which makes it confusing.

Error323 commented 6 years ago

Excellent! And indeed, the 200K window is now the standard! Afterwards I also trained a new version last night with a 100K window, but the MSE on the test set was much higher, indicating overfitting. So nice work @jkiliani :+1:

It's interesting how it suddenly happened to overfit so badly. Something to think about...

About the networks: I'm now calling them by their sha256sum.

jkiliani commented 6 years ago

So, the Leela Zero tradition it is. But could you truncate the hash to 8 characters for naming the files downloaded directly from http://lczero.org/networks? This should suffice for uniqueness, and anything longer than 8 chars becomes really cumbersome. I think even 6 chars would probably suffice to avoid confusing networks...

Error323 commented 6 years ago

Personally I'd also prefer 6 chars. It's memorable and probably sufficient. I'll discuss it with @glinscott.
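The truncated-hash naming being discussed is straightforward to sketch; whether 6 or 8 hex characters ends up being used is still undecided at this point in the thread, so the length here is just a parameter:

```python
import hashlib

def net_name(weights: bytes, chars: int = 8) -> str:
    """Name a network file by a truncated sha256 of its weights."""
    return hashlib.sha256(weights).hexdigest()[:chars]

print(net_name(b"example weights"))       # 8 hex characters
print(net_name(b"example weights", 6))    # 6 hex characters
```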

jkiliani commented 6 years ago

About the training window, Leela Zero went the opposite direction: early on, we used 500k games since that was the value from the AlphaZero paper, even though a much smaller window would almost certainly have been better at that stage. @gcp later reduced it to 250k games once it became obvious that the large window was obstructing early progress. Only now, with a very strong, large network and slow progress, is enlarging the window being discussed again.

I think the 500k from DeepMind must have been picked mainly for the late training phase.