Closed jjoshua2 closed 6 years ago
I tried to load lczero as a UCI engine to match it against some of these and couldn't get it to work, even when I hardcoded the -w weights file, since GUIs don't let you send parameters. Maybe it takes too long to initialize, or it would work in another GUI?
Which GUI are you using? The cutechess gui should let you pass command line parameters.
I was using Infinity Chess GUI. I can try cutechess, I suppose. ChessBase also doesn't support command-line arguments. I think the arguments really should be UCI parameters, especially threads.
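For context, the UCI protocol already has a mechanism for this: the engine advertises its options after receiving `uci`, and the GUI sets them with `setoption`, so no command-line flags are needed. A minimal illustrative fragment (the option names and defaults here are assumptions, not lczero's actual option set):

```python
def handle_uci_line(line, options):
    """Tiny fragment of a UCI input loop: advertise and accept engine options."""
    if line == "uci":
        # Advertise configurable options so any GUI can set them without CLI flags.
        print("option name Threads type spin default 1 min 1 max 64")
        print("option name Weights type string default weights.txt")
        print("uciok")
    elif line.startswith("setoption name "):
        # e.g. "setoption name Threads value 4"
        _, _, rest = line.partition("setoption name ")
        name, _, value = rest.partition(" value ")
        options[name] = value
```

A GUI-driven session would then send `setoption name Weights value kbb1-64x6-796000.txt` instead of requiring a `-w` flag at launch.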
009 - Qualy League Ratings.txt 009 - Qualy League Ratings Head2head.txt
I will test Gen 5 in a couple of days.
Once it starts getting stronger, agreed, UCI parameters make sense. Or if someone else sends a PR I'd be happy to merge it :).
I would be happy with at least some time control management. I wanted to make some 30''+0.5'' tournaments but it was constantly losing on time, as it takes around 1.7 seconds to move on my box, using the default 800 playouts.
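For reference, a minimal time-allocation rule that avoids flagging at such short controls might look like the following. This is a generic baseline sketch, not lczero's code; the 30-move horizon and 50 ms overhead are assumptions:

```python
def allocate_time_ms(remaining_ms, increment_ms, moves_to_go=None, overhead_ms=50):
    """Return milliseconds to spend on the current move.

    A common baseline: divide the remaining clock over an assumed number of
    future moves, add most of the increment, and subtract a safety margin so
    the engine never flags on GUI/network latency.
    """
    horizon = moves_to_go if moves_to_go else 30  # assume ~30 moves left
    budget = remaining_ms / horizon + 0.8 * increment_ms - overhead_ms
    return max(1, int(budget))
```

At 30''+0.5'' from move one this yields about 1350 ms per move, comfortably under the clock, instead of a fixed ~1.7 s regardless of time remaining.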
I did a number of round-robin tournaments with Stockfish at different Skill Levels and a constant time control of 1 sec / 40 moves, as well as with (slightly) longer time controls, to get an idea of its scaling. Here are my results:
Rank Name Elo +/- Games Score Draws
1 sf 861 136 1000 99.3% 0.6%
2 sf10 175 24 1000 73.2% 2.6%
3 sf8 114 22 1000 65.8% 2.4%
4 sf5 -90 22 1000 37.4% 1.9%
5 sf3 -258 27 1000 18.4% 1.5%
6 sf1 -481 46 1000 5.9% 0.4%
sf means Stockfish without the Skill Level option set; all engines at 40 moves / 1 second.
Rank Name Elo +/- Games Score Draws
1 sf20 531 114 200 95.5% 4.0%
2 sf 449 93 200 93.0% 4.0%
3 sf17 173 51 200 73.0% 10.0%
4 sf16 160 52 200 71.5% 5.0%
5 sf19 149 51 200 70.3% 7.5%
6 sf18 149 51 200 70.3% 7.5%
7 sf15 106 49 200 64.8% 7.5%
8 sf13 89 47 200 62.5% 10.0%
9 sf14 70 47 200 60.0% 9.0%
10 sf12 63 45 200 59.0% 14.0%
11 sf11 47 46 200 56.8% 10.5%
12 sf9 42 46 200 56.0% 11.0%
13 sf10 37 47 200 55.3% 8.5%
14 sf8 -12 46 200 48.3% 8.5%
15 sf7 -133 51 200 31.8% 3.5%
16 sf6 -166 53 200 27.8% 4.5%
17 sf5 -246 62 200 19.5% 1.0%
18 sf4 -279 65 200 16.8% 1.5%
19 sf3 -382 83 200 10.0% 1.0%
20 sf2 -470 109 200 6.3% 0.5%
21 sf1 -676 363 200 2.0% 0.0%
Ditto; it seems Skill Level=20 is equivalent to not setting a skill level?
Rank Name Elo +/- Games Score Draws
1 sf9 492 70 450 94.4% 1.8%
2 sf8 287 43 450 83.9% 3.8%
3 sf7 226 39 450 78.6% 2.4%
4 sf6 112 33 450 65.6% 4.0%
5 sf5 39 32 450 55.6% 3.1%
6 sf4 -31 32 450 45.6% 1.8%
7 sf3 -140 34 450 30.9% 2.7%
8 sf2 -241 40 450 20.0% 2.2%
9 sf1 -303 45 450 14.9% 1.3%
10 sf0 -369 52 450 10.7% 0.9%
I wasn't aware that 0 was a valid setting for Skill level until then, but tried it at this point. All games 1 sec / 40 moves.
Rank Name Elo +/- Games Score Draws
1 sf_40/16 350 53 200 88.3% 21.5%
2 sf_40/8 184 43 200 74.3% 30.5%
3 sf_40/4 12 42 200 51.7% 25.5%
4 sf_40/2 -151 44 200 29.5% 25.0%
5 sf_40/1 -470 84 200 6.3% 8.5%
And finally, a time scaling test that revealed very considerable scaling at such short time controls.
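Under the usual assumption that Elo grows roughly linearly in the logarithm of thinking time, a least-squares fit over the five data points above suggests nearly 200 Elo per doubling at these very short controls. A quick check:

```python
# Least-squares fit of Elo vs. log2(time odds) for the 40/1 .. 40/16 results above.
ratings = {0: -470, 1: -151, 2: 12, 3: 184, 4: 350}  # log2(time factor) -> measured Elo

n = len(ratings)
mx = sum(ratings) / n           # mean of log2(time factor)
my = sum(ratings.values()) / n  # mean Elo
slope = sum((x - mx) * (y - my) for x, y in ratings.items()) / \
        sum((x - mx) ** 2 for x in ratings)
print(round(slope, 1))  # Elo gained per doubling of thinking time -> 197.5
```

The 40/1 point is a clear outlier (over 300 Elo below 40/2), so the true per-doubling gain at the longer end of this range is closer to the ~165 Elo seen between adjacent steps above 40/2.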
I just measured @Error323's supervised net kbb1-64x6-796000.txt against sf5, i.e.
./cutechess-cli -rounds 70 -tournament gauntlet -concurrency 2 -pgnout SF0.pgn \
-engine name=lc_kbb1 cmd=lczero arg="--threads=1" arg="--weights=$WDR/kbb1-64x6-796000.txt" arg="--playouts=800" arg="--noponder" arg="--noise" tc=inf \
-engine name=sf_lv5 cmd=stockfish_x86-64 option.Threads=1 option."Skill Level"=5 tc=40/1 \
-each proto=uci
The result:
Score of lc_kbb1 vs sf_lv5: 39 - 29 - 2 [0.571] 70
Elo difference: 49.98 +/- 82.52
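The Elo difference cutechess-cli reports can be reproduced from the raw score with the standard logistic model:

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fractional score in [0, 1]
    return -400 * math.log10(1 / score - 1)

print(round(elo_diff(39, 29, 2), 2))  # the 39-29-2 result above -> 49.98
```

(The formula diverges for 0% or 100% scores, which is why cutechess prints +/-1200 as a cap for 4-0 sweeps in the gauntlet tables below.)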
From the previous tests, sf5 should be roughly 450 Elo above sf0. So far, none of the reinforcement learning nets have scored any wins or draws against sf0, but I'm going to run another match tonight with gen6 to see whether that changes. I'll update when I have something.
@jkiliani very cool, thanks for the numbers. At some point, we can do a round-robin tournament and get a better idea of the overall ELO progression as well.
I think @CMCanavessi's tournament is more interesting, since it uses engines of similar strength. Crippling a strong engine doesn't make as much sense to me, but I am always excited to see both, so keep them coming! Testing against increasing SF skill levels is easier and more efficient than testing against many engines, though.
I can't wait to see how gen6 does. I think it will be about 600 Elo. I estimated gen4 was 530, between Acqua and NEG on CCRL 40/4, although seeing 412 with a 0-Elo random mover was good too. The CCRL baseline is Brutus RND at 200 Elo.
Running a tournament right now with 24 engines under ~1000 elo, Leela Chess Zero Gen 6 is playing. Will update later with how it's doing. Should do somewhat better than Gen 4, but can it beat NEG and Acqua consistently now? Can it at least draw with Easy Peasy? We'll see...
@CMCanavessi you can reuse most of the engine-vs-engine games from prior tournaments and just rerun LCZ, right? Is that the strategy you are using?
Yes, you can do that but I'm just running a completely new tournament, with more rivals. Looking good so far.
In the previous tournament, LCZ Gen 4 played 10 games vs. NEG and got 0 wins, 1 draw and 9 losses. In this tournament it has already played once vs NEG and it won. And watching it play, it looks to have much better endgame understanding. We'll see how it looks when more games are played; so far it's at 50%.
Did some research, and it looks like NEG and Acqua both just do a 1-ply search with no lookahead, so it should be easy for a neural network plus any search to beat them once the net understands the very basics. It might be interesting if you could put an SF with skill=0 in your tournament. I think it might be a 1-ply plus quiescence search? I think I saw skill=1 was 3 ply.
Edit: it appears skill level has a maxDepth of level + 1, but it also randomly picks among the top 4 moves as long as it's not a major blunder.
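That mechanism can be paraphrased in a few lines. This is a hedged sketch of the idea described above, not Stockfish's actual code; the 150 centipawn blunder margin and uniform choice are invented placeholders (real Stockfish weights the pick by score and skill level):

```python
import random

def pick_skill_move(ranked_moves, blunder_margin_cp=150, multi_pv=4):
    """Pick a move the way a capped skill level might.

    ranked_moves: list of (move, score_cp) sorted best-first, as a MultiPV
    search would return them. Among the top `multi_pv` moves, choose
    randomly from those within `blunder_margin_cp` of the best score,
    so the engine varies its play without committing major blunders.
    """
    best_score = ranked_moves[0][1]
    candidates = [(m, s) for m, s in ranked_moves[:multi_pv]
                  if best_score - s <= blunder_margin_cp]
    return random.choice(candidates)[0]
```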
1st round robin out of 10 has been played and Gen 6 is showing better performance compared to Gen 4. Here are the current standings:
Engine Score BRXaUsIoTaHaMFNSYoZoPyDiPyAcN.LeEaRaLaCPPOEtEtTe S-B
01: BRAMA 05/12/2004 x32 20.5/23 · 0 1 1 0 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 209,25
02: Xadreco 5.83 x32 19.5/23 1 · 0 = 1 = 1 = = 1 1 = 1 1 1 1 1 1 1 1 1 1 1 1 198,50
03: Usurpator II x32 19.0/23 0 1 · 1 1 = 0 1 1 0 1 = 1 1 1 1 1 1 1 1 1 1 1 1 190,75
04: Iota 1.0 x32 18.5/23 0 = 0 · = 1 = 1 1 1 1 1 1 1 1 1 1 = 1 1 1 = 1 1 185,50
05: Talvmenni 0.1 x32 18.0/23 1 0 0 = · = = = 1 = 1 1 1 1 = 1 1 1 1 1 1 1 1 1 173,75
06: Hanzo the Razor x32 17.5/23 = = = 0 = · = 1 1 = = = 1 1 = 1 1 1 1 1 1 1 1 1 168,50
07: MFChess 1.3 x32 17.0/23 0 0 1 = = = · = = = 1 1 1 1 1 1 = = 1 1 1 1 1 1 162,00
08: NSVChess 0.14 x32 15.0/23 0 = 0 0 = 0 = · = 1 = = = 1 1 1 1 1 1 1 = 1 1 1 130,25
09: Youk V1.05 x32 15.0/23 0 = 0 0 0 0 = = · = = = 1 1 1 1 1 1 1 1 1 1 1 1 122,00
10: Zoe 0.1 x32 14.0/23 0 0 1 0 = = = 0 = · 0 = = 0 1 1 1 1 1 1 1 1 1 1 119,50
11: Pyotr Amateur Edition v0.6 x32 13.5/23 0 0 0 0 0 = 0 = = 1 · = = 1 0 1 1 1 1 1 1 1 1 1 105,00
12: Dikabi v0.4209 x32 12.0/23 0 = = 0 0 = 0 = = = = · = = 1 = 1 = = 1 = 1 1 = 109,00
13: Pyotr Novice Edition v2.6 x32 11.5/23 0 0 0 0 0 0 0 = 0 = = = · 1 1 0 1 1 = 1 1 1 1 1 80,75
14: Acqua ver. 20160918 x32 10.0/23 0 0 0 0 0 0 0 0 0 1 0 = 0 · 0 1 1 1 = 1 1 1 1 1 63,00
15: N.E.G. 1.2 x32 9.5/23 0 0 0 0 = = 0 0 0 0 1 0 0 1 · 0 = = 1 1 = 1 1 1 67,75
16: Leela Chess Zero Gen 6 x64 9.0/23 0 0 0 0 0 0 0 0 0 0 0 = 1 0 1 · 0 1 = 1 1 1 1 1 52,50
17: Easy Peasy 1.0 x32 8.5/23 0 0 0 0 0 0 = 0 0 0 0 0 0 0 = 1 · = 1 1 1 1 1 1 46,50
18: Ram 2.0 x32 7.5/23 0 0 0 = 0 0 = 0 0 0 0 = 0 0 = 0 = · = = 1 1 1 1 48,25
19: LaMoSca v0.10 x32 5.0/23 0 0 0 0 0 0 0 0 0 0 0 = = = 0 = 0 = · = = = = = 32,75
20: CPP1 0.1038 x32 5.0/23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = = · 1 1 1 1 16,75
21: POS v1.20 x32 4.0/23 0 0 0 0 0 0 0 = 0 0 0 = 0 0 = 0 0 0 = 0 · = 1 = 25,00
22: EtherTrueRand 9.21 x64 3.0/23 0 0 0 = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = 0 = · = 1 16,25
23: EtherealRandom (8.97) x64 2.0/23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 = 0 0 = · 1 5,50
24: Teki Random Mover x64 1.5/23 0 0 0 0 0 0 0 0 0 0 0 = 0 0 0 0 0 0 = 0 = 0 0 · 10,50
276 of 2760 games played
And here's how the bottom of my rating list looks right now:
156 MFChess 1.3 x32 : 1001.0 23 13 8 2 74 35 661.7 23 23.0
157 Hippocampe v0.4.2 x32 : 982.0 150 98 18 34 71 12 652.1 15 15.0
158 Youk V1.05 x32 : 958.4 62 16 7 39 31 11 1204.8 45 42.0
159 NSVChess 0.14 x32 : 860.7 173 92 39 42 64 23 661.2 29 19.7
160 Zoe 0.1 x32 : 833.9 23 11 6 6 61 26 669.0 23 23.0
161 Pyotr Amateur Edition v0.6 x32 : 814.4 23 11 5 7 59 22 669.8 23 23.0
162 Easy Peasy 1.0 x32 : 756.6 173 92 16 65 58 9 667.8 29 19.7
163 Dikabi v0.4209 x32 : 737.3 23 5 14 4 52 61 673.2 23 23.0
164 Pyotr Novice Edition v2.6 x32 : 708.6 23 9 5 9 50 22 674.4 23 23.0
165 N.E.G. 1.2 x32 : 574.4 173 70 17 86 45 10 679.4 29 19.7
166 Acqua ver. 20160918 x32 : 569.7 173 72 12 89 45 7 679.7 29 19.7
167 Leela Chess Zero Gen 6 x64 : 568.9 23 8 2 13 39 9 680.5 23 23.0
168 Leela Chess Zero Gen 4 x64 : 413.8 150 43 18 89 35 12 690.0 15 15.0
169 Ram 2.0 x32 : 413.6 173 44 29 100 34 17 689.6 29 19.7
170 CPP1 0.1038 x32 : 361.6 173 35 34 104 30 20 692.9 29 19.7
171 LaMoSca v0.10 x32 : 282.3 173 1 83 89 25 48 698.0 29 19.7
172 POS v1.20 x32 : 165.7 173 13 33 127 17 19 705.4 29 19.7
173 EtherTrueRand 9.21 x64 : 50.6 173 2 33 138 11 19 712.7 29 19.7
174 EtherealRandom (8.97) x64 : 35.5 23 1 2 20 9 9 703.7 23 23.0
175 Teki Random Mover x64 : 0.0 173 0 29 144 8 17 715.9 29 19.7
Too early to quantify the gain, but Gen6 is clearly stronger than Gen4. We'll see tomorrow when a couple more rounds are played.
Leela Gen 6 has played 55 games now in the new tournament, and things look much better than with Gen 4. Here are the updated ratings from the bottom of my rating list:
152 Usurpator II x32 : 1019.6 55 40 5 10 77 9 653.7 23 22.5
153 Talvmenni 0.1 x32 : 998.7 55 34 16 5 76 29 649.2 23 22.5
154 StrategicDeep 1.25 x32 : 989.6 39 3 2 34 10 5 1501.8 23 22.1
155 Hanzo the Razor x32 : 981.9 55 30 24 1 76 44 626.8 23 22.5
156 MFChess 1.3 x32 : 954.1 55 31 17 7 72 31 653.0 23 22.5
157 Hippocampe v0.4.2 x32 : 933.4 150 98 18 34 71 12 618.0 15 15.0
158 Youk V1.05 x32 : 918.2 94 38 10 46 46 11 975.0 45 42.8
159 Zoe 0.1 x32 : 818.4 55 28 14 13 64 25 628.8 23 22.5
160 NSVChess 0.14 x32 : 800.5 205 103 54 48 63 26 626.3 29 22.7
161 Pyotr Amateur Edition v0.6 x32 : 787.7 55 26 16 13 62 29 616.2 23 22.5
162 Dikabi v0.4209 x32 : 740.6 55 14 34 7 56 62 633.3 23 22.5
163 Easy Peasy 1.0 x32 : 683.5 205 102 22 81 55 11 636.9 29 23.1
164 Pyotr Novice Edition v2.6 x32 : 613.6 55 19 11 25 45 20 654.0 23 22.5
165 Leela Chess Zero Gen 6 x64 : 587.8 55 18 12 25 44 22 638.3 23 22.5
166 N.E.G. 1.2 x32 : 532.5 205 77 24 104 43 12 652.7 29 23.6
167 Acqua ver. 20160918 x32 : 527.7 205 82 15 108 44 7 646.2 29 23.1
168 Ram 2.0 x32 : 391.8 205 50 38 117 34 19 650.8 29 22.5
169 Leela Chess Zero Gen 4 x64 : 383.9 150 43 18 89 35 12 654.6 15 15.0
170 CPP1 0.1038 x32 : 331.7 205 39 43 123 30 21 651.9 29 22.9
171 LaMoSca v0.10 x32 : 271.2 205 2 99 104 25 48 658.1 29 22.7
172 POS v1.20 x32 : 153.2 205 15 39 151 17 19 674.6 29 23.4
173 EtherealRandom (8.97) x64 : 65.7 55 2 8 45 11 15 656.3 23 22.5
174 EtherTrueRand 9.21 x64 : 40.1 205 2 40 163 11 20 677.8 29 23.2
175 Teki Random Mover x64 : 0.0 205 0 36 169 9 18 675.2 29 22.7
Now we can start to see that +200 Elo improvement showing. After this tournament is finished (still many games to go), I'll probably wait for Gen 10 or so to run a new test.
Thanks! I would appreciate it even if you could just run a gauntlet of Gen 7 against Easy Peasy, Pyotr and NEG. There is debate about how inflated the +200 Elo in self-play is, since it doesn't seem to gain much yet against SF level 0.
Gen 8 :)
Here are the final standings of the tournament with Gen 6
Engine Score BR Io Xa Ta Us Ha MF Yo Zo Py NS Di Py Ea Le N. Ac Ra CP La PO Et Et Te S-B
01: BRAMA 05/12/2004 x32 81.5/92 ···· 1==1 01=1 0==0 111= ==1= 1==1 1111 1111 11=1 111= 1=1= 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 3370,2
02: Iota 1.0 x32 76.0/92 0==0 ···· =101 ===0 0=01 1=== =1=1 1111 1=1= 1=11 1111 1111 1111 1111 11=1 1111 1111 =1=1 1111 1111 1111 =11= 1111 1111 3075,0
03: Xadreco 5.83 x32 75.5/92 10=0 =010 ···· 11=0 0101 ==0= 1111 =1=0 1111 1011 =11= ==1= 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 2981,5
04: Talvmenni 0.1 x32 73.5/92 1==1 ===1 00=1 ···· 001= ==== ==0= 111= ==1= 1111 ===1 1=1= 1111 111= 1111 =111 1111 11=1 1111 1111 11=1 1111 1111 1111 2946,7
05: Usurpator II x32 71.5/92 000= 1=10 1010 110= ···· =0=0 0001 1110 0101 1110 1=11 ===1 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 2725,5
06: Hanzo the Razor x32 70.0/92 ==0= 0=== ==1= ==== =1=1 ···· ===1 111= ==== =1== 1=1= ==== 1111 11== 1==1 =111 1111 11=1 1111 1111 1111 11=1 1111 1=11 2807,0
07: MFChess 1.3 x32 66.0/92 0==0 =0=0 0000 ==1= 1110 ===0 ···· =0== =1== 1=11 ==== 10== 11=1 =111 1=11 1111 1111 =1=1 1111 1111 1111 1111 1111 1111 2448,5
08: Youk V1.05 x32 64.5/92 0000 0000 =0=1 000= 0001 000= =1== ···· =11= ===1 =11= ==1= 111= 11=1 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 2251,7
09: Zoe 0.1 x32 58.0/92 0000 0=0= 0000 ==0= 1010 ==== =0== =00= ···· 0110 0=== =01= =1== 11== 1=11 1111 0111 11=1 1111 1111 1111 1111 1111 1111 1986,2
10: Pyotr Amateur Edition v0.6 x32 54.5/92 00=0 0=00 0100 0000 0001 =0== 0=00 ===0 1001 ···· ==== ==== =1=0 11=1 1=11 01=1 1111 1=11 1111 111= 1111 1111 1111 1=11 1813,0
11: NSVChess 0.14 x32 52.0/92 000= 0000 =00= ===0 0=00 0=0= ==== =00= 1=== ==== ···· ==0= ===0 1=10 1==0 1111 1==1 1111 1=11 1==1 =11= 1111 1111 1111 1767,7
12: Dikabi v0.4209 x32 50.0/92 0=0= 0000 ==0= 0=0= ===0 ==== 01== ==0= =10= ==== ==1= ···· =111 1=11 ==== 11== =1=1 ==== 1=== =111 ==1= 1=== 1=== ===1 1981,7
13: Pyotr Novice Edition v2.6 x32 45.0/92 0000 0000 0000 0000 0000 0000 00=0 000= =0== =0=1 ===1 =000 ···· 11=1 0==1 1==1 1110 1=11 111= ==11 1111 1111 1111 1111 1264,7
14: Easy Peasy 1.0 x32 40.5/92 0000 0000 0000 000= 0000 00== =000 00=0 00== 00=0 0=01 0=00 00=0 ···· 1101 =1=1 0011 =111 1111 1=== 1111 1111 1111 1111 1096,2
15: Leela Chess Zero Gen 6 x64 40.0/92 0000 00=0 0000 0000 0000 0==0 0=00 0000 0=00 0=00 0==1 ==== 1==0 0010 ···· 10== 0011 1111 1110 ==11 1111 1111 1111 1111 1093,2
16: N.E.G. 1.2 x32 34.5/92 0000 0000 0000 =000 0000 =000 0000 0000 0000 10=0 0000 00== 0==0 =0=0 01== ···· 1101 ==01 1=11 1=1= =111 1111 1111 1111 839,75
17: Acqua ver. 20160918 x32 34.5/92 0000 0000 0000 0000 0000 0000 0000 0000 1000 0000 0==0 =0=0 0001 1100 1100 0010 ···· 11=1 1101 ==11 1111 1111 1111 1111 809,00
18: Ram 2.0 x32 30.0/92 0000 =0=0 0000 00=0 0000 00=0 =0=0 0000 00=0 0=00 0000 ==== 0=00 =000 0000 ==10 00=0 ···· ==1= ==== 1111 1111 1111 1111 771,25
19: CPP1 0.1038 x32 22.5/92 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0=00 0=== 000= 0000 0001 0=00 0010 ==0= ···· ==== 1=11 111= 1=11 111= 454,25
20: LaMoSca v0.10 x32 20.0/92 0000 0000 0000 0000 0000 0000 0000 0000 0000 000= 0==0 =000 ==00 0=== ==00 0=0= ==00 ==== ==== ···· ==== ==== ==== ==1= 516,50
21: POS v1.20 x32 15.0/92 0000 0000 0000 00=0 0000 0000 0000 0000 0000 0000 =00= ==0= 0000 0000 0000 =000 0000 0000 0=00 ==== ···· =1=1 11=1 ===1 319,50
22: EtherTrueRand 9.21 x64 10.5/92 0000 =00= 0000 0000 0000 00=0 0000 0000 0000 0000 0000 0=== 0000 0000 0000 0000 0000 0000 000= ==== =0=0 ···· =0== 1=== 289,00
23: EtherealRandom (8.97) x64 9.5/92 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0=== 0000 0000 0000 0000 0000 0000 0=00 ==== 00=0 =1== ···· 1=== 182,50
24: Teki Random Mover x64 9.0/92 0000 0000 0000 0000 0000 0=00 0000 0000 0000 0=00 0000 ===0 0000 0000 0000 0000 0000 0000 000= ==0= ===0 0=== 0=== ···· 231,00
1104 games played / Tournament finished
And here's how the rating list currently stands:
152 Safrad 2.1.35.210 x32 : 1007.4 208 112 23 73 59 11 825.7 35 27.9
153 Usurpator II x32 : 991.9 92 67 9 16 78 10 620.1 23 23.0
154 Hanzo the Razor x32 : 970.9 92 50 40 2 76 43 621.1 23 23.0
155 StrategicDeep 1.25 x32 : 918.9 58 4 3 51 9 5 1443.0 23 22.5
156 MFChess 1.3 x32 : 917.1 92 52 28 12 72 30 623.4 23 23.0
157 Youk V1.05 x32 : 916.8 150 61 21 68 48 14 938.2 45 43.4
158 Hippocampe v0.4.2 x32 : 901.1 150 98 18 34 71 12 596.8 15 15.0
159 Zoe 0.1 x32 : 814.9 92 45 26 21 63 28 627.8 23 23.0
160 Pyotr Amateur Edition v0.6 x32 : 771.1 92 42 25 25 59 27 629.7 23 23.0
161 NSVChess 0.14 x32 : 761.3 242 114 69 59 61 29 615.3 29 25.1
162 Dikabi v0.4209 x32 : 714.6 92 21 58 13 54 63 632.2 23 23.0
163 Easy Peasy 1.0 x32 : 667.9 242 117 30 95 55 12 620.7 29 25.1
164 Pyotr Novice Edition v2.6 x32 : 650.6 92 35 20 37 49 22 635.0 23 23.0
165 Leela Chess Zero Gen 6 x64 : 584.4 92 31 18 43 43 20 637.9 23 23.0
166 N.E.G. 1.2 x32 : 511.9 242 89 29 124 43 12 629.7 29 25.1
167 Acqua ver. 20160918 x32 : 506.4 242 94 17 131 42 7 630.0 29 25.1
168 Ram 2.0 x32 : 388.8 242 58 46 138 33 19 636.8 29 25.1
169 Leela Chess Zero Gen 4 x64 : 369.4 150 43 18 89 35 12 632.3 15 15.0
170 CPP1 0.1038 x32 : 323.9 242 45 49 148 29 20 640.6 29 25.1
171 LaMoSca v0.10 x32 : 253.3 242 2 111 129 24 46 644.7 29 25.1
172 POS v1.20 x32 : 144.0 242 18 45 179 17 19 651.0 29 25.1
173 EtherealRandom (8.97) x64 : 52.3 92 2 15 75 10 16 661.0 23 23.0
174 EtherTrueRand 9.21 x64 : 34.8 242 2 48 192 11 20 657.3 29 25.1
175 Teki Random Mover x64 : 0.0 242 0 44 198 9 18 659.3 29 25.1
You can see that Gen 6 is about 215 elo stronger than Gen 4.
I will test Gen 8 later today.
I have started a gauntlet, Leela Gen 8 vs all 23 engines that Gen 6 played against. 4 rounds, 92 games total. We'll see the real improvement in a couple of hours.
Leela just beat Pyotr Novice Edition in 16 moves... I'm absolutely impressed with Gen 8. It's playing MUCH better than Gen 6. It looks like it knows what it's doing now. It makes logical moves and plays with some kind of sense. It's difficult to explain.
It still has some trouble with endgames; it will shuffle pieces around for many moves before mating, even with 5 queens vs a lone king, haha.
Awesome! Can you perhaps post one of the interesting games as a GIF here?
How do I do that? I can post the full pgn if needed.
@kiudee has a nice tool. I think he uses lichess? I used this one http://www.apronus.com/chess/wbeditor.php
I used the PGN editor on caissa.com for the animations.
I'd prefer we find a solution that includes PGN files. If someone posts games that need debugging, we need the PGN to input it into lzchess. http://eidogo.com/ links are the standard for Go; is there not something similar for chess where you can post links to a game viewer that allows PGN downloads?
Just for comparison, Gen 6 got 40 points in 92 games in this gauntlet. Gen 8 already has 26.5 points in 46 games.
Estimated elo so far:
# PLAYER : RATING PLAYED W D L (%) D(%) OppAvg OppN OppDiv
163 Usurpator II x32 : 1040.1 105 73 11 21 75 10 690.9 34 30.0
164 Safrad 2.1.35.210 x32 : 998.4 208 112 23 73 59 11 817.4 35 27.9
165 Hanzo the Razor x32 : 986.1 94 51 41 2 76 44 633.4 24 23.8
166 MFChess 1.3 x32 : 932.7 94 53 29 12 72 31 635.6 24 23.8
167 Youk V1.05 x32 : 925.2 152 62 22 68 48 14 933.9 46 44.3
168 StrategicDeep 1.25 x32 : 907.6 58 4 3 51 9 5 1423.2 23 22.5
169 Hippocampe v0.4.2 x32 : 896.5 150 98 18 34 71 12 592.9 15 15.0
170 Zoe 0.1 x32 : 831.6 94 46 27 21 63 29 639.9 24 23.8
171 Pyotr Amateur Edition v0.6 x32 : 782.1 94 42 27 25 59 29 642.0 24 23.8
172 Leela Chess Zero Gen 8 x64 : 775.4 46 23 7 16 58 15 647.4 23 23.0
173 NSVChess 0.14 x32 : 765.2 244 115 69 60 61 28 617.5 30 25.6
174 Dikabi v0.4209 x32 : 732.6 94 22 59 13 55 63 644.2 24 23.8
175 Easy Peasy 1.0 x32 : 667.0 244 117 30 97 54 12 623.1 30 25.6
176 Pyotr Novice Edition v2.6 x32 : 650.4 94 35 20 39 48 21 647.6 24 23.8
177 Leela Chess Zero Gen 6 x64 : 591.6 92 31 18 43 43 20 647.4 23 23.0
178 N.E.G. 1.2 x32 : 511.6 244 89 29 126 42 12 632.0 30 25.6
179 Acqua ver. 20160918 x32 : 506.2 244 94 17 133 42 7 632.3 30 25.6
180 Ram 2.0 x32 : 388.7 244 58 46 140 33 19 639.1 30 25.6
181 Leela Chess Zero Gen 4 x64 : 369.1 150 43 18 89 35 12 628.0 15 15.0
182 CPP1 0.1038 x32 : 323.9 244 45 49 150 28 20 642.8 30 25.6
183 LaMoSca v0.10 x32 : 253.3 244 2 111 131 24 45 646.9 30 25.6
184 POS v1.20 x32 : 144.0 244 18 45 181 17 18 653.1 30 25.6
185 EtherealRandom (8.97) x64 : 52.5 94 2 15 77 10 16 673.1 24 23.8
186 EtherTrueRand 9.21 x64 : 34.8 244 2 48 194 11 20 659.4 30 25.6
187 Teki Random Mover x64 : 0.0 244 0 44 200 9 18 661.4 30 25.6
At the end of round 3, Gen 8 already has 39.5 points, only 0.5 less than Gen 6 got with 1 more full round played. That's the kind of improvement we got :D
Last round starting now, will post results in a while.
Here's the finished gauntlet:
-----------------Leela Chess Zero Gen 8 x64-----------------
Leela Chess Zero Gen 8 x64 - Acqua ver. 20160918 x32 : 3,0/4 3-1-0 (1101) 75% +191
Leela Chess Zero Gen 8 x64 - BRAMA 05/12/2004 x32 : 0,5/4 0-3-1 (000=) 13% -330
Leela Chess Zero Gen 8 x64 - CPP1 0.1038 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Dikabi v0.4209 x32 : 1,5/4 1-2-1 (0=10) 38% -85
Leela Chess Zero Gen 8 x64 - Easy Peasy 1.0 x32 : 3,0/4 3-1-0 (1110) 75% +191
Leela Chess Zero Gen 8 x64 - EtherealRandom (8.97) x64 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - EtherTrueRand 9.21 x64 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Hanzo the Razor x32 : 1,0/4 0-2-2 (=00=) 25% -191
Leela Chess Zero Gen 8 x64 - Iota 1.0 x32 : 0,5/4 0-3-1 (00=0) 13% -330
Leela Chess Zero Gen 8 x64 - LaMoSca v0.10 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - MFChess 1.3 x32 : 1,0/4 0-2-2 (0=0=) 25% -191
Leela Chess Zero Gen 8 x64 - N.E.G. 1.2 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - NSVChess 0.14 x32 : 1,5/4 1-2-1 (01=0) 38% -85
Leela Chess Zero Gen 8 x64 - POS v1.20 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Pyotr Amateur Edition v0.6 x32 : 2,0/4 0-0-4 (====) 50% ±0
Leela Chess Zero Gen 8 x64 - Pyotr Novice Edition v2.6 x32 : 3,5/4 3-0-1 (11=1) 88% +346
Leela Chess Zero Gen 8 x64 - Ram 2.0 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Talvmenni 0.1 x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 8 x64 - Teki Random Mover x64 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 8 x64 - Usurpator II x32 : 0,5/4 0-3-1 (00=0) 13% -330
Leela Chess Zero Gen 8 x64 - Xadreco 5.83 x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 8 x64 - Youk V1.05 x32 : 1,5/4 1-2-1 (=001) 38% -85
Leela Chess Zero Gen 8 x64 - Zoe 0.1 x32 : 2,0/4 1-1-2 (0==1) 50% ±0
And just for comparison, here's Gen 6
-----------------Leela Chess Zero Gen 6 x64-----------------
Leela Chess Zero Gen 6 x64 - Acqua ver. 20160918 x32 : 2,0/4 2-2-0 (0011) 50% ±0
Leela Chess Zero Gen 6 x64 - BRAMA 05/12/2004 x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 6 x64 - CPP1 0.1038 x32 : 3,0/4 3-1-0 (1110) 75% +191
Leela Chess Zero Gen 6 x64 - Dikabi v0.4209 x32 : 2,0/4 0-0-4 (====) 50% ±0
Leela Chess Zero Gen 6 x64 - Easy Peasy 1.0 x32 : 1,0/4 1-3-0 (0010) 25% -191
Leela Chess Zero Gen 6 x64 - EtherealRandom (8.97) x64 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - EtherTrueRand 9.21 x64 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Hanzo the Razor x32 : 1,0/4 0-2-2 (0==0) 25% -191
Leela Chess Zero Gen 6 x64 - Iota 1.0 x32 : 0,5/4 0-3-1 (00=0) 13% -330
Leela Chess Zero Gen 6 x64 - LaMoSca v0.10 x32 : 3,0/4 2-0-2 (==11) 75% +191
Leela Chess Zero Gen 6 x64 - MFChess 1.3 x32 : 0,5/4 0-3-1 (0=00) 13% -330
Leela Chess Zero Gen 6 x64 - N.E.G. 1.2 x32 : 2,0/4 1-1-2 (10==) 50% ±0
Leela Chess Zero Gen 6 x64 - NSVChess 0.14 x32 : 2,0/4 1-1-2 (0==1) 50% ±0
Leela Chess Zero Gen 6 x64 - POS v1.20 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Pyotr Amateur Edition v0.6 x32 : 0,5/4 0-3-1 (0=00) 13% -330
Leela Chess Zero Gen 6 x64 - Pyotr Novice Edition v2.6 x32 : 2,0/4 1-1-2 (1==0) 50% ±0
Leela Chess Zero Gen 6 x64 - Ram 2.0 x32 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Talvmenni 0.1 x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 6 x64 - Teki Random Mover x64 : 4,0/4 4-0-0 (1111) 100% +1200
Leela Chess Zero Gen 6 x64 - Usurpator II x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 6 x64 - Xadreco 5.83 x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 6 x64 - Youk V1.05 x32 : 0,0/4 0-4-0 (0000) 0% -1200
Leela Chess Zero Gen 6 x64 - Zoe 0.1 x32 : 0,5/4 0-3-1 (0=00) 13% -330
The improvement is pretty evident. Here's the rating list as of now:
163 Usurpator II x32 : 1038.3 107 74 12 21 75 11 692.7 34 30.1
164 Safrad 2.1.35.210 x32 : 999.9 208 112 23 73 59 11 818.4 35 27.9
165 Hanzo the Razor x32 : 985.9 96 52 42 2 76 44 636.7 24 24.0
166 MFChess 1.3 x32 : 934.2 96 54 30 12 72 31 638.8 24 24.0
167 Youk V1.05 x32 : 922.0 154 63 22 69 48 14 932.5 46 44.4
168 StrategicDeep 1.25 x32 : 908.1 58 4 3 51 9 5 1424.0 23 22.5
169 Hippocampe v0.4.2 x32 : 898.5 150 98 18 34 71 12 594.0 15 15.0
170 Zoe 0.1 x32 : 824.3 96 46 28 22 63 29 643.4 24 24.0
171 Pyotr Amateur Edition v0.6 x32 : 782.6 96 42 29 25 59 30 645.2 24 24.0
172 Leela Chess Zero Gen 8 x64 : 782.4 92 45 17 30 58 18 647.5 23 23.0
173 NSVChess 0.14 x32 : 769.6 246 116 70 60 61 28 619.4 30 25.9
174 Dikabi v0.4209 x32 : 734.8 96 23 59 14 55 61 647.2 24 24.0
175 Easy Peasy 1.0 x32 : 669.9 246 118 30 98 54 12 625.1 30 25.9
176 Pyotr Novice Edition v2.6 x32 : 649.2 96 35 21 40 47 22 650.7 24 24.0
177 Leela Chess Zero Gen 6 x64 : 592.0 92 31 18 43 43 20 647.5 23 23.0
178 N.E.G. 1.2 x32 : 510.7 246 89 29 128 42 12 634.1 30 25.9
179 Acqua ver. 20160918 x32 : 510.7 246 95 17 134 42 7 634.1 30 25.9
180 Ram 2.0 x32 : 388.4 246 58 46 142 33 19 641.1 30 25.9
181 Leela Chess Zero Gen 4 x64 : 369.9 150 43 18 89 35 12 629.2 15 15.0
182 CPP1 0.1038 x32 : 323.7 246 45 49 152 28 20 644.8 30 25.9
183 LaMoSca v0.10 x32 : 253.2 246 2 111 133 23 45 648.8 30 25.9
184 POS v1.20 x32 : 144.0 246 18 45 183 16 18 655.0 30 25.9
185 EtherealRandom (8.97) x64 : 51.9 96 2 15 79 10 16 675.6 24 24.0
186 EtherTrueRand 9.21 x64 : 34.8 246 2 48 196 11 20 661.2 30 25.9
187 Teki Random Mover x64 : 0.0 246 0 44 202 9 18 663.2 30 25.9
+190 from Gen 6 to Gen 8; I think that's pretty good.
Just finished the next match against Stockfish Level 0:
Score of lc_gen9 vs sf_lv0: 42 - 57 - 1 [0.425] 100
Elo difference: -52.51 +/- 69.39
This is an improvement of 59 Elo compared to gen8 (https://github.com/glinscott/leela-chess/issues/100#issuecomment-373554840) and 148 Elo compared to gen7 (https://github.com/glinscott/leela-chess/issues/100#issuecomment-372963434), using Stockfish Level 0 as a metric. So there is a steady improvement, just at a rate less than the self-play Elo which is to be expected.
Interestingly, for me Leela Gen 9 has no problem beating level 0 Stockfish. What settings are you using for that?
800 playouts, and Dirichlet noise. I know it can beat SF Level 0 with more playouts, but I want to keep the metric constant.
I think the tests we do should be without noise. Noise is good for self-training, because it may lead to a new, better move that it will learn from, but for tournaments and Elo testing we should disable noise, imho; we want the strongest version of the engine playing those games.
Yes, but the Elo shown on the main page of http://lczero.org/ uses those 800 rollouts. So if we want to compare that graph with "real" Elo, we need to measure under the same conditions.
@CMCanavessi The problem with not using noise currently is determinism. Until LCZero applies random symmetries for every neural net evaluation, it will play deterministically if you use neither Dirichlet noise nor temperature=1, i.e. proportional move selection. On some systems OpenCL errors remove the deterministic behaviour, but not on mine, since I use the CPU. You can easily verify that Dirichlet noise affects playing strength only in a very minor way (it might not even do so at all yet, since policy priors are still weak), while temperature=1 vastly lowers playing strength.
So until we have a better way to ensure variance in games played (I also opened https://github.com/glinscott/leela-chess/issues/67 for this purpose), keeping Dirichlet noise on always is our best bet.
If you want to see for yourself, just do a cutechess-cli match of two LCZero nets against each other, without OpenCL. They will repeat the same two games over and over.
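The Dirichlet root noise under discussion follows the AlphaZero recipe: mix a Dirichlet-distributed sample into the root policy priors so that repeated games diverge. A minimal sketch; the alpha=0.3 and epsilon=0.25 values are the AlphaZero paper's defaults for chess, not necessarily what lczero uses:

```python
import random

def add_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=random):
    """Mix Dirichlet noise into root move priors: P' = (1-eps)*P + eps*Dir(alpha)."""
    # Sample a Dirichlet vector by normalising independent Gamma(alpha, 1) draws.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas) or 1.0
    noise = [g / total for g in gammas]
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]
```

Because the noise only perturbs priors at the root and visit counts still concentrate on good moves, the strength cost is small, which matches the ~35 Elo measurements below.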
I ran some noise testing earlier today, and it doesn't seem to affect strength too much. (1k playouts)
Score of LeelaChess gen9 1k vs LeelaChess gen9 1k noise: 46 - 36 - 18 [0.555]
Elo difference: 34.86 +/- 62.50
100 of 100 games finished.
I had tested a very early net with noise against itself without noise, and found no effect then either, but I repeated the experiment after seeing your results. Mine look very similar:
Score of lc_gen9 vs lc_gen9n: 49 - 38 - 13 [0.555] 100
Elo difference: 38.37 +/- 64.51
It's a pity there's no reliable way to enforce variation without weakening the engine... I could probably get away with not using noise against Stockfish, but any match between lczero versions with different nets would still require it. Maybe once symmetries are implemented, we can retire Dirichlet noise for evaluation matches.
When matching the new net against Stockfish (Lv 0), I didn't find a regression but a very slight improvement compared to gen9:
Score of lc_gen10 vs sf_lv0: 44 - 56 - 0 [0.440] 100
Elo difference: -41.89 +/- 69.44
@Error323 What was the actual match result of gen10 vs gen9?
Score of lc_gen10 vs lc_gen9: 43 - 53 - 4 [0.450] 100
Elo difference: -34.86 +/- 67.82
Finished match
The only difference is that V2 samples are in the mix. They have been verified thoroughly, BUT the move count only goes up to 255, as it's now an unsigned 8-bit int.
I'm about to start the usual gauntlet that I run vs 23 other engines. Will inform results later.
Either way, if the next net is trained on gen8, gen9, and gen10 games, it would have a large sample of very similar strength training data which should allow it to generalise successfully.
So with V2, any games above 255 ply are adjudicated as draw? Or do they simply keep a move count of 255 at every ply beyond that?
They keep the same move count. I think the net shouldn't really use it as input... We have 8 history planes for threefold repetition and a 50-move counter input for the 50-move rule.
It's only producing noise now and could be the reason for the drop in self-play strength. Maybe we should always set it to 0?
I don't see a good reason why not... 3-fold and 50 move counter should be enough. The only possible use I can think of for feeding move count to the net as input is to recognise when games are truncated, but at 450 ply, that happens way too rarely for these adjudicated draws to have any effect on training.
Can anyone else here think of a good reason why the training data needs to include move count?
Ok, so I didn't test Gen 9, so I'm comparing to Gen 8, but from what I'm seeing right now, Gen 10 is a definite (can't say "big" yet) improvement. It's already getting draws and wins against engines it never scored against before. We'll see what the raw numbers say in a while.
Round 1 of 4 completed:
Gen 8 got 12.5 points out of 23 (11-3-9 WDL); Gen 10 got 14 points out of 23, with 2 wins vs engines it hadn't beaten before (13-2-8 WDL).
Calculated rating so far:
189 Leela Chess Zero Gen 10 x64 : 789.5 23 13 2 8 61 9 652.0 23 23.0
190 Leela Chess Zero Gen 8 x64 : 787.6 92 45 17 30 58 18 652.0 23 23.0
Finally, f393628a becomes the first net to beat Stockfish Level 0 with 800 playouts and noise, and it does so by a significant margin:
Score of lc_f393628a vs sf_lv0: 71 - 28 - 1 [0.715] 100
Elo difference: 159.78 +/- 76.76
The Elo difference to the match with gen9 (5c8d14d5) is actually larger than what the direct match by @Error323 yielded. I think there is a good chance that this net would also do very well in @CMCanavessi's tournament, and it looks like at least tentative evidence that a 200k chunk window works well.
I am planning to continue these matches with Stockfish Level 0 until LCZero manages a 85%-90% winrate, and then switch to Level 5 as reference. I think Level 5 will be a good choice since I earlier tested (https://github.com/glinscott/leela-chess/issues/109#issuecomment-372701765) that the supervised net kbb1-64x6-796000.txt is roughly comparable to SF Level 5-6.
Also, we should be clear about how we refer to networks: do we continue to call them genxx by the order in which they were promoted to best network, or do we use their hash? The latter would allow easier reference to candidate nets that were never promoted, but genxx is more intuitive in a way.
@jkiliani Let us switch to hashes. The problem is that for newcomers the generation is written nowhere on the website, which makes it confusing.
Excellent! And indeed the 200K window is now the standard! I also trained a new version last night afterwards with a 100K window, but the MSE on the testset was much higher, indicating overfitting. So nice work @jkiliani :+1:
It's interesting how it suddenly happened to overfit so badly. Something to think about...
about the networks: I'm now calling them by their sha256sum.
So, the Leela Zero tradition it is. But could you truncate the hash to 8 characters for the purpose of naming the files directly downloaded from http://lczero.org/networks? This should suffice for uniqueness, and anything longer than 8 chars becomes really cumbersome. I think even 6 chars would probably suffice to not confuse networks...
Personally I'd also like 6 chars. It's memorizable and probably sufficient. I'll discuss with @glinscott
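For reference, truncating the sha256sum is a one-liner, and by the birthday bound 8 hex characters (32 bits) stay collision-free with overwhelming probability for a few hundred networks; even 6 characters (24 bits) only become risky around a few thousand. An illustrative sketch, not the site's actual naming code:

```python
import hashlib

def short_net_id(weights_bytes, length=8):
    """Identify a network file by the first `length` hex chars of its SHA-256."""
    return hashlib.sha256(weights_bytes).hexdigest()[:length]

# The thread refers to gen9 as 5c8d14d5 -- an 8-char prefix like this one.
```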
About the training window, Leela Zero went in the opposite direction: in the beginning we used 500k games, since that was the value from the AlphaZero paper, even though a much smaller window would almost certainly have been better early on. Later @gcp reduced it to 250k games when it became obvious that the large window was obstructing early progress. Only now, with a very strong and large network and slow progress, is enlarging the window being discussed again.
I think the 500k from Deepmind must have been picked mainly for the late training phase.
The bottom engine on CCRL 40/4 is only 276 Elo and sometimes loses to a random mover, but it requires Java. A good first target might be Chessputer, an open-source UCI C++ engine at 765 Elo.
I don't know its Elo, but Alan Turing's historic chess program has been implemented as a ChessBase UCI engine (download), and it played against Kasparov (who beat it in 16 moves). It would be good publicity, and it can be set to different ply depths.
Robocide, an open-source C UCI engine, 1897 Elo.
Ruffian 2.1.0, rated 2609. It was the best free engine I used a long time ago.
Crafty, famous, Elo 2400-3000 depending on version.
Scorpio 2.7.9 was the weakest engine in the bottom TCEC 4th league, around 2900 Elo.
Gull 3, a strong open-source program, now mid-level TCEC 1st league, around 3200 Elo.
Andscacs 0.93, open source, mid-level TCEC Premier league, 3300 Elo with 4 CPUs.
Komodo 9, winner of TCEC 8, now free, 3383 with 4 CPUs.
Stockfish 9, the top released engine, open source, 3560 Elo with 4 CPUs.