glinscott / leela-chess

**MOVED TO https://github.com/LeelaChessZero/leela-chess** A chess adaptation of GCP's Leela Zero
http://lczero.org
GNU General Public License v3.0

Different elo targets #109

Closed by jjoshua2 6 years ago

jjoshua2 commented 6 years ago

The lowest-rated engine on CCRL 40/4 is rated only 276 Elo and sometimes loses to a random mover, but it requires Java. A good first target might be Chessputer, an open-source UCI engine written in C++, at 765 Elo.

I don't know its Elo, but Alan Turing's historic chess program has been implemented as a ChessBase UCI engine (download) and played against Kasparov (he beat it in 16 moves). It would be good publicity, and it can be set to different ply depths.

Robocide, open source C UCI engine, 1897 elo

Ruffian 2.1.0 rated 2609. Was the best free engine I used to use a long time ago.

Crafty, famous, elo 2400-3000 depending on version.

Scorpio 2.7.9 was the weakest engine in the bottom TCEC 4th league around 2900 elo.

Gull 3, a strong open source program, now mid-level TCEC 1st league around 3200 elo.

Andscacs 0.93, open source, mid-level TCEC Premier league, 3300 elo with 4 CPU.

Komodo 9, winner of TCEC 8, now free, 3383 elo with 4 CPU.

Stockfish 9 top released engine, open source, 3560 elo with 4 CPU.

sf-x commented 6 years ago

probably sufficient

You can expect a duplicate 6-hex-digits name after a few thousand entries.

sf-x commented 6 years ago

It's interesting how it suddenly happened to overfit so badly. Something to think about...

Maybe not enough variety, 76747 of 100000 games (am I right?) are from the same crap parameter set with preference for 1.h4

Error323 commented 6 years ago

> Maybe not enough variety, 76747 of 100000 games (am I right?) are from the same crap parameter set with preference for 1.h4

No, there's a huge gap between when I start training and when I upload a new version. At the time I started training there were only about 20000 games in that net. (Also because I did various other tests in between.)

jkiliani commented 6 years ago

> You can expect a duplicate 6-hex-digits name after a few thousand entries.

I just made a spreadsheet to test this, since there are 2^24 different 6-hex hashes. At 2000 entries, there's a roughly 11.5% chance of a hash collision, which seems perfectly acceptable to me given that we will likely only have a couple hundred networks. By the way we should consider gating at 55% winrate, instead of just 50%, for the matches.
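The spreadsheet estimate above is an instance of the birthday problem. A minimal sketch, assuming network names behave like uniformly random 6-hex-digit strings (the function name `collision_probability` is made up for this illustration):

```python
import math


def collision_probability(n_names: int, hex_digits: int = 6) -> float:
    """Chance that at least two of n_names random names coincide.

    Assumes every name is drawn uniformly from 16**hex_digits values,
    which is only an approximation for truncated weight hashes.
    """
    space = 16 ** hex_digits  # 2**24 for 6 hex digits
    if n_names > space:
        return 1.0  # pigeonhole: a collision is certain
    # Sum the logs of the "no collision so far" factors for stability.
    log_no_collision = sum(math.log1p(-i / space) for i in range(n_names))
    return 1.0 - math.exp(log_no_collision)


for n in (200, 2000, 5000):
    print(n, round(collision_probability(n), 4))
```

Under that assumption the result for 2000 names comes out a little above 11%, and a couple hundred names gives well under 1%.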

mooskagh commented 6 years ago

There are plenty of online calculators, too: http://everydayinternetstuff.com/2015/04/hash-collision-probability-calculator/

50% chance of collision with 5000 24-bit hashes

Uriopass commented 6 years ago

Or even SPRT, like Leela Zero, which makes more sense mathematically?

sf-x commented 6 years ago

1) like Stockfish; 2) practically, not mathematically ;)

jkiliani commented 6 years ago

Good to know, thanks 😄. Writing the spreadsheet wasn't hard though.

For promotions, SPRT also seems very sensible to me, but I think we should only terminate early for failed nets, not for passed ones, since a few more matches can calibrate the progress curve better.
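For readers unfamiliar with SPRT, here is a minimal sketch of a Wald sequential probability ratio test with one-sided early stopping as suggested above. It is not claimed to match how Leela Zero or Stockfish implement it: it assumes draws are ignored, each decisive game is an independent win/loss trial, and the hypotheses are win rates of 50% (H0) and 55% (H1). All names and default values are illustrative assumptions.

```python
import math


def sprt_decision(wins, losses, p0=0.50, p1=0.55,
                  alpha=0.05, beta=0.05, stop_early_on_pass=True):
    """Return "pass", "fail" or "continue" for a candidate network.

    p0/p1 are the win rates under H0/H1 (assumed values); alpha and
    beta are the tolerated false-pass and false-fail rates.
    """
    # Log-likelihood ratio of H1 versus H0 for the observed results.
    llr = (wins * math.log(p1 / p0)
           + losses * math.log((1 - p1) / (1 - p0)))
    lower = math.log(beta / (1 - alpha))   # crossing it rejects the net
    upper = math.log((1 - beta) / alpha)   # crossing it accepts the net
    if llr <= lower:
        return "fail"
    if llr >= upper and stop_early_on_pass:
        return "pass"
    return "continue"


print(sprt_decision(79, 20))
print(sprt_decision(79, 20, stop_early_on_pass=False))
```

With `stop_early_on_pass=False` the test only ever terminates early on failure, so a promising net keeps playing until the match's fixed game limit is reached.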

@Error323 If the match system is already online now (including automatically updating the progress chart?), could we test it by improving the statistics of a few older matches, to make the progress curve more accurate?

CMCanavessi commented 6 years ago

Why don't we replace the "ID" column in http://lczero.org/networks with a "Generation" column? It's much more intuitive, and I've seen users already asking for that. I think it's cooler also. If you are talking about different networks and say "Gen 25 vs Gen 17" you instantly know which one is more recent, but if you say "b91f353d vs 6d2eaec0" that tells you nothing without looking at the table.

Uriopass commented 6 years ago

Or just use the ID directly? I mean, it's only off by 4.

sf-x commented 6 years ago

> I think we should only terminate early for failed nets, not for passed ones, since a few more matches can calibrate the progress curve better.

Isn't the testing a bottleneck which consumes valuable resources? Someone else (tm) can run matches to establish the exact rate of progress; what matters for this project is "is it sufficiently better?" It's also possible to extract some info from SPRT results; it's wildly inaccurate but better than nothing.

@vdbergh once proposed using an even more efficient test than SPRT, but then claimed (from memory) that a 10% efficiency gain is not worth sacrificing the simplicity of SPRT. Possibly he just got tired of wrestling with the blockheaded maintainer of Stockfish.

Error323 commented 6 years ago

> Why don't we replace the "ID" column in http://lczero.org/networks with a "Generation" column? It's much more intuitive, and I've seen users already asking for that. I think it's cooler also. If you are talking about different networks and say "Gen 25 vs Gen 17" you instantly know which one is more recent, but if you say "b91f353d vs 6d2eaec0" that tells you nothing without looking at the table.

True, I just figured that using the hash would be self-sufficient, i.e. it can be recomputed from the actual weights. Using the ID is more intuitive; however, it depends heavily on the server's database.

killerducky commented 6 years ago

How about: use the first 6 hash characters until the first hash collision. On the first hash collision throw a celebration party, and start using the first 8 characters. Repeat as necessary.

:)
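The "grow the prefix on collision" idea can be sketched as follows. This is only an illustration: the function name, the starting length of 6 and the step of 2 are taken from the comment above or assumed, and the input is assumed to be a list of full hex hashes of the network weights.

```python
def shortest_unique_prefix_len(hashes, start=6, step=2):
    """Smallest prefix length (start, start+step, ...) with no clashes.

    Falls back to the full hash length if no shorter prefix is unique.
    """
    if not hashes:
        return start
    full_len = min(len(h) for h in hashes)
    length = start
    while length < full_len:
        prefixes = {h[:length] for h in hashes}
        if len(prefixes) == len(hashes):
            return length  # every network has a distinct short name
        length += step     # collision: throw the party, use more digits
    return full_len


# The two IDs quoted earlier in the thread are already distinct at 6 digits.
print(shortest_unique_prefix_len(["b91f353d", "6d2eaec0"]))
```

Because the length is recomputed from the full hashes, the short names stay derivable from the weights alone, which keeps the property Error323 wanted from using the hash.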

sf-x commented 6 years ago

+1 ^ People are using as low as 4 digits now

isty2e commented 6 years ago

How about a more easy-to-read name using petname as in minigo? If there is a duplicate, you can just run it once again.

jkiliani commented 6 years ago

The match of 6690eb against Stockfish Level 0 just finished:

Score of lc_6690eb vs sf_lv0: 79 - 20 - 1  [0.795] 100
Elo difference: 235.45 +/- 87.18

Another solid improvement, roughly correlates to the self-play match. Next net will still be against SF Lv0 first, until I have a result >85% winrate.
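The "Elo difference" figure in match summaries like the one above is consistent with the standard logistic Elo model applied to the score fraction. A minimal sketch, assuming draws count as half a point (the function name is illustrative, and the ± error margin is not reproduced here):

```python
import math


def elo_difference(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for 0% or 100% scores")
    return -400.0 * math.log10(1.0 / score - 1.0)


# lc_6690eb vs sf_lv0: 79 wins, 20 losses, 1 draw, i.e. a score of 0.795
print(round(elo_difference(79, 20, 1), 2))
```

For the 79 - 20 - 1 result this agrees with the reported 235.45 to within rounding.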

CMCanavessi commented 6 years ago

I'm starting the usual gauntlet vs 23 opponents in a couple of minutes. I think this will be the last gauntlet with these 23 opponents, as they are already too weak but that's better, I can select some new ones, ranging from 1000 to around 1700 elo, with more familiar names, like TSCP. So depending on how this one goes, next one might be a new group.

jjoshua2 commented 6 years ago

There is hardly any point playing 4 games against a random mover or LaMoSca now, so you could easily leave out the bottom 5.

CMCanavessi commented 6 years ago

Yep, but there's also not much point in playing against 500-600 elo engines, they all end 4-0 anyways. I might also up the number of games vs each opponent to 8.

jjoshua2 commented 6 years ago

With more games there is some use, maybe one will draw or something... I'm excited to see some matches against TSCP and Robocide, and beat up Hippocampe

CMCanavessi commented 6 years ago

Well I just checked the progress and Leela is trouncing everyone. It's beating the hell out of all the 1050+ engines... completely amazing.

jjoshua2 commented 6 years ago

Saruman 2017.08.10 64-bit at 1590 is the lowest rated engine with 2017 or 2018 in the title... And it's open source too! There are actually quite a few once you get to this level and above.

CMCanavessi commented 6 years ago

Beautiful, Xadreco is 1050 in my rating list.

[Event "020 - LCZero Gen 12 Gauntlet"]
[Site "RYZEN"]
[Date "2018.03.20"]
[Round "1"]
[White "Leela Chess Zero Gen 12 x64"]
[Black "Xadreco 5.83 x32"]
[Result "1-0"]
[ECO "D34"]
[Opening "QGD Tarrasch"]
[Time "16:27:38"]
[Variation "7.Bg2 Be7 8.O-O"]
[TimeControl "60+1"]
[Termination "normal"]
[PlyCount "135"]
[WhiteType "program"]
[BlackType "program"]

1. c4 e6 2. Nf3 c5 3. g3 d5 4. cxd5 exd5 5. d4 Nc6 6. Bg2 Nf6 7. O-O Be7 8.
Nc3 c4 9. Ne5 {(9.Ne5 Nxe5 10.dxe5 Ng4 11.Nxd5 Nxe5 12.Nxe7 Qxe7 13.f4 Ng4)
-0.22/19 2} O-O {(9. ... Nxe5 10.dxe5 Ng4 11.Bf4 0-0) +0.30/4 34} 10. Bf4
{(10.Bf4 Nh5 11.Nxd5 Nxf4 12.Nxf4 Nxe5 13.dxe5) -0.04/19 2} Nb4 {(10. ...
Be6) +1.00/4 1} 11. a3 {(11.a3 Nc6 12.h4 Nxe5 13.dxe5 Ng4 14.Nxd5 Qxd5)
+0.72/19 2} Nh5 {(11. ... Nc2) +1.20/4 1} 12. axb4 {(12.axb4 g5 13.e3 Nxf4
14.gxf4 gxf4 15.exf4 Bxb4 16.f5 Bxc3 17.bxc3) +2.37/19 2} Nxf4 {(12. ...
Nxf4) +1.05/5 1} 13. gxf4 {(13.gxf4 f6 14.Bxd5+ Kh8 15.Nxc4 Bxb4 16.Qb3
Bxc3 17.Qxc3) +3.31/19 2} Be6 {(13. ... f6) -0.58/5 1} 14. f5 {(14.f5 Bxf5
15.Nxd5 g6 16.e4 Bxe4 17.Bxe4 f5 18.Bf3) +3.15/19 2} Bxf5 {(14. ... Bxf5)
-0.46/5 1} 15. Nxd5 {(15.Nxd5 g6 16.e4 Be6 17.Nxe7+ Qxe7 18.Nxc4 Bxc4
19.f3) +3.31/20 2} Be6 {(15. ... Bxb4 16.Nxb4 Qg5 17.Ng4 Bxg4) -0.92/4 1}
16. Nf4 {(16.Nf4 g5 17.Nxe6 fxe6 18.Bxb7 Rb8 19.Nc6 Rxb7 20.Nxd8 Rxd8 21.e4
Bxb4 22.h3 c3 23.bxc3) +3.10/19 2} Qb6 {(16. ... Qb6 17.Nxe6 Qxe6 18.Bxb7
Qxe5) -1.31/4 1} 17. Nxe6 {(17.Nxe6 fxe6 18.Nxc4 Qxb4 19.Ra4 Qxc4 20.Rxc4
Rad8 21.d5 exd5) +3.89/19 2} Qxe6 {(17. ... Qxe6) -1.53/5 1} 18. Bxb7
{(18.Bxb7 Rab8 19.Ba6 Bxb4 20.Bxc4 Qxc4 21.Nxc4 h6 22.Ne5 f6) +3.78/19 2}
Rad8 {(18. ... Rad8 19.Qc2 Rxd4 20.Nf3 Qxe2) -0.62/4 1} 19. Rxa7 {(19.Rxa7
Qb6 20.Qa4 Rb8 21.Bd5 c3 22.bxc3 Kh8 23.Nxf7+ Kg8) +5.18/19 2} Bxb4 {(19.
... Bxb4 20.Ra4 Qxe5 21.Rxb4 Qxe2) -0.69/4 1} 20. Nc6 {(20.Nc6 f6 21.Nxd8
Rxd8 22.Qa4 c3 23.bxc3 Bxc3 24.d5 Qxd5) +5.54/19 2} Qg6+ {(20. ... Rxd4)
-2.94/5 1} 21. Kh1 {(21.Kh1 Rd7 22.Qa4 Bd6 23.Ne5 Bxe5 24.Qxd7 Bf6 25.Ra8
Rxa8 26.Bxa8 Bd8 27.Qxd8+) +4.77/19 2} Bc5 {(21. ... Qg2+) -3.25/5 1} 22.
Nxd8 {(22.Nxd8 Bxa7 23.Nc6 c3 24.bxc3 Bb6 25.Ne5 Qh5 26.Bc6) +5.26/19 2}
Bxa7 {(22. ... Bxa7) -3.06/5 1} 23. Nc6 {(23.Nc6 c3 24.bxc3 Rb8 25.Nxb8
Bxb8 26.Ba8 h6 27.Qd2 Kh7 28.f3) +5.11/19 2} Re8 {(23. ... Qg2+) -1.96/5 1}
24. Nxa7 {(24.Nxa7 Qb6 25.Qa4 Qxb7+ 26.Qc6 Qxc6+ 27.Nxc6 Rxe2 28.Ne7+ Kf8
29.Nc6 Rxb2 30.d5) +4.39/19 2} Re7 {(24. ... Qd3) -5.47/5 1} 25. Bf3
{(25.Bf3 Rxa7 26.h4 h6 27.h5 Qf6 28.Be4 Qe7 29.Bc6) +3.68/19 2} Rxa7 {(25.
... Rxa7 26.Rg1) -4.52/5 1} 26. Qc1 {(26.Qc1 Qa6 27.Rd1 f6 28.d5 c3)
+2.87/19 2} Re7 {(26. ... Ra1) -2.92/5 1} 27. Qxc4 {(27.Qxc4 Re6 28.Qc5 h5
29.h4 Rf6 30.d5) +4.73/19 2} Re8 {(27. ... Qc2) -3.35/5 1} 28. h4 {(28.h4
Kf8 29.h5 Qf6 30.Bg4 Qg5 31.Bf3 Qf6 32.Bg4 Rd8) +5.56/19 2} Qf6 {(28. ...
Qf5 29.Bg2 Qg4 30.Bd5 Rxe2) -2.23/4 1} 29. Kg2 {(29.Kg2 Qd6 30.Rd1 Rb8
31.b3 Kf8 32.Rd3) +6.91/19 2} Rb8 {(29. ... Rb8 30.e3 Re8 31.Bd5 Rxe3)
-2.83/4 1} 30. b4 {(30.b4 Qxh4 31.b5 Qf6 32.Bc6 Kf8 33.d5) +7.10/19 2} Qxh4
{(30. ... Rd8 31.e3 Re8 32.Bd5 Rxe3) -2.93/4 1} 31. Qc5 {(31.Qc5 Qd8 32.Ra1
f6 33.Ra7 g6) +7.31/19 2} Rd8 {(31. ... Qh2+) -4.23/5 1} 32. b5 {(32.b5 Qf6
33.b6 Rb8 34.Qc7 Rxb6 35.Qxb6) +8.04/19 2} Qxd4 {(32. ... Qh2+) -3.83/5 1}
33. Qxd4 {(33.Qxd4 Rxd4 34.Ra1 Rd8 35.Ra8 Rxa8 36.Bxa8 Kh8 37.b6 f5 38.b7
Kg8 39.b8Q+) +9.87/18 2} Rxd4 {(33. ... Rxd4 34.b6 Rd6 35.b7 Rd8 36.Rc1
Rd2) -3.37/6 1} 34. Ra1 {(34.Ra1 g6 35.b6 Rb4 36.b7 Rb2 37.Ra7 Rb1 38.b8Q+
Kg7 39.Qxb1 Kf6 40.Qb8) +10.27/19 2} f5 {(34. ... Rd2) -2.37/5 1} 35. Ra8+
{(35.Ra8+ Kf7 36.b6 Rb4 37.b7 Rxb7 38.Bxb7 g6 39.Ra7 Kf6 40.Kf3) +11.09/18
2} Kf7 {(35. ... Rd8) -4.07/6 1} 36. b6 {(36.b6 Rb4 37.b7 Rxb7 38.Bxb7 g6
39.Ra7 Kf6 40.Kf3) +12.07/19 2} Rd6 {(36. ... Rd2 37.Rf8+) -3.77/4 1} 37.
b7 {(37.b7 Rb6 38.b8Q Rxb8 39.Rxb8 Ke7 40.Ra8 Kd7 41.Ra3 Kd6) +12.83/18 2}
Rg6+ {(37. ... Rd2) -3.57/5 1} 38. Kf1 {(38.Kf1 Rb6 39.b8Q Rxb8 40.Rxb8 Ke7
41.Ra8 Kf6 42.Kg2) +13.98/18 2} Rb6 {(38. ... Rg1+) -3.77/5 1} 39. b8=Q
{(39.b8Q Rxb8 40.Rxb8 Ke7 41.Ra8 Kd7 42.Ra3 Kd6) +12.47/18 2} Rxb8 {(39.
... Rxb8 40.Rxb8 Kf6) -8.56/7 1} 40. Rxb8 {(40.Rxb8 g6 41.Ra8 Kf6 42.Bd5
Ke5) +12.28/18 2} Ke6 {(40. ... Kf6) -8.56/6 1} 41. Ra8 {(41.Ra8 Ke5 42.Kg2
Kf4 43.Bd5 Ke5) +12.22/18 2} g5 {(41. ... Ke5) -8.06/6 1} 42. Bb7 {(42.Bb7
Ke5 43.Kg2 Kf4 44.Bc6 g4) +11.78/17 2} Kd6 {(42. ... g4) -7.86/5 1} 43. Kg2
{(43.Kg2 Kc7 44.Ba6 h5 45.Kf3) +12.04/17 2} Kc7 {(43. ... Kc5 44.Rc8+)
-9.06/5 1} 44. Ba6 {(44.Ba6 Kb6 45.Kf3 h6 46.Bc4 Kb7 47.Rh8) +11.33/17 1}
h5 {(44. ... Kd6) -7.46/6 1} 45. Kf3 {(45.Kf3 h4 46.Bb5 h3 47.Kg3 h2
48.Kxh2) +12.37/17 1} Kd6 {(45. ... g4+) -7.46/6 1} 46. Kg3 {(46.Kg3 Ke5
47.Bb5 h4+ 48.Kf3) +12.80/17 1} h4+ {(46. ... Kd5) -7.26/6 1} 47. Kh3
{(47.Kh3 Kc7 48.Bb5 Kb6 49.Bd7 Kc7) +12.88/17 1} Ke5 {(47. ... g4+) -6.26/6
1} 48. Bb7 {(48.Bb7 Kf4 49.Bg2 g4+ 50.Kxh4 g3 51.fxg3+ Ke3 52.Kg5 Kxe2)
+12.54/16 1} Ke6 {(48. ... g4+) -6.66/6 1} 49. Bg2 {(49.Bg2 Ke5 50.f3 Kf4
51.Kh2 g4) +12.83/16 1} Ke5 {(49. ... g4+) -7.06/6 1} 50. f3 {(50.f3 Kf4
51.Kh2 Ke3 52.Kh3 Kxe2) +12.42/16 1} Ke6 {(50. ... g4+) -6.76/6 1} 51. Bh1
{(51.Bh1 Ke5 52.Bg2 Kf4 53.Ra3 Ke5) +12.51/16 1} Ke5 {(51. ... g4+) -6.76/6
1} 52. Bg2 {(52.Bg2 Kf4 53.e4 Ke5 54.exf5 Kxf5 55.Ra4 Ke5) +12.32/16 1} Ke6
{(52. ... g4+) -6.76/6 1} 53. e4 {(53.e4 fxe4 54.fxe4 Ke5 55.Kg4 Kf6 56.Bh3
Ke7 57.Kxg5) +12.35/17 1} fxe4 {(53. ... Ke5) -7.26/7 1} 54. fxe4 {(54.fxe4
Ke5 55.Kg4 Kf6 56.Bh3 Ke5 57.Kxg5 Kxe4 58.Kxh4) +12.63/17 1} Ke7 {(54. ...
Ke5) -7.33/6 1} 55. Kg4 {(55.Kg4 Kf6 56.Bh3 Ke5 57.Kxg5 Kxe4 58.Kxh4 Kf4)
+12.85/17 1} Kf6 {(55. ... h3) -8.03/7 1} 56. Bh3 {(56.Bh3 Ke5 57.Kxg5 Kxe4
58.Kxh4 Kf4 59.Bf1 Ke3) +12.72/16 1} Ke5 {(56. ... Ke6) -7.73/7 1} 57. Ra4
{(57.Ra4 Ke6 58.Kxg5+ Ke5 59.Kxh4 Kf6 60.Kg4) +12.70/16 1} Kf6 {(57. ...
Ke6) -7.53/7 1} 58. Ra5 {(58.Ra5 Ke6 59.Kxg5+ Kd6 60.Kf4 Kc6 61.e5)
+13.15/16 1} Ke6 {(58. ... Ke7) -8.03/7 1} 59. Kxg5+ {(59.Kxg5+ Kd6 60.Kf4
Kc6 61.e5 Kb6 62.Ra8 Kc7) +13.74/16 1} Kd6 {(59. ... Ke7) -9.65/7 1} 60.
Kf4 {(60.Kf4 Kc6 61.e5 Kb6 62.Ra1 Kc5 63.e6) +14.37/16 1} Kc7 {(60. ...
Kc6) -9.55/6 1} 61. e5 {(61.e5 Kb6 62.Ra8 Kb7 63.Rh8 Kc6) +14.79/16 1} Kc6
{(61. ... Kb6) -9.95/6 1} 62. e6 {(62.e6 Kd6 63.Ra6+ Ke7 64.Kf5 Kf8)
+14.62/16 1} Kd6 {(62. ... Kb6) -10.05/6 1} 63. Ra6+ {(63.Ra6+ Ke7 64.Ke5
Ke8 65.Kf6 Kd8) +14.92/16 1} Ke7 {(63. ... Kd5) -9.95/6 1} 64. Ke5 {(64.Ke5
Kf8 65.Kf6 Ke8 66.Ra8+) +14.81/16 1} Kf8 {(64. ... Ke8 65.Ra8+) -12.35/6 1}
65. Kf6 {(65.Kf6 Kg8 66.e7 Kh8 67.e8Q+ Kh7 68.Qf7+ Kh8) +17.52/16 1} Kg8
{(65. ... Ke8) -10.75/7 1} 66. e7 {(66.e7 Kh8 67.e8Q+ Kh7 68.Qd7+ Kg8
69.Qg7+) +18.65/16 1} Kh7 {(66. ... Kh8) -11.25/7 1} 67. e8=Q {(67.e8Q Kh6
68.Qh8+) +26.79/16 1} Kh6 {(67. ... Kh6 68.Qh8+) -M30/6 1} 68. Qh8#
{(68.Qh8+) +36.38/16 1} 1-0

jjoshua2 commented 6 years ago

I think I would stop your tournament early after one or two rounds then, and get at least one engine in there that will really beat it, and some around its level.

jkiliani commented 6 years ago

I'd say go through with it if you have the compute. From my matches against Stockfish I'd estimate lc_6690eb around 1150, so the strongest engines on your current match may still pull off the occasional win or draw at least.

CMCanavessi commented 6 years ago

It finally lost 2 games, vs Usurpator II and Pyotr Amateur. I'll let the gauntlet finish, shouldn't take long at this time control. Then I'll start a new one with the new pack of engines that I'll use till they are all too weak again.

CMCanavessi commented 6 years ago

Ok so I aborted it after 2 round robins, it was too easy for Leela.

Here's the calculated rating:

 172 MSCP 1.4 x32                           :  1221.2      60    1    4   55     5     7  1758.4    15    15.0
 173 BRAMA 05/12/2004 x32                   :  1200.7     166   95   47   24    71    28   883.4    40    39.1
 174 Tikov 0.6.3 Rev 2 x32                  :  1176.6      64   21   10   33    41    16  1250.0    16    16.0
 175 Frank 0.58 x32                         :  1121.7      64   11   21   32    34    33  1253.4    16    16.0
 176 Talvmenni 0.1 x32                      :  1096.1     106   70   30    6    80    28   681.0    27    26.8
 177 Iota 1.0 x32                           :  1086.8     166   80   44   42    61    27   888.9    40    39.1
 178 Usurpator II x32                       :  1073.1     166   88   24   54    60    14   889.6    40    39.1
 179 Leela Chess Zero Gen 12 x64            :  1072.4      46   34    5    7    79    11   656.4    23    23.0
 180 Xadreco 5.83 x32                       :  1060.8     194   86   23   85    50    12  1016.8    48    47.5
 181 Fimbulwinter v5.05 x32                 :  1005.9      60    8    9   43    21    15  1261.0    15    15.0
 182 Safrad 2.1.35.210 x32                  :  1004.2     242  115   28   99    53    12   898.5    35    30.9
 183 Hanzo the Razor x32                    :   994.0     102   53   46    3    75    45   662.5    26    25.8
 184 MFChess 1.3 x32                        :   958.2     102   58   30   14    72    29   663.9    26    25.8
 185 Youk V1.05 x32                         :   930.8     194   66   28  100    41    14  1022.1    48    47.5
 186 StrategicDeep 1.25 x32                 :   922.8      92    7    4   81    10     4  1418.4    23    23.0

Gen 10 is 862, Gen 8 is 793, Gen 6 is 598, Gen 4 is 369

Results of the shortened gauntlet:

-----------------Leela Chess Zero Gen 12 x64-----------------
Leela Chess Zero Gen 12 x64 - Acqua ver. 20160918 x32        : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - BRAMA 05/12/2004 x32           : 0,5/2 0-1-1 (=0)  25%  -191
Leela Chess Zero Gen 12 x64 - CPP1 0.1038 x32                : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Dikabi v0.4209 x32             : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Easy Peasy 1.0 x32             : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - EtherealRandom (8.97) x64      : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - EtherTrueRand 9.21 x64         : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Hanzo the Razor x32            : 1,0/2 0-0-2 (==)  50%    ±0
Leela Chess Zero Gen 12 x64 - Iota 1.0 x32                   : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - LaMoSca v0.10 x32              : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - MFChess 1.3 x32                : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - N.E.G. 1.2 x32                 : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - NSVChess 0.14 x32              : 1,5/2 1-0-1 (1=)  75%  +191
Leela Chess Zero Gen 12 x64 - POS v1.20 x32                  : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Pyotr Amateur Edition v0.6 x32 : 1,0/2 1-1-0 (01)  50%    ±0
Leela Chess Zero Gen 12 x64 - Pyotr Novice Edition v2.6 x32  : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Ram 2.0 x32                    : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Talvmenni 0.1 x32              : 0,5/2 0-1-1 (0=)  25%  -191
Leela Chess Zero Gen 12 x64 - Teki Random Mover x64          : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Usurpator II x32               : 0,0/2 0-2-0 (00)   0% -1200
Leela Chess Zero Gen 12 x64 - Xadreco 5.83 x32               : 1,0/2 1-1-0 (10)  50%    ±0
Leela Chess Zero Gen 12 x64 - Youk V1.05 x32                 : 2,0/2 2-0-0 (11) 100% +1200
Leela Chess Zero Gen 12 x64 - Zoe 0.1 x32                    : 1,0/2 1-1-0 (10)  50%    ±0

CMCanavessi commented 6 years ago

Ok, here we go. The real deal, I've made a new gauntlet, 8 rounds vs every engine, 25 opponents in total (200 total games), ranging from 1005 elo (Fimbulwinter) to 1850 elo (Skiull).

You can follow the games live at my twitch channel: https://www.twitch.tv/ccls/

It's not gonna be easy for Gen 12 with this pack of rivals, but that's the idea, to see the progress in a couple of days/weeks.

CMCanavessi commented 6 years ago

Around the middle of the 3rd round, Leela has improved its elo to 1083 and has gotten a couple of nice wins vs 1350+ elo engines (Supra and Sabrina), among others. It's doing better than I had expected to be honest. I don't think this pack of engines will last long.

178 Leela Chess Zero Gen 12 x64 : 1083.6 114 45 10 59 44 9 1108.9 48 47.0

CMCanavessi commented 6 years ago

Well, the new gauntlet finished and it was quite harsh. Leela finished with 37/200, we'll see how much that improves in the upcoming generations.

-----------------Leela Chess Zero Gen 12 x64-----------------
Leela Chess Zero Gen 12 x64 - AdaChess v2.1 (GSEI) x32         : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 12 x64 - Ceibo v0.3.65 x64                : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 12 x64 - Dragontooth 0.2 Bahamut x64      : 1,5/8 1-6-1 (010000=0)  19%  -252
Leela Chess Zero Gen 12 x64 - Eden 0.0.13 x32                  : 1,0/8 1-7-0 (00000001)  13%  -330
Leela Chess Zero Gen 12 x64 - Enxadrista 1.0 x32               : 2,0/8 2-6-0 (00101000)  25%  -191
Leela Chess Zero Gen 12 x64 - Fimbulwinter v5.05 x32           : 5,0/8 5-3-0 (01011011)  63%   +92
Leela Chess Zero Gen 12 x64 - Frank 0.58 x32                   : 4,0/8 2-2-4 (=100=1==)  50%    ±0
Leela Chess Zero Gen 12 x64 - Joanna2002 1.06 x32              : 1,0/8 0-6-2 (00000==0)  13%  -330
Leela Chess Zero Gen 12 x64 - KillerQueen 2 beta 3 x32         : 4,0/8 3-3-2 (1010=01=)  50%    ±0
Leela Chess Zero Gen 12 x64 - LarsenVB 0.05 x32                : 1,0/8 1-7-0 (00100000)  13%  -330
Leela Chess Zero Gen 12 x64 - MSCP 1.4 x32                     : 1,0/8 1-7-0 (10000000)  13%  -330
Leela Chess Zero Gen 12 x64 - Nanook v0.17 x32                 : 2,0/8 0-4-4 (=0==0=00)  25%  -191
Leela Chess Zero Gen 12 x64 - Numpty Recharged x64             : 1,5/8 1-6-1 (0=000001)  19%  -252
Leela Chess Zero Gen 12 x64 - Pierre v1.7 x32                  : 0,5/8 0-7-1 (00000=00)   6%  -478
Leela Chess Zero Gen 12 x64 - Piranha 0.5 x32                  : 1,0/8 1-7-0 (00000001)  13%  -330
Leela Chess Zero Gen 12 x64 - Pulse 1.6.1 x64                  : 0,5/8 0-7-1 (000000=0)   6%  -478
Leela Chess Zero Gen 12 x64 - Pwned v1.3 x64                   : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 12 x64 - Sabrina 3.1.25 x64               : 2,5/8 2-5-1 (01000=10)  31%  -139
Leela Chess Zero Gen 12 x64 - Satana 2.4.20 x64                : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 12 x64 - Simon v1.2 x32                   : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 12 x64 - Skiull 0.3 x64                   : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 12 x64 - Supra 26.0 Pro x64               : 2,5/8 2-5-1 (=0100001)  31%  -139
Leela Chess Zero Gen 12 x64 - Tikov 0.6.3 Rev 2 x32            : 3,5/8 3-4-1 (000111=0)  44%   -42
Leela Chess Zero Gen 12 x64 - Toledo Nanochess Jan/11/2010 x32 : 2,5/8 2-5-1 (0110=000)  31%  -139
Leela Chess Zero Gen 12 x64 - TSCP 1.81 x32                    : 0,0/8 0-8-0 (00000000)   0% -1200

Elo now is almost 1100... a huge improvement over Gen 10.

186 Leela Chess Zero Gen 12 x64            :  1097.6     246   61   25  160    30    10  1298.7    48    39.8
198 Leela Chess Zero Gen 10 x64            :   862.1      92   53   11   28    64    12   656.1    23    23.0
201 Leela Chess Zero Gen 8 x64             :   793.3      92   45   17   30    58    18   656.1    23    23.0
206 Leela Chess Zero Gen 6 x64             :   598.5      92   31   18   43    43    20   656.1    23    23.0
210 Leela Chess Zero Gen 4 x64             :   369.6     150   43   18   89    35    12   623.6    15    15.0

jkiliani commented 6 years ago

Match of gen13 (cd1a1e) against Stockfish Level 0:

Score of lc_cd1a1e vs sf_lv0: 80 - 19 - 1  [0.805] 100
Elo difference: 246.30 +/- 89.11

Only a marginal improvement compared to gen12. I'm going to give the match against Stockfish Level 5 a shot now, just to see what happens. I'm not using FPU reduction (https://github.com/glinscott/leela-chess/issues/160) for this match: it would certainly give LCZero a big boost against Stockfish, but it's experimental and not yet agreed on by this community.

Results against Stockfish Level 5:

Score of lc_cd1a1e vs sf_lv5: 14 - 84 - 2  [0.150] 100
Elo difference: -301.33 +/- 99.69

Matching SF Lv0 against SF Lv5 directly gave a rating difference of 672 Elo, but the transferred results against LCZero imply a rating difference of ~550 Elo. I tend to believe this more than the Stockfish self-play rating. Since @CMCanavessi's tournament implies that SF Level 0 is ~900 Elo strong, Stockfish Level 5 should be something like 1450 Elo, to use as an anchor for the near future.
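The anchor estimate above is just two Elo differences chained through a common opponent. A small sketch of that arithmetic, using the match results reported in this thread and the ~900 Elo figure for SF Level 0 as the assumed anchor (function names are illustrative):

```python
def implied_gap(lc_vs_weaker: float, lc_vs_stronger: float) -> float:
    """Elo gap between two opponents, inferred via a common third engine."""
    return lc_vs_weaker - lc_vs_stronger


def anchored_rating(anchor_rating: float, gap: float) -> float:
    """Rating of the stronger opponent given the weaker one's rating."""
    return anchor_rating + gap


# lc_cd1a1e scored +246.30 Elo vs SF Lv0 and -301.33 Elo vs SF Lv5.
gap = implied_gap(246.30, -301.33)
print(round(gap, 2))                          # the "~550 Elo" in the comment
print(round(anchored_rating(900.0, gap), 2))  # rounds to the ~1450 estimate
```

Each 100-game match carries an error margin of roughly ±90 to ±100 Elo, so the chained figure is at best a rough anchor.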

I'll do gen14 (209032) after all so I have the comparison with @CMCanavessi's Elo rating.

CMCanavessi commented 6 years ago

Gen 14 is already out BTW.

CMCanavessi commented 6 years ago

Just started the gauntlet that Gen 12 played against, 25 engines ranging from 1000 to 1850 elo. We'll see what's the real improvement of Gen 14. If it can get to around 50/200 I'll call it a huge success (Gen 12 got 37/200).

I'll update later.

gsobala commented 6 years ago

Gen 14 209032b7 just drew with TSCP at 40/5'. Now that's real progress in such a short time. (I adjudicated a draw rather than wait out the threefold repetition.)

[Site "?"]
[Date "2018.03.21"]
[Round "1"]
[White "leela-wrapper"]
[Black "TSCP"]
[Result "1/2-1/2"]
[ECO "B00"]
[GameEndTime "2018-03-21T22:20:54.402 GMT"]
[GameStartTime "2018-03-21T22:08:34.614 GMT"]
[Opening "King's pawn Opening"]
[PlyCount "98"]
[Termination "adjudication"]
[TimeControl "40/300"]

1. e4 {-0.11/20 7.0s} d6 2. d4 {+0.22/20 7.0s} g6 3. Nf3 {+0.44/20 7.0s}
Bg4 {-0.78/7 10.0s} 4. Be2 {+1.23/20 7.0s} Nf6 {-0.68/7 9.7s}
5. h3 {+1.42/20 7.1s} Bxf3 {-0.52/7 9.3s} 6. Bxf3 {+0.88/20 7.1s}
e5 {-0.45/7 9.0s} 7. dxe5 {+1.72/20 7.1s} dxe5 {-0.51/7 8.7s}
8. Qxd8+ {+2.09/20 7.1s} Kxd8 {-0.31/7 8.4s} 9. Bd2 {+2.21/21 7.2s}
Kc8 {-0.10/7 8.2s} 10. b4 {+1.97/20 7.2s} Nc6 {+0.35/7 7.9s}
11. c3 {+1.77/20 7.2s} h5 {+0.28/7 7.6s} 12. Ke2 {+1.45/20 7.3s}
Be7 {+0.49/6 7.4s} 13. b5 {+2.33/20 7.3s} Nd8 {+0.53/7 7.1s}
14. c4 {+1.58/20 7.3s} Ne6 {+0.70/7 6.9s} 15. Nc3 {+1.81/20 7.4s}
Nd4+ {+0.85/6 6.7s} 16. Kd3 {+0.85/20 7.4s} Rd8 {+1.02/6 6.4s}
17. Nd5 {+0.79/20 7.4s} Nxd5 {+1.08/7 6.2s} 18. cxd5 {+1.04/21 7.5s}
Nxb5 {+1.42/6 6.0s} 19. Be3 {+1.97/20 7.5s} Bf6 {+1.33/6 5.8s}
20. a4 {+2.45/20 7.6s} Nd4 {+1.46/7 5.6s} 21. h4 {+1.15/20 7.6s}
Nxf3 {+1.67/7 5.4s} 22. gxf3 {+1.05/20 7.6s} c6 {+1.73/7 5.3s}
23. Rac1 {+0.92/21 7.7s} Kb8 {+1.63/7 5.1s} 24. f4 {+0.45/20 7.7s}
cxd5 {+2.46/7 4.9s} 25. fxe5 {+0.06/20 7.8s} Bxe5 {+2.07/7 4.7s}
26. Rhg1 {+0.23/21 7.8s} d4 {+2.78/7 4.6s} 27. Bg5 {+0.66/20 7.8s}
Re8 {+2.24/7 4.4s} 28. f4 {+0.91/20 7.8s} Bc7 {+1.41/7 4.3s}
29. f5 {+0.23/21 7.8s} gxf5 {+2.06/8 4.1s} 30. exf5 {+0.80/20 7.8s}
b5 {+1.44/7 4.0s} 31. axb5 {+1.50/20 7.8s} Be5 {+0.78/7 3.9s}
32. Rc6 {+2.55/20 7.8s} Kb7 {+0.45/7 3.7s} 33. Bf6 {+1.49/20 7.8s}
Rab8 {-0.35/7 3.6s} 34. Rg7 {+3.59/20 7.7s} Bxf6 {-0.77/8 3.5s}
35. Rxf6 {+4.55/20 7.6s} Kc8 {-0.97/8 3.4s} 36. Rfxf7 {+5.57/20 7.6s}
Rxb5 {-0.90/8 3.3s} 37. f6 {+5.13/21 7.6s} Rb4 {-0.88/7 3.2s}
38. Rh7 {+5.33/20 7.6s} Re3+ {-0.44/7 3.1s} 39. Kd2 {+4.25/20 7.8s}
Rb2+ {-0.60/7 3.0s} 40. Kd1 {+3.60/21 8.4s} Rbe2 {-0.60/6 2.9s}
41. Rfg7 {+4.40/21 15s} Re1+ {0.00/8 13s} 42. Kc2 {+4.39/23 14s}
R1e2+ {0.00/7 12s} 43. Kc1 {+4.37/23 14s} Re1+ {0.00/7 12s}
44. Kd2 {+4.31/23 13s} R1e2+ {0.00/8 12s} 45. Kc1 {+4.17/24 13s}
Re1+ {0.00/8 11s} 46. Kd2 {+3.98/23 12s} R1e2+ {0.00/8 11s}
47. Kd1 {+4.00/23 12s} Re1+ {0.00/8 10s} 48. Kc2 {+3.83/24 11s}
R1e2+ {0.00/7 10s} 49. Kb1 {+2.36/25 11s}
Re1+ {0.00/7 9.7s, Draw by adjudication: user decision} 1/2-1/2

fa1558c6

CMCanavessi commented 6 years ago

It just beat Pwned 1.3 in my gauntlet, which is almost exactly as strong as TSCP. TC is 1 min + 1 sec, so even shorter. Looking great!

 136 Pwned v1.3 x64                         :  1794.8     254   92   19  143    40     7  2015.2    85    73.4
 137 TSCP 1.81 x32                          :  1792.2     254   89   24  141    40     9  2015.3    85    73.4

jkiliani commented 6 years ago

Gen 14 (209032) finished its match against Stockfish Level 5:

Score of lc_209032 vs sf_lv5: 17 - 81 - 2  [0.180] 100
Elo difference: -263.42 +/- 91.50

The improvement fits the self-play match well. I'll use @CMCanavessi's gauntlet of Gen 14 to check whether my estimate of 1450 Elo for SF Lv 5 was plausible.

Error323 commented 6 years ago

This is really fun to read. Our baby grows up so fast :')

CMCanavessi commented 6 years ago

The gauntlet for Gen 14 finished. It got 52/200 (Gen 12 had gotten 37/200). A nice improvement.

-----------------Leela Chess Zero Gen 14 x64-----------------
Leela Chess Zero Gen 14 x64 - AdaChess v2.1 (GSEI) x32         : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 14 x64 - Ceibo v0.3.65 x64                : 0,5/8 0-7-1 (000000=0)   6%  -478
Leela Chess Zero Gen 14 x64 - Dragontooth 0.2 Bahamut x64      : 4,5/8 4-3-1 (=1001110)  56%   +42
Leela Chess Zero Gen 14 x64 - Eden 0.0.13 x32                  : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 14 x64 - Enxadrista 1.0 x32               : 6,0/8 6-2-0 (11011110)  75%  +191
Leela Chess Zero Gen 14 x64 - Fimbulwinter v5.05 x32           : 7,0/8 7-1-0 (11011111)  88%  +346
Leela Chess Zero Gen 14 x64 - Frank 0.58 x32                   : 5,0/8 4-2-2 (1110==10)  63%   +92
Leela Chess Zero Gen 14 x64 - Joanna2002 1.06 x32              : 1,0/8 0-6-2 (0=0=0000)  13%  -330
Leela Chess Zero Gen 14 x64 - KillerQueen 2 beta 3 x32         : 4,5/8 4-3-1 (11001=01)  56%   +42
Leela Chess Zero Gen 14 x64 - LarsenVB 0.05 x32                : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 14 x64 - MSCP 1.4 x32                     : 1,5/8 1-6-1 (1=000000)  19%  -252
Leela Chess Zero Gen 14 x64 - Nanook v0.17 x32                 : 3,5/8 1-2-5 (==10=0==)  44%   -42
Leela Chess Zero Gen 14 x64 - Numpty Recharged x64             : 1,5/8 1-6-1 (00100=00)  19%  -252
Leela Chess Zero Gen 14 x64 - Pierre v1.7 x32                  : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 14 x64 - Piranha 0.5 x32                  : 1,0/8 1-7-0 (00000100)  13%  -330
Leela Chess Zero Gen 14 x64 - Pulse 1.6.1 x64                  : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 14 x64 - Pwned v1.3 x64                   : 1,0/8 1-7-0 (10000000)  13%  -330
Leela Chess Zero Gen 14 x64 - Sabrina 3.1.25 x64               : 2,0/8 1-5-2 (00=00=01)  25%  -191
Leela Chess Zero Gen 14 x64 - Satana 2.4.20 x64                : 2,5/8 2-5-1 (10000=10)  31%  -139
Leela Chess Zero Gen 14 x64 - Simon v1.2 x32                   : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 14 x64 - Skiull 0.3 x64                   : 0,5/8 0-7-1 (00000=00)   6%  -478
Leela Chess Zero Gen 14 x64 - Supra 26.0 Pro x64               : 1,5/8 1-6-1 (=1000000)  19%  -252
Leela Chess Zero Gen 14 x64 - Tikov 0.6.3 Rev 2 x32            : 5,0/8 5-3-0 (10101110)  63%   +92
Leela Chess Zero Gen 14 x64 - Toledo Nanochess Jan/11/2010 x32 : 3,5/8 3-4-1 (=1001010)  44%   -42
Leela Chess Zero Gen 14 x64 - TSCP 1.81 x32                    : 0,0/8 0-8-0 (00000000)   0% -1200

And the current rating list:

 183 Leela Chess Zero Gen 14 x64            :  1200.5     200   42   20  138    26    10  1452.0    25    25.0
 184 BRAMA 05/12/2004 x32                   :  1196.0     166   95   47   24    71    28   880.8    40    39.1
 185 Tikov 0.6.3 Rev 2 x32                  :  1160.1      80   28   11   41    42    14  1224.9    18    17.4
 186 Frank 0.58 x32                         :  1111.6      80   15   27   38    36    34  1227.4    18    17.4
 187 Leela Chess Zero Gen 12 x64            :  1098.5     246   61   25  160    30    10  1303.1    48    39.8
 188 Talvmenni 0.1 x32                      :  1094.6     106   70   30    6    80    28   680.1    27    26.8
 189 Iota 1.0 x32                           :  1082.5     166   80   44   42    61    27   886.2    40    39.1
 190 Usurpator II x32                       :  1069.0     166   88   24   54    60    14   886.9    40    39.1
 191 Xadreco 5.83 x32                       :  1061.2     194   86   23   85    50    12  1017.6    48    47.5
 192 Safrad 2.1.35.210 x32                  :  1004.4     242  115   28   99    53    12   899.1    35    30.9
 193 Hanzo the Razor x32                    :   993.8     102   53   46    3    75    45   662.3    26    25.8
 194 Fimbulwinter v5.05 x32                 :   987.4      76   12    9   55    22    12  1233.5    17    16.4
 195 MFChess 1.3 x32                        :   958.0     102   58   30   14    72    29   663.7    26    25.8
 196 Youk V1.05 x32                         :   931.0     194   66   28  100    41    14  1023.0    48    47.5
 197 StrategicDeep 1.25 x32                 :   923.8      92    7    4   81    10     4  1420.5    23    23.0
 198 Hippocampe v0.4.2 x32                  :   894.1     150   98   18   34    71    12   588.8    15    15.0
 199 Leela Chess Zero Gen 10 x64            :   861.6      92   53   11   28    64    12   655.7    23    23.0
 200 Zoe 0.1 x32                            :   827.4     102   47   29   26    60    28   668.9    26    25.8
 201 Pyotr Amateur Edition v0.6 x32         :   821.8     102   46   30   26    60    29   669.1    26    25.8
 202 Leela Chess Zero Gen 8 x64             :   792.9      92   45   17   30    58    18   655.7    23    23.0
 203 NSVChess 0.14 x32                      :   777.0     252  118   71   63    61    28   626.7    32    27.2
 204 Dikabi v0.4209 x32                     :   737.3     102   23   61   18    52    60   672.4    26    25.8
 205 Easy Peasy 1.0 x32                     :   666.4     252  118   30  104    53    12   632.8    32    27.2
 206 Pyotr Novice Edition v2.6 x32          :   649.6     102   35   22   45    45    22   675.8    26    25.8
 207 Leela Chess Zero Gen 6 x64             :   598.3      92   31   18   43    43    20   655.7    23    23.0
 208 Acqua ver. 20160918 x32                :   509.3     252   95   17  140    41     7   641.5    32    27.2
 209 N.E.G. 1.2 x32                         :   509.3     252   89   29  134    41    12   641.5    32    27.2
 210 Ram 2.0 x32                            :   387.7     252   58   46  148    32    18   648.3    32    27.2
 211 Leela Chess Zero Gen 4 x64             :   369.6     150   43   18   89    35    12   623.8    15    15.0

jkiliani commented 6 years ago

Gen 15 (6a5ccd) against Stockfish Level 5:

Score of lc_6a5ccd vs sf_lv5: 23 - 73 - 4  [0.250] 100
Elo difference: -190.85 +/- 78.70
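
For anyone who wants to reproduce the "Elo difference" line from the raw score, here is a minimal sketch using the standard logistic Elo formula. It assumes the score line is ordered wins - losses - draws (that ordering is what makes the bracketed score come out to 0.250); the function name `elo_diff` is made up for illustration, and the +/- margin is not reproduced here.

```python
import math


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match result (logistic model).

    A draw counts as half a point. The formula is undefined for a
    0% or 100% score, so those cases raise an error.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Gen 15 (6a5ccd) vs Stockfish Level 5: 23 wins, 73 losses, 4 draws
print(round(elo_diff(23, 73, 4), 2))  # -190.85
```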

Solid improvement this time, LZ is fast closing the gap. Glad to hear about the Gen 14 gauntlet results by the way, they seem to indicate that 1450 is quite accurate for Stockfish Lv 5. Once Level 5 is beaten, I'll try a match against the kingbase supervised net.

Almost glad that we have failed nets sometimes, or I couldn't keep up with the Stockfish and the FPU matches anymore 😀

CMCanavessi commented 6 years ago

After the superfinal of my tournament is finished (should be early tomorrow), I'll make some kind of "special" broadcast with LCZero (probably gen 16 or whatever gen is the latest) with long time controls against TSCP (maybe a 100-game match with reversed openings), which is known by every single chess engine programmer out there and is ~1790 elo in my ranking. Should be fun if Leela can beat it or get close to that, and should give us some free promotion.

I might also start talking with the TCEC guys, maybe they want to have LCZ for next season which should be in 3-4 months time, and by then LCZ might be around 2500? That would be a HUGE success for both LCZ and TCEC, after all the A0 hype.

jkiliani commented 6 years ago

I think in 3-4 months LCZero will likely be somewhere around 3000 already. With the speed we're currently generating games, I think at that point we'll have upsized the neural net already at least once, probably to 128 filters, 10 blocks since the experience with Leela Zero shows that this is a very effective combination for a good performance net.

CMCanavessi commented 6 years ago

Well, 3000 would be even better, but I always try to be conservative :) If we account for TCEC's super long time controls, I think LCZ could surprise a lot of people.

jkiliani commented 6 years ago

Agreed, the larger the neural net, the better it will scale relative to Alpha-Beta engines.

jkiliani commented 6 years ago

Gen 16 (98240a) vs SF L5:

Score of lc_98240a vs sf_lv5: 26 - 69 - 5  [0.285] 100
Elo difference: -159.78 +/- 74.73

Gap is diminishing, but will likely still take a few network generations before LCZero is on par. Elo estimation from this match: ~1300.

CMCanavessi commented 6 years ago

Gen 16 Gauntlet:

-----------------Leela Chess Zero Gen 16 x64-----------------
Leela Chess Zero Gen 16 x64 - AdaChess v2.1 (GSEI) x32         : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 16 x64 - Ceibo v0.3.65 x64                : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 16 x64 - Dragontooth 0.2 Bahamut x64      : 3,5/8 3-4-1 (01011=00)  44%   -42
Leela Chess Zero Gen 16 x64 - Eden 0.0.13 x32                  : 0,5/8 0-7-1 (0=000000)   6%  -478
Leela Chess Zero Gen 16 x64 - Enxadrista 1.0 x32               : 4,5/8 3-2-3 (==10=110)  56%   +42
Leela Chess Zero Gen 16 x64 - Fimbulwinter v5.05 x32           : 5,0/8 5-3-0 (10011011)  63%   +92
Leela Chess Zero Gen 16 x64 - Frank 0.58 x32                   : 3,5/8 3-4-1 (10=10001)  44%   -42
Leela Chess Zero Gen 16 x64 - Joanna2002 1.06 x32              : 1,0/8 1-7-0 (00000100)  13%  -330
Leela Chess Zero Gen 16 x64 - KillerQueen 2 beta 3 x32         : 6,5/8 5-0-3 (111=1==1)  81%  +252
Leela Chess Zero Gen 16 x64 - LarsenVB 0.05 x32                : 1,0/8 1-7-0 (00001000)  13%  -330
Leela Chess Zero Gen 16 x64 - MSCP 1.4 x32                     : 4,5/8 4-3-1 (1=001101)  56%   +42
Leela Chess Zero Gen 16 x64 - Nanook v0.17 x32                 : 5,0/8 3-1-4 (1=01=1==)  63%   +92
Leela Chess Zero Gen 16 x64 - Numpty Recharged x64             : 1,5/8 1-6-1 (0100=000)  19%  -252
Leela Chess Zero Gen 16 x64 - Pierre v1.7 x32                  : 2,5/8 2-5-1 (=0100010)  31%  -139
Leela Chess Zero Gen 16 x64 - Piranha 0.5 x32                  : 1,0/8 1-7-0 (00001000)  13%  -330
Leela Chess Zero Gen 16 x64 - Pulse 1.6.1 x64                  : 1,0/8 1-7-0 (10000000)  13%  -330
Leela Chess Zero Gen 16 x64 - Pwned v1.3 x64                   : 1,0/8 1-7-0 (01000000)  13%  -330
Leela Chess Zero Gen 16 x64 - Sabrina 3.1.25 x64               : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 16 x64 - Satana 2.4.20 x64                : 2,5/8 2-5-1 (100100=0)  31%  -139
Leela Chess Zero Gen 16 x64 - Simon v1.2 x32                   : 1,5/8 1-6-1 (000100=0)  19%  -252
Leela Chess Zero Gen 16 x64 - Skiull 0.3 x64                   : 2,5/8 2-5-1 (01000=01)  31%  -139
Leela Chess Zero Gen 16 x64 - Supra 26.0 Pro x64               : 3,5/8 3-4-1 (0=110100)  44%   -42
Leela Chess Zero Gen 16 x64 - Tikov 0.6.3 Rev 2 x32            : 5,5/8 5-2-1 (0111101=)  69%  +139
Leela Chess Zero Gen 16 x64 - Toledo Nanochess Jan/11/2010 x32 : 3,5/8 2-3-3 (0011===0)  44%   -42
Leela Chess Zero Gen 16 x64 - TSCP 1.81 x32                    : 0,5/8 0-7-1 (000000=0)   6%  -478
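
The last column of each gauntlet row looks consistent with the logistic Elo formula applied to the rounded percentage, with 0% and 100% capped at -1200 and +1200. The sketch below is a guess at that rule worked back from the rows above, not the tournament manager's actual code; `pairing_elo` and the `cap` parameter are names invented here.

```python
import math


def pairing_elo(points: float, games: int, cap: int = 1200) -> int:
    """Per-pairing Elo difference as it appears in the gauntlet rows.

    Assumption: the percentage is rounded half up to a whole number
    first, and the Elo value is derived from that rounded percentage.
    """
    pct = math.floor(100.0 * points / games + 0.5)  # round half up
    if pct <= 0:
        return -cap
    if pct >= 100:
        return cap
    p = pct / 100.0
    return round(400.0 * math.log10(p / (1.0 - p)))


# A few rows from the Gen 16 gauntlet: (points, games) -> Elo column
print(pairing_elo(5.0, 8))  # 92   (Fimbulwinter, 63%)
print(pairing_elo(2.5, 8))  # -139 (Pierre, 31%)
print(pairing_elo(0.0, 8))  # -1200 (AdaChess, 0%)
```

Using the rounded percentage matters: 5/8 is 62.5%, which would give about +89 unrounded, while the table shows +92, the value for 63%.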

Current ratings:

 179 Leela Chess Zero Gen 16 x64            :  1245.8     200   49   25  126    31    13  1442.6    25    25.0
 185 Leela Chess Zero Gen 14 x64            :  1192.9     200   42   20  138    26    10  1442.6    25    25.0
 188 Leela Chess Zero Gen 12 x64            :  1099.6     250   64   26  160    31    10  1291.3    49    40.8
 201 Leela Chess Zero Gen 10 x64            :   861.1      92   53   11   28    64    12   655.3    23    23.0
 204 Leela Chess Zero Gen 8 x64             :   792.5      92   45   17   30    58    18   655.3    23    23.0
 209 Leela Chess Zero Gen 6 x64             :   598.1      92   31   18   43    43    20   655.3    23    23.0
 213 Leela Chess Zero Gen 4 x64             :   369.6     150   43   18   89    35    12   624.0    15    15.0
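
The gauntlet totals can be read straight off a rating-list row. This is a small sketch that assumes the columns after the rating are games, wins, draws and losses (an assumption, but it matches the percentages printed in the rows); `gauntlet_points` is a hypothetical helper name.

```python
def gauntlet_points(row: str):
    """Return (engine name, points, games) from one rating-list row.

    Assumed layout: '<rank> <name> : <elo> <games> <wins> <draws> <losses> ...'
    """
    left, stats = row.split(":", 1)
    name = left.split(None, 1)[1].strip()  # drop the leading rank number
    fields = stats.split()
    games, wins, draws = int(fields[1]), int(fields[2]), int(fields[3])
    return name, wins + 0.5 * draws, games


row = " 179 Leela Chess Zero Gen 16 x64            :  1245.8     200   49   25  126    31    13  1442.6    25    25.0"
print(gauntlet_points(row))  # ('Leela Chess Zero Gen 16 x64', 61.5, 200)
```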

Gen 12 finished with 37/200
Gen 14 finished with 52/200
Gen 16 finished with 61.5/200

jkiliani commented 6 years ago

Further progress, gen 17 (38576a) against Stockfish Level 5:

Score of lc_38576a vs sf_lv5: 33 - 61 - 6  [0.360] 100
Elo difference: -99.95 +/- 69.62

The gap is closing, maybe two more nets...

jkiliani commented 6 years ago

Gen 18 (8c1c61) vs SF Lv5:

Score of lc_8c1c61 vs sf_lv5: 39 - 56 - 5  [0.415] 100
Elo difference: -59.64 +/- 68.14

One more net is probably optimistic; I doubt 7428c7 could do it based on its Elo delta. But it shouldn't be long now.

jkiliani commented 6 years ago

And surprisingly, Id29 (9fa03e) did actually beat Stockfish Level 5, although narrowly:

Score of lc_id29 vs sf_lv5: 51 - 44 - 5  [0.535] 100
Elo difference: 24.36 +/- 67.27

From the Self-play Elo progression I did not expect this yet, but it appears that the strengths and weaknesses of Leela Chess are currently shifting in a way that roughly balances out against itself, but helps against Stockfish. Or it may simply be statistical noise since I'm only doing 100 games each.
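
To put a rough number on the "statistical noise" point, here is a sketch of a 95% interval for the match score using a plain normal approximation over per-game results (win = 1, draw = 0.5, loss = 0). The approximation and the helper name `score_interval` are assumptions for illustration; the +/- value printed with the match result is computed by the match tool and may use a slightly different method.

```python
import math


def score_interval(wins: int, losses: int, draws: int, z: float = 1.96):
    """Approximate confidence interval for the mean score per game."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    # Variance of a single game's result around the observed mean score
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return mean - half_width, mean + half_width


low, high = score_interval(51, 44, 5)  # id29 vs Stockfish Level 5
print(round(low, 3), round(high, 3))   # 0.44 0.63
print(low < 0.5 < high)                # True
```

Since the interval contains 0.5, a 100-game sample like this cannot distinguish a narrow win from an even match.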

Very soon it may be time to test the reinforcement learning nets against the Kingbase supervised net.

After LCZero can beat SF Lv 5 with 85% winrate, I will start pitting it against Stockfish Level 10. From my tests between SF Lv5 and SF Lv10, the rating difference between the two should be around 550 Elo, similar to Lv0 to Lv5, and Level 10 would be very roughly in the 2000 Elo range.
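
As a quick sanity check on these thresholds, the sketch below converts an 85% score into an Elo gap with the usual logistic formula and adds the estimated level gap to the 1450 figure used for Level 5 earlier in the thread. The variable names are illustrative only, and the 1450 and 550 values are the rough estimates quoted above, not measured constants.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


SF_LV5_ELO = 1450        # rough estimate for Stockfish Level 5 from this thread
LV5_TO_LV10_GAP = 550    # rough gap from the Lv5 vs Lv10 tests mentioned above

# An 85% score against Level 5 corresponds to roughly +301 Elo
print(round(elo_from_score(0.85)))     # 301
# Level 10 estimate: 1450 + 550
print(SF_LV5_ELO + LV5_TO_LV10_GAP)    # 2000
```

So the 85% switch-over point would put LCZero at very roughly 1750, still about 250 Elo short of the Level 10 estimate.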

By the way, I am now switching to network ID numbers for my match reports, since "generation" is not an official designation and is of limited use given the many failed nets and narrow passes in between.

CMCanavessi commented 6 years ago

Gen 20 gauntlet:

-----------------Leela Chess Zero Gen 20 x64-----------------
Leela Chess Zero Gen 20 x64 - AdaChess v2.1 (GSEI) x32         : 2,0/8 2-6-0 (01001000)  25%  -191
Leela Chess Zero Gen 20 x64 - Ceibo v0.3.65 x64                : 1,0/8 1-7-0 (00001000)  13%  -330
Leela Chess Zero Gen 20 x64 - Dragontooth 0.2 Bahamut x64      : 6,0/8 5-1-2 (=10=1111)  75%  +191
Leela Chess Zero Gen 20 x64 - Eden 0.0.13 x32                  : 1,0/8 1-7-0 (00000001)  13%  -330
Leela Chess Zero Gen 20 x64 - Enxadrista 1.0 x32               : 3,0/8 3-5-0 (01010010)  38%   -85
Leela Chess Zero Gen 20 x64 - Fimbulwinter v5.05 x32           : 8,0/8 8-0-0 (11111111) 100% +1200
Leela Chess Zero Gen 20 x64 - Frank 0.58 x32                   : 5,5/8 5-2-1 (111001=1)  69%  +139
Leela Chess Zero Gen 20 x64 - Joanna2002 1.06 x32              : 3,5/8 1-2-5 (00===1==)  44%   -42
Leela Chess Zero Gen 20 x64 - KillerQueen 2 beta 3 x32         : 4,5/8 4-3-1 (01=11001)  56%   +42
Leela Chess Zero Gen 20 x64 - LarsenVB 0.05 x32                : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 20 x64 - MSCP 1.4 x32                     : 6,0/8 6-2-0 (01111011)  75%  +191
Leela Chess Zero Gen 20 x64 - Nanook v0.17 x32                 : 4,5/8 2-1-5 (1=1=0===)  56%   +42
Leela Chess Zero Gen 20 x64 - Numpty Recharged x64             : 2,0/8 2-6-0 (00000101)  25%  -191
Leela Chess Zero Gen 20 x64 - Pierre v1.7 x32                  : 5,0/8 5-3-0 (11011010)  63%   +92
Leela Chess Zero Gen 20 x64 - Piranha 0.5 x32                  : 2,0/8 2-6-0 (00000110)  25%  -191
Leela Chess Zero Gen 20 x64 - Pulse 1.6.1 x64                  : 3,0/8 2-4-2 (00101=0=)  38%   -85
Leela Chess Zero Gen 20 x64 - Pwned v1.3 x64                   : 1,5/8 1-6-1 (1=000000)  19%  -252
Leela Chess Zero Gen 20 x64 - Sabrina 3.1.25 x64               : 3,0/8 2-4-2 (00==0011)  38%   -85
Leela Chess Zero Gen 20 x64 - Satana 2.4.20 x64                : 2,5/8 2-5-1 (01000=10)  31%  -139
Leela Chess Zero Gen 20 x64 - Simon v1.2 x32                   : 1,0/8 0-6-2 (00==0000)  13%  -330
Leela Chess Zero Gen 20 x64 - Skiull 0.3 x64                   : 0,0/8 0-8-0 (00000000)   0% -1200
Leela Chess Zero Gen 20 x64 - Supra 26.0 Pro x64               : 3,0/8 3-5-0 (00001110)  38%   -85
Leela Chess Zero Gen 20 x64 - Tikov 0.6.3 Rev 2 x32            : 5,5/8 5-2-1 (1111001=)  69%  +139
Leela Chess Zero Gen 20 x64 - Toledo Nanochess Jan/11/2010 x32 : 4,0/8 2-2-4 (1=0=01==)  50%    ±0
Leela Chess Zero Gen 20 x64 - TSCP 1.81 x32                    : 0,0/8 0-8-0 (00000000)   0% -1200

Gen 16 got 61.5/200
Gen 20 got 77.5/200

Big improvement over Gen 16

Calculated elo:
Gen 14 | 1191.3
Gen 16 | 1235.8
Gen 20 | 1325.8

jkiliani commented 6 years ago

Small regression against SF Level 5 with id31 (dd080d):

Score of lc_id31 vs sf_lv5: 43 - 52 - 5  [0.455] 100
Elo difference: -31.35 +/- 67.39

I would attribute this mostly to the large error margin of strength tests at only 100 games. Looks like current nets are roughly on par with SF Lv 5.
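
For a side-by-side view, the reported Elo differences can be turned into rough absolute estimates by anchoring on the ~1450 figure for Stockfish Level 5 mentioned earlier. This is only a sketch: the anchor is an estimate, each match carries an error margin of roughly 70 Elo, and the dictionary keys are just labels chosen here.

```python
SF_LV5_ELO = 1450  # rough anchor for Stockfish Level 5 from this thread

# Elo differences vs Stockfish Level 5 as reported in the match results above
reported_diffs = {
    "gen 15 (6a5ccd)": -190.85,
    "gen 16 (98240a)": -159.78,
    "gen 17 (38576a)": -99.95,
    "gen 18 (8c1c61)": -59.64,
    "id29 (9fa03e)": 24.36,
    "id31 (dd080d)": -31.35,
}

# Anchored estimate = anchor rating + reported difference
estimates = {net: round(SF_LV5_ELO + diff) for net, diff in reported_diffs.items()}

for net, elo in estimates.items():
    print(f"{net}: ~{elo}")
```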