amanjpro / zahak

A UCI compatible chess AI in Go
https://zahak.amanj.me
MIT License

+250-300? #51

Closed SzotsGabor closed 3 years ago

SzotsGabor commented 3 years ago

Hi Amanj,

This might not be very useful to you, but still.

You claimed a 250-300 Elo improvement in v1.0.0 over v0.3.0. Therefore, for my current tournament, I selected opponents in the 2160-2200 range (to be on the safe side). However, even this selection has proved too strong: at the time of writing, Zahak has scored 6 out of 30. I watched some of the games, and it is hard to point to a single salient problem. It seems to me that one of the problems is endgames with passed pawns. Also, Zahak somehow reaches a lower search depth than most of its opponents, and it seems to fall for traps.

I can send you the PGN if you are interested. You can find my e-mail address in my CCC profile.

Best regards, Gabor

amanjpro commented 3 years ago

Interesting, my tests showed that (well, with self-play):

https://github.com/amanjpro/zahak/pull/49#issuecomment-814260987

https://github.com/amanjpro/zahak/pull/49#issuecomment-814261970

mmm... most likely the problem is self-play. I think it is time for me to expand the pool of engines I test Zahak against.

amanjpro commented 3 years ago

I would be grateful to see the PGNs, thanks a lot.

SzotsGabor commented 3 years ago

I read on CCC that the real difference is about 70% of what self-play shows.

In my tournament, Zahak is now on 17/68. I am attaching the PGN.

------ Original message ------ From: "Amanj Sherwany" @.> To: "amanjpro/zahak" @.> Cc: "Gabor Szots" @.>; "Author" @.> Sent: 2021.04.09. 15:48:42 Subject: Re: [amanjpro/zahak] +250-300? (#51)


SzotsGabor commented 3 years ago

Well, I'm not sure the PGN was actually sent. I can't see it here.

amanjpro commented 3 years ago

You cannot attach files to a GitHub comment by email, unfortunately :(

SzotsGabor commented 3 years ago

Maybe this way.

Zahak 1.0.0 64-bit - Apr 9.pgn.zip

amanjpro commented 3 years ago

Thank you very much :)

SzotsGabor commented 3 years ago

FYI, I also measured a large difference in self-play.

Score of Zahak_0.3.0-x64 vs Zahak_1.0.0-x64: 1 - 13 - 6  [0.200]
...      Zahak_0.3.0-x64 playing White: 0 - 6 - 4  [0.200] 10
...      Zahak_0.3.0-x64 playing Black: 1 - 7 - 2  [0.200] 10
...      White vs Black: 7 - 7 - 6  [0.500] 20
Elo difference: -240.8 +/- 159.1, LOS: 0.1 %, DrawRatio: 30.0 %
20 of 20 games finished.

This was played at a 30s+0.2s time control.

amanjpro commented 3 years ago


My explanation is passed pawns: I know the search can easily prune promotions or passed-pawn moves, because my move ordering doesn't care much about them. This might explain why it can crush weaker engines (such as the old version) but struggles to convert against engines of equal strength, since those games usually require good endgame play. Just a theory, not proven yet.
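One common way to keep a search from pruning such moves is to give them a high move-ordering score. The Go sketch below is hypothetical and is not Zahak's code: it assumes a bitboard layout with square = rank*8 + file (a1 = 0), treats a pawn as passed when no enemy pawn stands ahead of it on its own or an adjacent file, and gives promotions and passed-pawn pushes an ordering bonus that grows as the pawn advances. The names and bonus values are arbitrary placeholders.

```go
package main

import "fmt"

// Color is the side that owns the pawn.
type Color int

const (
	White Color = iota
	Black
)

// Hypothetical ordering bonuses; real values would need tuning.
const (
	promotionBonus  = 9000
	passedPawnBonus = 500
)

// isPassedPawn reports whether a pawn of color c on sq (rank*8+file, a1 = 0)
// has no enemy pawns in front of it on its own or adjacent files.
func isPassedPawn(sq int, c Color, enemyPawns uint64) bool {
	file, rank := sq%8, sq/8
	for f := file - 1; f <= file+1; f++ {
		if f < 0 || f > 7 {
			continue
		}
		for r := 0; r < 8; r++ {
			ahead := (c == White && r > rank) || (c == Black && r < rank)
			if ahead && enemyPawns&(1<<uint(r*8+f)) != 0 {
				return false
			}
		}
	}
	return true
}

// pawnMoveScore returns an ordering score for a pawn moving from -> to.
// Higher scores are searched earlier and, in engines that prune late moves
// by move count, are less likely to be pruned.
func pawnMoveScore(from, to int, c Color, enemyPawns uint64, promotes bool) int {
	if promotes {
		return promotionBonus
	}
	if !isPassedPawn(from, c, enemyPawns) {
		return 0
	}
	// Rank index of the target square seen from the pawn's own side
	// (0 = own back rank), so pushes closer to promotion score higher.
	advance := to / 8
	if c == Black {
		advance = 7 - to/8
	}
	return passedPawnBonus + 100*advance
}

func main() {
	// White pawn e5 (36) -> e6 (44); the only black pawn is on a7 (48): passed.
	fmt.Println(pawnMoveScore(36, 44, White, uint64(1)<<48, false))
	// Same push with a black pawn on d7 (51): not passed, no bonus.
	fmt.Println(pawnMoveScore(36, 44, White, uint64(1)<<51, false))
}
```

An engine would typically combine a score like this with its existing hash-move, capture and killer ordering, and might also exempt such moves from late-move pruning explicitly.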

amanjpro commented 3 years ago

So, this is really strange. I have been experimenting with passed pawns, and I am currently running a match (still in progress):

Rank Name                          Elo     +/-   Games    Wins  Losses   Draws   Points   WWins  WLoss.  WDraws   BWins  BLoss.  BDraws 
   0 zahak_dev                     -27      53     130      46      56      28     60.0      25      25      15      21      31      13 
   1 baislicka                     289     185      22      16       1       5     18.5       8       1       2       8       0       3 
   2 Achillees                     191     157      22      14       3       5     16.5       7       1       3       7       2       2 
   3 gopher_check                   64     141      22      11       7       4     13.0       7       3       1       4       4       3 
   4 zahak-darwin-amd64-1.0.0      -64     131      22       6      10       6      9.0       4       4       3       2       6       3 
   5 vice                          -70     136      20       5       9       6      8.0       3       3       4       2       6       2 
   6 rustic                       -213     194      22       4      16       2      5.0       2       9       0       2       7       2 

Started game 133 of 1200 (zahak_next vs zahak-darwin-amd64-latest)

Looking at gopher-check: even though it is supposed to be 100 Elo stronger than Achillees and Baislicka, it does much worse than both of them. Not sure what to make of this, really.

amanjpro commented 3 years ago

I'll have results tomorrow and will update here.

amanjpro commented 3 years ago

As promised, I'm back with the final numbers for the above match. gopher-check still does worse than the "lower rated" engines, which is interesting/strange to me. Even vice (which is rated around 2000) does terribly against Zahak. I'll keep working on Zahak to find the reason behind it.

Rank Name                          Elo     +/-   Games    Wins  Losses   Draws   Points   WWins  WLoss.  WDraws   BWins  BLoss.  BDraws 
   0 zahak_next (PR #52 )          -14      18    1200     461     508     231    576.5     251     236     113     210     272     118 
   1 baislicka                     189      49     200     133      34      33    149.5      73      13      14      60      21      19 
   2 Achillees                     166      49     200     129      40      31    144.5      64      22      14      65      18      17 
   3 gopher_check                  109      43     200     104      43      53    130.5      57      15      28      47      28      25 
   4 zahak-darwin-amd64-1.0.0      -26      40     200      60      75      65     92.5      37      32      31      23      43      34 
   5 vice  (v 1.1)                -173      48     200      37     129      34     54.0      16      59      25      21      70       9 
   6 rustic   (1 alpha 2)         -179      53     200      45     140      15     52.5      25      69       6      20      71       9 

Finished match

And according to bayeselo:

ResultSet>readpgn zahak_games/passed-pawns-1.pgn   
1200 game(s) loaded
ResultSet>elo
ResultSet-EloRating>mm
00:00:00,00
ResultSet-EloRating>exactdist
00:00:00,00              
ResultSet-EloRating>ratings
Rank Name                       Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 baislicka                   3282   0.0   41   39   200  149.5  74.8  133   34   33  66.5  16.5  3090 
   2 Achillees                   3264  18.4   40   39   200  144.5  72.2  129   40   31  64.5  15.5  3090 
   3 gopher_check                3196  67.7   37   36   200  130.5  65.2  104   43   53  52.0  26.5  3090 
   4 zahak_next                  3090 106.2   16   16  1200  576.5  48.0  461  508  231  38.4  19.2  3102 
   5 zahak-darwin-amd64-latest   3066  24.5   34   35   200   92.5  46.2   60   75   65  30.0  32.5  3090 
   6 vice                        2911 154.4   38   40   200   54.0  27.0   37  129   34  18.5  17.0  3090 
   7 rustic                      2890  21.1   41   43   200   52.5  26.2   45  140   15  22.5   7.5  3090 
---------------------------------------------------------------------------------------------------------
  Δ = delta from the next higher rated opponent
  # = number of games played
  Σ = total score, 1 point for win, 1/2 point for draw

amanjpro commented 3 years ago

Zahak 2.0.0 should fulfill the promise.