Timmoth / Sapling

A strong dotnet UCI Chess engine - My leaf nodes are growing
https://iblunder.com
Apache License 2.0

no consistent eval (v1.1.2) #9

Closed: tissatussa closed this issue 1 month ago

tissatussa commented 1 month ago

i managed to compile v1.1.2 on Linux and i let the engine play a Double Round Robin 5m3s tournament with 3 (strong) engines. Just to see how this Sapling version plays, but also to investigate the kind of positions which arise after these opening moves : 1.Nc3 c5 :

[diagram: position after 1.Nc3 c5]

this position holds a story which i will tell in this Issue. also some unexpected engine 'behaviour' happened, which i will explain - for you to find out the cause ..

the move 1.Nc3 isn't great : this way White more or less gives the opening advantage back to Black, this opening move is rarely played at top level chess. Now Black is invited to claim some part of the center, which is one main goal in the opening. For reference : Stockfish 17 NNUE gives best 4 moves for White in the startposition : e4, d4, c4 and Nf3, all about +0.25 eval (we can know that because SF has MultiPV, a convenient feature) - Sapling gives e4 best move with about +0.50 (it always seems to be 'optimistic' with its evals). One of the answers Black can give is 1...c5 and White can transpose into the Sicilian Defence by 2.e4 but White can also play the rare move 2.Ne4!? which attacks pawn c5 and here Black almost always plays 2...e6 which leads to this position :

[diagram: position after 1.Nc3 c5 2.Ne4 e6]

This variation has no name, as far as i know .. i call it "No-Sicilian" .. recently i saw a YT video about this, it leads to new patterns. The kNight shouldn't move a second time in the opening, a basic chess rule .. after Black defends the pawn, the White kNight can easily be kicked by the natural move d7-d5 and Black is better, one would think .. i think this is true, but chess is about ideas : with human play these positions will lead to unique patterns which can cause confusion and mistakes. One creative continuation is 3.f4! which seems harmonious in this constellation : when Black plays 3...d5 White can do 4.Nf2! and all seems well .. then, after Black plays the natural 4...Bd6, this position arises :

[diagram: position after 1.Nc3 c5 2.Ne4 e6 3.f4 d5 4.Nf2 Bd6]

All this happened when i set up the mentioned tournament, with the starting position after 1.Nc3 c5 2.Ne4 : in game 1 Sapling played White and indeed this happened : 2...e6 {all engines play that} 3.f4! Not all engines consider that move a candidate, let alone play it .. in the game, after 3...d5 4.Nf2 Bd6, Sapling continued with 5.d3 and lost (against SF JS, which later shared first place with Titan) ..

In my last position White must find a way to defend pawn f4, and here several 5th moves exist : d3, e3, g3 and even c3 (Bxf4? Qa4+!) .. i was curious how these games would develop, to see the patterns and get ideas .. some engines choose 5.e3 and then play b2-b3 to develop the Bc1. When moving the d-pawn, the White structure becomes different, to me a bit awkward, but Sapling played 5.d3 and i was disappointed .. later i investigated the position in question with SCID, letting Sapling search infinite, to see if it really found 5.d3 to be bestmove .. well, it's not clear : accidentally i discovered that Sapling is not consistent .. i did several runs, each after closing and reopening SCID (a hard reset) :

[screenshot: Sapling-v1.1.2-d3]

[screenshot: Sapling-v1.1.2-e3]

[screenshot: Sapling-v1.1.2-e3-B]

do you have an explanation for these differences ? btw. i often encounter engines giving other evals in SCID compared to CuteChess - this may sound strange but i'm sure, i didn't solve this yet ..

the tournament had 2 winners :

Rank Name                 Rating   Elo     +/-   Games   Points   Score    Draw 
   1 SF JS skill 20=max     ????   191     238       6      4.5   75.0%   50.0% 
   2 Titan v1.1 NNUE        3551   191     238       6      4.5   75.0%   50.0% 
   3 Patricia v3.1 NNUE     3292     0     271       6      3.0   50.0%   33.3% 
   4 Sapling v1.1.2 NNUE    3150  -inf     nan       6      0.0    0.0%    0.0% 

This is the CuteChess Result List, i added ratings. The 'SF JS' engine is the Stockfish javascript version 16.1 with NNUE, i didn't expect it to be so strong ..
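The Elo column in the result list can be reproduced from the Score column. The following is a minimal sketch, assuming the usual logistic relation between score fraction and rating difference; whether this is exactly what CuteChess computes is an assumption, and the function name `elo_difference` is only illustrative.

```python
import math

def elo_difference(score: float) -> float:
    """Rating difference implied by a score fraction in [0, 1].

    Assumes the standard logistic Elo model; a score of 0 or 1
    has no finite rating difference.
    """
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return 400.0 * math.log10(score / (1.0 - score))

# Points and games taken from the result list above.
results = [
    ("SF JS skill 20=max", 4.5, 6),
    ("Titan v1.1 NNUE", 4.5, 6),
    ("Patricia v3.1 NNUE", 3.0, 6),
    ("Sapling v1.1.2 NNUE", 0.0, 6),
]

for name, points, games in results:
    print(f"{name:22s} {elo_difference(points / games):+.0f}")
```

Under this model a 75% score corresponds to roughly +191, a 50% score to 0, and a 0% score to minus infinity, which matches the Elo column in the list.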

download PGN : My-Tournament-5m3s-4boys-no-sicilian.zip

most engines played 3.f4 ! btw. the SCID screenshot below shows shifted move numbering due to its interpretation of the starting FEN.

all-games

Timmoth commented 1 month ago

This is incredibly useful, thank you!

Very strange indeed. Continuing the discussion from #8, your report actually helped me nail down a bug... There was an issue with the Cuckoo filter which meant it wasn't able to detect threefold repetitions correctly. I've fixed it in 1.1.3 and it seems to be better in those scenarios.
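For readers unfamiliar with the idea, threefold repetition detection can be sketched with exact counting of position keys. This is not Sapling's code: it swaps the Cuckoo filter mentioned above for a plain `Counter`, and the class name `RepetitionTracker` and the string keys are hypothetical. In a real engine the key would be a position hash (for example a Zobrist hash) that covers side to move, castling rights and en passant state.

```python
from collections import Counter

class RepetitionTracker:
    """Counts how often each position key has occurred on the current line."""

    def __init__(self):
        self.history = []        # keys in the order the positions occurred
        self.counts = Counter()  # key -> number of occurrences

    def push(self, key):
        """Record the position reached after making a move."""
        self.history.append(key)
        self.counts[key] += 1

    def pop(self):
        """Undo the last recorded position (when unmaking a move)."""
        key = self.history.pop()
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]

    def is_threefold(self):
        """True if the current position has now occurred at least three times."""
        return bool(self.history) and self.counts[self.history[-1]] >= 3

# Two knights shuffling back and forth visit the same four positions in a cycle.
tracker = RepetitionTracker()
for key in ["P0", "P1", "P2", "P3", "P0", "P1", "P2", "P3", "P0"]:
    tracker.push(key)
print(tracker.is_threefold())
```

Exact counting costs more memory than a probabilistic filter, but it cannot miss a repetition, which makes it a useful reference when testing a faster structure.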

I can't replicate this issue in v1.1.3; it seems to choose d6f4 every time, at least when executing:

position fen rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR KQkq - - 0 1
go 

I'll keep investigating, but for sure it should be deterministic to the same depth, unless you're running with multiple threads, in which case I believe the result can differ. So you may have uncovered another genuine bug!

tissatussa commented 1 month ago

..unless you're running with multiple threads..

i run with 2 threads and 128 or 256 MB Hash

Timmoth commented 1 month ago

Even so, it's still curious that within the search the PV changes so dramatically at each depth. Something definitely doesn't seem right there.

tissatussa commented 1 month ago

you mention the FEN rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR KQkq - - 0 1 but this is invalid : the side to move is missing. you need to add it, i.e. 'w' (not Black!), and the en passant field takes a single '-' : rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR w KQkq - 0 1 .. if Black were to move in this position, Bd6xf4 would be logical and good .. so, you should redo this.
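A quick sanity check catches this kind of hand-written FEN mistake before it reaches the engine. The sketch below only verifies the field count, the board layout and the side-to-move field; it is not a full FEN validator, and the function name `fen_problems` is hypothetical.

```python
def fen_problems(fen: str) -> list:
    """Return a list of human-readable problems found in a FEN string."""
    problems = []
    fields = fen.split()

    # A FEN record has six fields: placement, side to move, castling,
    # en passant square, halfmove clock, fullmove number.
    if len(fields) != 6:
        problems.append(f"expected 6 fields, found {len(fields)}")
    if not fields:
        return problems

    # Piece placement: 8 ranks, each describing exactly 8 squares.
    ranks = fields[0].split("/")
    if len(ranks) != 8:
        problems.append(f"expected 8 ranks, found {len(ranks)}")
    for number, rank in zip(range(8, 0, -1), ranks):
        if any(c not in "pnbrqkPNBRQK12345678" for c in rank):
            problems.append(f"rank {number} contains an invalid character")
            continue
        width = sum(int(c) if c.isdigit() else 1 for c in rank)
        if width != 8:
            problems.append(f"rank {number} describes {width} squares")

    # Side to move must be the second field.
    if len(fields) < 2 or fields[1] not in ("w", "b"):
        problems.append("side to move must be 'w' or 'b'")
    return problems

broken = "rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR KQkq - - 0 1"
fixed = "rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR w KQkq - 0 1"
print(fen_problems(broken))
print(fen_problems(fixed))
```

Note that the hand-written string happens to contain six tokens, because the missing side to move is offset by an extra '-', so a field count alone would not flag it; the side-to-move check does.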

Timmoth commented 1 month ago

Yep, my bad, I really messed up that FEN string; I tried to create it by hand like an idiot. Okay, that position is now giving similar results to what you showed. Thanks for finding this. If I can reliably reproduce the issue, I should be able to find out wtf is going wrong.

position fen rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR w KQkq - 0 1
go
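To check whether the result is reproducible, the same position can be searched several times to a fixed depth, each time in a fresh engine process, and the reported best moves compared. The following is a rough sketch under stated assumptions: the engine path `./Sapling` is hypothetical, the option names `Threads` and `Hash` are the conventional UCI names and are assumed to be what Sapling exposes, and the helper names are illustrative.

```python
import subprocess

FEN = "rnbqk1nr/pp3ppp/3bp3/2pp4/5P2/8/PPPPPNPP/R1BQKBNR w KQkq - 0 1"

def uci_script(fen, depth, threads=1, hash_mb=128):
    """UCI commands for one fixed-depth search (option names are assumed)."""
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        "isready",
        "ucinewgame",
        f"position fen {fen}",
        f"go depth {depth}",
    ]

def parse_bestmove(lines):
    """Return the move from the first 'bestmove' line, or None if absent."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "bestmove":
            return parts[1]
    return None

def run_once(engine_path, fen, depth, threads=1, hash_mb=128):
    """Start a fresh engine process, search once, and return its best move."""
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        proc.stdin.write("\n".join(uci_script(fen, depth, threads, hash_mb)) + "\n")
        proc.stdin.flush()
        for line in proc.stdout:
            if line.startswith("bestmove"):
                return parse_bestmove([line])
        return None
    finally:
        proc.kill()  # a fresh process per run plays the role of the "hard reset"
        proc.wait()

def is_deterministic(moves):
    """True if every run reported the same best move."""
    return len(set(moves)) <= 1

# Example (hypothetical engine path):
# moves = [run_once("./Sapling", FEN, depth=16, threads=1) for _ in range(5)]
# print(moves, is_deterministic(moves))
```

Running this once with `threads=1` and once with `threads=2` would help separate ordinary multi-threaded search variation from a genuine single-threaded inconsistency.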