amchess / ShashChess

An attempt to implement Alexander Shashin's theory in a Stockfish-derived chess engine
GNU General Public License v3.0

Evaluation Multiplier #35

Closed · MSoszynski closed this issue 1 year ago

MSoszynski commented 1 year ago

When I compare the numerical evaluations of ShashChess with two other strong engines (when the same move is chosen), I typically find that Stockfish's evaluation is about 3x, and Dragon's about 2x, that of ShashChess. This makes ShashChess difficult to integrate with other analyses. So, would it be possible to have a multiplier option in ShashChess? In my own experiments, multiplying its (non-zero) evaluations by 1.8 would make it more realistic and usable, but of course that decision could be left to the user. Perhaps you might want to set a maximum of 3. Of course 1 would leave ShashChess as it is now.
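A minimal sketch of what the requested behaviour could look like, in C++. The option name, the percentage encoding and the plumbing are purely illustrative, not existing ShashChess code:

```cpp
#include <cmath>

// Hypothetical option "Eval Multiplier", expressed as a percentage
// (100 = 1.0x .. 300 = 3.0x), mirroring the request above: default 1x,
// maximum 3x, with 1.8x (i.e. 180) as the value suggested in the comment.
static int evalMultiplierPct = 100;

// Scale a centipawn score just before it is reported over UCI.
// Mate scores and exact zeros are left untouched.
static int scaled_cp(int cp, bool isMate) {
    if (isMate || cp == 0)
        return cp;
    return static_cast<int>(std::lround(cp * evalMultiplierPct / 100.0));
}

// Example: with evalMultiplierPct = 180, a raw +25 cp would be reported as +45 cp.
```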

tissatussa commented 1 year ago

I know what you mean, and it's true. But when ShashChess gives an eval of, say, 0.25 where SF gives 0.75, this shows ShashChess knows how best to defend that position, so we can conclude the engine is not as optimistic as SF!? So, when you want to "integrate [the eval] with other analyses", you can apply such a 3x or 2x factor yourself!? I hope you understand my thinking ..

MSoszynski commented 1 year ago

The issue is that ShashChess is statistically an outlier. It doesn't defend positions greatly better than its chief rivals (see the ratings) and yet it does score positions greatly at variance with them. Why? It's like Centigrade v Fahrenheit; neither is cooler than the other just because the numbers are smaller. Meanwhile I would rather change an engine parameter once than do mental arithmetic every time I compare analyses with ShashChess.

tissatussa commented 1 year ago

[..] change an engine parameter once

by 'parameter' you mean a factor or transformation function for the eval, exposed as a new UCI option?

MSoszynski commented 1 year ago

Yes, just that.
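As a rough, self-contained illustration of the UCI side of that idea (the option name "Eval Multiplier" is hypothetical; UCI spin options are integer-valued, so a factor of 1.8 would be entered as 180):

```cpp
#include <iostream>
#include <string>

// Illustrative sketch only, not how ShashChess's option handling is actually structured.
static int evalMultiplierPct = 100;   // default 100% = leave evals unchanged

void handle_uci_line(const std::string& line) {
    static const std::string prefix = "setoption name Eval Multiplier value ";
    if (line == "uci") {
        // Advertise the hypothetical option to the GUI.
        std::cout << "option name Eval Multiplier type spin default 100 min 100 max 300\n";
    } else if (line.rfind(prefix, 0) == 0) {
        // Parse "setoption name Eval Multiplier value <n>".
        evalMultiplierPct = std::stoi(line.substr(prefix.size()));
    }
}
```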

tissatussa commented 1 year ago

maybe test positions could be used to determine the corresponding factor for each engine, e.g. Dragon 2x .. just a thought .. how would you implement this idea?
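One hedged sketch of how such a factor might be determined: collect paired centipawn evaluations of the same test positions from ShashChess and a reference engine, then fit a single scale factor by least squares. All names below are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Least-squares fit of a single factor k such that reference ≈ k * shash,
// over paired centipawn evaluations of the same test positions.
double fit_scale_factor(const std::vector<double>& shashCp,
                        const std::vector<double>& referenceCp) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < shashCp.size() && i < referenceCp.size(); ++i) {
        num += shashCp[i] * referenceCp[i];
        den += shashCp[i] * shashCp[i];
    }
    return den > 0.0 ? num / den : 1.0;   // fall back to 1.0 if no usable data
}
```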

tissatussa commented 1 year ago

It doesn't defend positions greatly better than its chief rivals [..] and yet it does score positions greatly at variance with them. Why?

so, in fact this is your real question? The eval value represents centipawns, doesn't it? Maybe ShashChess evaluates position-wise, not literally "0.5 means half a pawn" .. and what about the typical settings, Capablanca etc.? You described the matter clearly though ..

tissatussa commented 1 year ago

I found https://lichess.org/page/accuracy, which explains a lot. Here's a part of it:

Why not just use Stockfish centipawns?

Centipawns are great for developing chess engines, which is their main use. But not so much for human comprehension.

A major issue with centipawns is that they're dependent on the position evaluation. For example, losing 300 centipawns in an equal position is a major blunder. But losing 300 centipawns when the game is already won or lost makes almost no difference and is largely irrelevant.

Thus, "300 centipawns" has no meaning on its own for a human. That's the problem we aim to solve with Win% and Accuracy%. These new values are derived from centipawns, but they try to be independent of the position evaluation. 30 Accuracy% should mean the same thing whether the position is equal or winning/losing.

MSoszynski commented 1 year ago

I'm not sure that's relevant. Anyway, the latest Stockfish patch 22110508 "Normalize evaluation" requires us to suspend our discussion while we reflect on it.
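For context, a hedged sketch of what that normalization amounts to: the reported centipawns become a linear rescaling of the internal value by a calibration constant, chosen so that a reported +1.00 roughly corresponds to a 50% chance of winning. The constant below is illustrative only; Stockfish recalibrates it between releases.

```cpp
// Illustrative calibration constant; the real value differs between releases.
constexpr int NormalizeToPawnValue = 361;

// Convert an internal evaluation into the normalized centipawns that get printed.
inline int normalized_cp(int internalValue) {
    return 100 * internalValue / NormalizeToPawnValue;
}
```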

MSoszynski commented 1 year ago

The new ShashChess 25.4 (with the Normalize patch) addresses the issues that I had in my original comment. I am now content, and I consider the case closed.

amchess commented 1 year ago

Yes: I did that expressly.