fairy-stockfish / Fairy-Stockfish

chess variant engine supporting Xiangqi, Shogi, Janggi, Makruk, S-Chess, Crazyhouse, Bughouse, and many more
https://fairy-stockfish.github.io/
GNU General Public License v3.0

Scoring of wins by material counting #202

Open kimqwer2 opened 3 years ago

kimqwer2 commented 3 years ago

Because the text is long, the translation is not smooth.

Because Stockfish evaluates positions with a complete victory (checkmate) as its goal, it often feels as if it drifts away from a complete victory as the game progresses. So if the engine decides that it will not be able to win a complete victory, it would be better for it to look for a win based on points. It would therefore be nice if it could judge whether a complete victory is possible.

Janggi professionals know whether checkmate is available from the type and number of their own and the opponent's pieces. Even professionals don't seem to be fully aware of the exact conditions, but with decades of experience and intuition they seem to aim at keeping only certain pieces. This is currently Stockfish's biggest weakness. Because professionals know the conditions that prevent either side from reaching a complete victory, they use strategies that keep the opponent from meeting those conditions, and then win on points. In other words, they exploit the fact that the machine judges the board by score and deliberately create situations in which the machine cannot reach a complete victory, while steering toward situations in which they themselves meet the conditions for one.

Therefore, if the engine found out the necessary and sufficient conditions for checkmate and tried to meet them, I think it would lead to a complete victory more often. If it judges that it cannot meet them, I think it is a good idea to give up on a complete victory and focus on calculating based on points.

I do not know the exact necessary and sufficient conditions for checkmate because my skill is low. For example, if you only have 2 Wazirs and 2 Cannons, you cannot reach a guaranteed (100%) checkmate. Likewise, even with 2 Wazirs, 1 Cannon, and 1 Elephant, you cannot reach a guaranteed checkmate. If the engine could make judgments about such situations, there would be fewer games in which neither side reaches a complete victory, meaningless repetitive patterns could be avoided, and I think a score win would no longer have to be the inevitable outcome.

A database of the material needed for checkmate doesn't seem to exist yet, as there hasn't been much in-depth research on Janggi. I am not sure about this. This is just one idea, and I think it would help the further development of Stockfish.

cjssh1002 commented 3 years ago

This is a matter for an EGTB (endgame tablebase); FS-Janggi currently does not support EGTB. It is not an engine problem.

ianfab commented 3 years ago

Thanks for your feedback. The problem here is that Fairy-SF does not even distinguish between a win by checkmate and a win by material counting, because there is just one win/checkmate score which is used for both winning conditions, so from its point of view a game can just end in win or loss (or draw in other variants).

Since chess engines and chess engine protocols are generally designed for a game with 3 possible results (win, loss, draw), it is hard to consistently fit in a game with 4 to 5 possible results (win by checkmate, win by counting, draw, loss by counting, loss by checkmate). It causes problems in both the internal representation of the score (what is better, a +3 advantage or win by material counting?) as well as the displaying of scores in the GUI for wins by material counting (is it indicated as let's say +10 or #1).

I have actually been pondering this problem from the beginning, as soon as I became aware of the 7-0 vs. 4-2 scoring system. One potential solution I thought of is to reflect wins by material counting using a modified mate score, e.g., a mate in one by material counting could be #101 both internally and when displaying it in the GUI. However, this can also be problematic, because some software (GUIs, API usage, scripts) might rely on the game ending after mate in 1 (or ultimately 0). And it would still only drive the engine away from a win by material counting if it can find a forced mate; otherwise it would still go for the win by counting, even if it might still have a checkmate that it has not detected yet.
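A minimal sketch of the offset idea described above, not Fairy-Stockfish's actual code. It assumes Stockfish-style conventions (a `VALUE_MATE` constant, mate scores stored as `VALUE_MATE - ply`, and mate distances reported in full moves); `COUNT_OFFSET_MOVES`, `checkmate_in`, `count_win_in` and `uci_mate_distance` are hypothetical names.

```python
VALUE_MATE = 32000        # assumed Stockfish-style mate constant
COUNT_OFFSET_MOVES = 100  # hypothetical: a count win in N is shown as "#(100 + N)"


def checkmate_in(ply: int) -> int:
    """Internal score for delivering checkmate in `ply` half-moves."""
    return VALUE_MATE - ply


def count_win_in(ply: int) -> int:
    """Internal score for a win by material counting in `ply` half-moves,
    pushed 100 full moves (200 plies) further away than a real mate."""
    return VALUE_MATE - 2 * COUNT_OFFSET_MOVES - ply


def uci_mate_distance(score: int) -> int:
    """Convert a positive internal mate score to full moves ('score mate N')."""
    return (VALUE_MATE - score + 1) // 2


print(uci_mate_distance(checkmate_in(1)))   # 1
print(uci_mate_distance(count_win_in(1)))   # 101
print(checkmate_in(99) > count_win_in(1))   # True: mate in 50 beats count win in 1
```

With this encoding every checkmate within 100 moves outranks every win by counting, but a genuine mate in 101 gets the same score as a count win in 1, and a GUI would still see "mate 101" for a game that ends on the next move. Both are the compatibility concerns raised in the comment.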

So those are the reasons why distinguishing the scores between win by material counting and checkmate is difficult. If anyone has good suggestions how to solve this, input is very welcome.

cjssh1002 commented 3 years ago

Is it difficult for FS to recognize the winning, losing, and drawing piece combinations in Janggi, and to reflect them in search results and scores?

Most people think in terms of win-loss criteria for online games, but I mean the essential approach of Janggi tactics.

ianfab commented 3 years ago

The main problem is not to recognize if it can win by checkmate (although that also might need to be improved), but whether it should do it in the first place. If it does not distinguish between a win by checkmate and by material counting, why should it look for checkmate possibilities if it already found a win by material counting? The engine needs to have an objective reason to prefer checkmate (i.e., a better score), since that is what it tries to optimize for.

cjssh1002 commented 3 years ago

Since there is no EGTB file for FS-Janggi, FS always has to search for the result itself, even when the remaining pieces already decide whether the position is winning or losing. If it has the winning pieces, FS can (mostly) find the win. In the opposite case it puts the game into an infinite loop.

suggestion 1

  1. There are already verified results for which piece combinations in Janggi are winning, drawing, and losing.
  2. Stockfish aims for mate. However, when the pieces that are left only allow a draw or a defeat, it switches to a search mode based on the piece score.
  3. The 50-move rule, as with Syzygy, applies.
  4. The win or loss is decided by score.

just an opinion.
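One way to read suggestion 1 as code, purely as a sketch: the table below is hypothetical and contains only the two material examples mentioned earlier in this thread (they are not verified results), and `material_key`, `search_mode`, `adjudicate` and `MOVE_LIMIT` are made-up names rather than anything in Fairy-Stockfish.

```python
def material_key(material: dict) -> frozenset:
    """Order-independent key for a side's remaining non-king pieces."""
    return frozenset((piece, n) for piece, n in material.items() if n > 0)


# Step 1: hypothetical table of material that cannot force mate.
# Both entries are the examples quoted in this thread, not verified data.
CANNOT_MATE = {
    material_key({"wazir": 2, "cannon": 2}),
    material_key({"wazir": 2, "cannon": 1, "elephant": 1}),
}

MOVE_LIMIT = 50  # step 3: assumed 50-move style limit


def search_mode(attacker_material: dict) -> str:
    """Step 2: aim for mate unless the table says mate is out of reach."""
    return "count" if material_key(attacker_material) in CANNOT_MATE else "mate"


def adjudicate(mode: str, moves_in_mode: int, my_points: float, opp_points: float):
    """Steps 3 and 4: once the limit is reached in count mode, decide by score."""
    if mode != "count" or moves_in_mode < MOVE_LIMIT:
        return None  # keep playing
    if my_points == opp_points:
        return "draw"
    return "win" if my_points > opp_points else "loss"


print(search_mode({"wazir": 2, "cannon": 2}))   # count
print(adjudicate("count", 50, 16, 13))          # win
```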

HGMuller commented 3 years ago

To contribute my 2 cents:

Score reporting - In CECP scores have always been reported numerically. Relatively recently I have introduced a standard for mate scores (100000+N for mate in N moves). We could use 90000+N for win-by-count in N. These conventions are benign w.r.t. backward compatibility: GUIs not implementing them would simply display 1000.NN or 900.NN, and the user should just read 1000 as a funny spelling for 'mate in'. In UCI style this would really need a new keyword. Disguising a win-by-count score as a cp score or a mate score would be the same kludge as CECP uses. One could introduce something like "score count +/-N". This is unfortunately not backward compatible with anything. Perhaps a win by count should be dubbed a count-mate (just like there is check-mate and stale-mate), and reporting like "score count mate N" would be understood by non-supporting GUIs as a normal mate score.
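The numeric CECP convention described above can be sketched as follows. The 100000+N mate encoding and the 90000+N proposal come from the comment; using the negated value for the losing side of a count, and the helper names `cecp_score` and `legacy_gui_display`, are assumptions made for illustration.

```python
BASE = {"mate": 100000, "count": 90000}  # 90000 is the proposal from this thread


def cecp_score(kind: str, n: int, winning: bool = True) -> int:
    """Encode 'mate in n' or 'win by count in n' as a plain CECP score.
    Negating the value for the losing side is assumed to mirror mate scores."""
    value = BASE[kind] + n
    return value if winning else -value


def legacy_gui_display(score: int) -> str:
    """What a GUI unaware of the convention shows: centipawns as pawns."""
    sign = "-" if score < 0 else ""
    whole, frac = divmod(abs(score), 100)
    return f"{sign}{whole}.{frac:02d}"


print(legacy_gui_display(cecp_score("mate", 3)))    # 1000.03
print(legacy_gui_display(cecp_score("count", 5)))   # 900.05
```

Because both encodings are ordinary integers, a mate always compares higher than a win by count, which is the ordering the engine would need internally as well.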

Internal score accounting - the problem that SF would prefer a win-by-count over a checkmate it does not see yet is not qualitatively different from preferring a stalemate over an unseen mate when the score is negative. It is just that it is far more likely. (The other thing requires a rather gross misevaluation.)

I suppose there is some empirical knowledge on how large middle-game heuristic scores have to be to overcome the 'draw margin'. (I.e. the score range for which the game usually will not end in checkmate with good-quality play. This could depend on game phase.) Logically, the score for a win-by-count should be at the edge of that range: it is the best you can do if stalemate cannot be expected.
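As a rough illustration of placing the win-by-count score at the edge of the draw margin: the numbers below are hypothetical (no margin is given in the thread), and `RESULT_SCORE` and `prefers_playing_on` are made-up names.

```python
VALUE_MATE = 32000  # assumed engine mate constant
DRAW_MARGIN = 200   # hypothetical edge of the drawish score range, in centipawns

# Five possible results, with a win by count pinned to the edge of the margin.
RESULT_SCORE = {
    "checkmate": VALUE_MATE,
    "count_win": DRAW_MARGIN,
    "draw": 0,
    "count_loss": -DRAW_MARGIN,
    "checkmated": -VALUE_MATE,
}


def prefers_playing_on(heuristic_score: int) -> bool:
    """True if a line with this heuristic score outranks cashing in a count win."""
    return heuristic_score > RESULT_SCORE["count_win"]


print(prefers_playing_on(350))  # True: advantage is outside the draw margin
print(prefers_playing_on(120))  # False: take the win by count instead
```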

Knowledge on drawishness of material combinations with few men is always very useful in any Chess variant. In orthodox Chess you'd better know that KRBKR offers slim to no winning chances, or you will bungle many victories. Usually such combinations get their 'naive' middle-game evaluation scores scaled towards the draw score by dividing them by 4 or 8, so that the apparent 325cP advantage of a Bishop is reduced to a score less than a Pawn. The engine playing the strong side is then not going to prefer KRBKR over KRPPKR or even KRPKR.
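The arithmetic of that scaling, as a small sketch. The 325cP Bishop and the divisors 4 and 8 are from the comment; the 100cP Pawn and the function name `scale_drawish` are assumptions.

```python
PAWN = 100     # assumed centipawn value of a Pawn
BISHOP = 325   # Bishop advantage quoted in the comment


def scale_drawish(score: int, divisor: int = 4) -> int:
    """Pull a naive evaluation towards the draw score (0), truncating toward zero."""
    return int(score / divisor)


krbkr = scale_drawish(BISHOP)   # drawish ending: naive +325 gets scaled
krpkr = PAWN                    # a Pawn up is not scaled in this sketch
print(krbkr, krpkr)             # 81 100
print(krbkr < krpkr)            # True: the engine now prefers KRPKR
print(scale_drawish(BISHOP, 8)) # 40
```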

This is not really different in Janggi. In Chess the need for scaling usually only arises when the leading side has no Pawns, or is in danger of getting into that situation by its last Pawn being sacrificed away, because promoting a single Pawn in Chess restores a winning advantage. In Janggi you are basically always in that situation; there is no promotion, and you have to checkmate with what you have. But like in Chess, the drawishness mainly arises when there are few men: a pawnless advantage of a minor is not enough in Chess in combinations like KRNKR or KQBKQ, but with KQRRBNKQRRN it becomes quite a different issue; there are still ample opportunities to increase your advantage. Chess is an unstable game: advantages tend to grow exponentially.

So to conclude: there must be a table, or a simple rule, to identify drawish material combinations with 5, 6 and perhaps 7 men, such as in pawnless Chess we have "minor ahead = draw". The heuristic middle-game score must then be scaled down into the draw range that would be valid for combinations with still many pieces (e.g. many additional Pawns).
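A crude sketch of such a "simple rule" for orthodox Chess, using the examples named in this comment. Apart from the 325cP minor, the piece values are assumptions, and `is_drawish` is a hypothetical helper; a real table for Janggi would need verified data that this thread does not provide.

```python
# Assumed centipawn values; only the 325 for a minor is taken from the thread.
VALUES = {"K": 0, "P": 100, "N": 325, "B": 325, "R": 500, "Q": 975}
MAX_MEN = 7  # the rule is only meant for combinations with few men


def material(side: str) -> int:
    """Sum of piece values for a side written like 'KRB'."""
    return sum(VALUES[piece] for piece in side)


def is_drawish(strong: str, weak: str) -> bool:
    """Pawnless 'minor ahead = draw' rule for positions with at most MAX_MEN men."""
    if len(strong) + len(weak) > MAX_MEN:
        return False  # many men left: still room to increase the advantage
    if "P" in strong:
        return False  # a Pawn can promote and restore a winning advantage
    lead = material(strong) - material(weak)
    return 0 <= lead <= VALUES["B"]


print(is_drawish("KRB", "KR"))        # True
print(is_drawish("KRP", "KR"))        # False
print(is_drawish("KQRRBN", "KQRRN"))  # False
```

A position flagged by such a rule would then have its heuristic score scaled down as described, rather than being scored as a dead draw.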

An observation - Note that in Janggi/Xiangqi Pawns, even though they cannot promote, still have a unique property: they are attack-only pieces, which cannot be used for defense. So when a combination like KRAA vs KHA would be generally a draw (just a guess for the sake of argument), adding 2 Pawns on each side which are sufficiently far advanced to not be able to block each other gives extra attack, but no extra defense. The KHA defense that would have stood up against a lone R might crumble against RPP, while the HPP attack against KRAA (or perhaps even KAA) would be as futile as ever, making the two extra Pawns for the weak side basically worthless.

I am not sure how this works out in Janggi; in Xiangqi Pawns that have not crossed the River yet can block each other's advance, and even an advantage in attacking pieces might not be able to remove the blocker (e.g. because it can be protected by an Elephant). In Janggi it might be far more difficult for the weaker player to prevent the Pawns of the stronger player from passing his own (and reaching his Palace) without being traded.

kimqwer2 commented 3 years ago

This is not a real solution to the problem, just a little additional opinion. The requirements for checkmate seem to be a somewhat difficult concept. There are many cases in each situation, and it doesn't seem good to judge only by the type and number of my material and the opponent's material. If my material is locked in by the opponent, I may meet the conditions on paper but cannot actually use it. Unless you have an algorithm that is based on extensive databases and can respond flexibly, I think this approach needs careful consideration. As the developer said, I believe that conditions added on top of the simple calculation can interfere with optimization.

It's an extreme example, but it's the most common pattern of defeat for Stockfish when playing a game. [Image: screenshot]

In both situations, the blue score is significantly higher than the red score (1. blue 16, red 13; 2. blue 20, red 16), but in both red ends up winning. For the positions in the picture, I think Stockfish's evaluation does favor red from the beginning. But real game situations are more complicated than the positions in the picture, and there will be some more material on the board. Stockfish would then think that blue, with the higher score, is better, but in the end it is a complete defeat. The issue of the score difference between Horses and Cannons in the current Stockfish has previously been resolved. But a Cannon around the palace rarely gets captured, because on average it moves little and rarely meets opposing pieces. That is why two Cannons are often left over. Conversely, Pawns have very low value and mobility, and because they often meet opposing pieces, the number of Pawns decreases a lot as the game progresses. On average, if the opponent has 3-4 Pawns, Stockfish often has 0 or 1. I can't tell exactly whether reducing the number of Pawns is a good thing or not. But I don't think it's good to have only two Cannons left and no other pieces.

If the distinction between checkmate and a score win is ambiguous, it would be nice to have a sub-option that makes the engine think based on score as a countermeasure, but WinBoard does not seem to support such a function. And since the current Stockfish has a score deviation, I don't know exactly whether the 1.5 points are given to red. For the picture above, the evaluation score is similar even if you swap the colors and calculate again.
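For readers unfamiliar with the count itself, here is a small sketch of Janggi material counting. The per-piece point values are the commonly quoted ones and are an assumption here (the thread does not list them); the 1.5 points for red are mentioned above. `POINTS`, `RED_BONUS` and `count_points` are hypothetical names, and the piece list in the example is made up rather than taken from the screenshot.

```python
# Commonly quoted Janggi point values (assumed; not given in this thread).
POINTS = {"chariot": 13, "cannon": 7, "horse": 5,
          "elephant": 3, "guard": 3, "soldier": 2}
RED_BONUS = 1.5  # compensation points for red, as mentioned in the comment


def count_points(pieces: dict, is_red: bool) -> float:
    """Material count for one side from a {piece name: number} mapping."""
    total = sum(POINTS[name] * n for name, n in pieces.items())
    return total + (RED_BONUS if is_red else 0)


blue = count_points({"chariot": 1, "horse": 1}, is_red=False)
red = count_points({"cannon": 2, "soldier": 1}, is_red=True)
print(blue, red)  # 18 17.5
```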

HGMuller commented 3 years ago

I think what the previous message tells us is mainly that the current heuristic material eval sucks. Cannons already decrease very much in value in Xiangqi in the late end-game (where KPK is a win, but KCEK is a draw!); in Janggi this is far worse, because even the non-capture mobility of Cannons drops to near zero there. And in view of what I wrote above, 'passed Pawns' become really very valuable.

In connection with the counting issue: if Cannons keep their high value in the count, even though they are practically worthless in play in the late end-game, this discrepancy offers very interesting dilemmas. You might want to preserve an otherwise useless Cannon to be sure you end up on top when the game ends in counting. The way to encourage that is not by assigning a high (end-game) value to the Cannon, though, but by just giving some bonus for having the higher count when the heuristic score is in the draw range. Then the engine won't start hoarding Cannons if it is already ahead in count.
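A minimal sketch of that bonus, assuming a flat bonus that depends only on who is ahead in the count. `DRAW_MARGIN`, `COUNT_BONUS` and `evaluate` are hypothetical names and the numbers are placeholders, not tuned values.

```python
DRAW_MARGIN = 200  # hypothetical edge of the drawish score range, in centipawns
COUNT_BONUS = 30   # hypothetical flat bonus for leading the material count


def evaluate(heuristic: int, my_count: float, opp_count: float) -> int:
    """Add a count bonus only while the heuristic score is inside the draw range."""
    if abs(heuristic) >= DRAW_MARGIN or my_count == opp_count:
        return heuristic
    return heuristic + (COUNT_BONUS if my_count > opp_count else -COUNT_BONUS)


print(evaluate(50, 16, 13))   # 80: ahead in count, drawish position
print(evaluate(50, 23, 13))   # 80: a bigger lead earns nothing extra
print(evaluate(500, 16, 13))  # 500: outside the draw range, count is ignored
```

Because the bonus does not grow with the size of the lead, keeping an extra Cannon gains nothing once the side is already ahead in count, which matches the "no hoarding" point above.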

ianfab commented 3 years ago

Please let us not mix independent topics. If there are practically relevant (i.e., not study-like) positions that Fairy-SF drastically misevaluates, then we can discuss this in a separate issue, but here it does not contribute anything to the solution of this topic, because no matter how good or bad the evaluation is, as long as wins by checkmate and material counting are assigned the same result score the engine will always strive for whichever can be achieved faster.

HGMuller commented 3 years ago

Sure. But what I proposed was this: