Closed: catask closed this issue 4 years ago.
Thanks. Piece values in Fairy-SF are not as heavily tuned as in Seirawan-SF (since they need to be a good compromise for all variants), so they may well be slightly off. I will check the games when I find the time and probably test some patches to lower the piece values.
The games with massive misevaluations would be even more interesting, because they are hinting at bugs and can hence be tested/debugged much more directly. So if you still have them, or if you encounter them again, please report the FEN/PGN where this behavior could be observed and ideally reproduced.
[Event "?"]
[Site "?"]
[Date "2020.05.03"]
[Round "6"]
[White "FairySF"]
[Black "SeirawanSF"]
[Result "0-1"]
[GameDuration "00:25:31"]
[GameEndTime "2020-05-03T05:57:26.706 Central Daylight Time"]
[GameStartTime "2020-05-03T05:31:55.693 Central Daylight Time"]
[PlyCount "195"]
[Termination "adjudication"]
[TimeControl "300+5"]
[Variant "seirawan"]
1. e4 {book} e6 {book} 2. d4 {+1.05/24 23s} d5 {-0.19/24 11s}
3. e5 {+1.21/21 3.4s} c5 {-0.23/26 34s} 4. c3 {+1.21/20 4.5s} Ne7 {-0.18/25 24s}
5. Bd3 {+1.19/19 5.2s} Nbc6 {-0.16/26 16s} 6. Nf3/E {+1.37/23 32s}
c4 {-0.13/24 19s} 7. Bc2 {+1.40/21 4.3s} b5 {-0.10/21 2.9s}
8. Eh3 {+1.33/21 11s} b4 {-0.37/25 32s} 9. b3 {+1.71/22 17s} h6 {-0.37/24 11s}
10. O-O {+1.61/21 14s} Rb8 {-0.36/23 8.7s} 11. bxc4 {+1.88/18 2.8s}
dxc4 {-0.51/23 11s} 12. Bg5 {+1.83/24 23s} Qa5 {-0.62/28 44s}
13. Nbd2 {+2.16/23 15s} Qa6 {-0.95/28 29s} 14. cxb4 {+2.65/23 5.0s}
Nd5 {-0.67/23 7.3s} 15. a3 {+2.81/23 5.8s} Nc3 {-0.36/23 3.3s}
16. Qe1 {+2.98/21 6.0s} Bxb4/E {-0.40/25 7.8s} 17. axb4 {+10.17/24 4.6s}
Qxa1 {-0.40/24 5.3s} 18. Qxa1 {+13.00/23 4.8s} hxg5 {-0.50/25 12s}
19. Exg5 {+13.50/24 6.8s} Nb5 {-0.88/28 13s} 20. Nxc4 {+12.69/28 20s}
Eg8 {-0.36/25 14s} 21. Nd6+ {+13.12/27 13s} Nxd6 {-1.38/26 7.2s}
22. exd6 {+13.49/27 3.5s} Kf8/H {-1.51/27 11s} 23. b5 {+13.01/29 20s}
Nd8 {-1.53/30 28s} 24. Qa3 {+13.39/29 5.9s} Bd7 {-1.89/27 5.7s}
25. Ec5 {+13.23/30 15s} g6 {-1.53/28 6.7s} 26. Ne5 {+13.24/28 5.6s}
Kg7 {-1.35/26 6.6s} 27. Exd7 {+12.82/28 16s} Hxd7 {-1.89/30 7.4s}
28. Nxd7 {+13.06/26 4.9s} Rxb5 {-1.86/31 6.3s} 29. Be4 {+13.21/27 10.0s}
Ee8 {-0.92/23 3.2s} 30. Ne5 {+12.58/28 18s} Rh4 {-0.63/24 4.0s}
31. f3 {+13.25/25 5.8s} Eh8 {-0.36/23 3.3s} 32. Ng4 {+13.00/28 11s}
Rg5 {-0.08/25 2.9s} 33. Qc1 {+12.05/31 43s} Eh5 {0.00/27 3.6s}
34. Kf2 {+12.40/29 3.9s} Rd5 {-1.26/29 23s} 35. Bxd5 {+12.76/22 5.3s}
Exd5 {-1.14/28 6.8s} 36. Rd1 {+13.03/23 4.9s} Rh5 {-1.19/27 5.7s}
37. Qc7 {+13.16/25 5.7s} Eb4 {0.00/30 3.0s} 38. Rc1 {+10.57/33 28s}
Rd5 {0.00/33 4.4s} 39. Kg1 {+10.39/29 17s} Exd4 {0.00/37 3.3s}
40. Qc3 {+10.31/29 6.6s} e5 {0.00/36 3.6s} 41. Qe3 {+10.04/31 29s}
Ef5 {0.00/44 19s} 42. Qe1 {+9.74/29 15s} Exd6 {0.00/45 9.8s}
43. Nxe5 {+9.52/28 3.5s} Ee6 {0.00/44 9.7s} 44. Qc3 {+9.37/28 6.5s}
f6 {0.00/48 16s} 45. Ng4 {+9.24/26 8.7s} Rd1+ {0.00/47 5.5s}
46. Kf2 {+10.13/22 1.7s} Rxc1 {0.00/47 3.0s} 47. Qxc1 {+8.75/26 7.4s}
Nf7 {0.00/47 6.0s} 48. Qa1 {+8.46/30 6.8s} Ed6 {0.00/47 7.8s}
49. h4 {+8.46/34 4.4s} a5 {0.00/45 3.8s} 50. Kg1 {+8.46/34 1.8s}
Nd8 {0.00/47 3.7s} 51. Ne3 {+8.46/30 4.7s} Nc6 {0.00/41 3.7s}
52. Ng4 {+7.49/28 9.4s} Ef5 {0.00/38 5.7s} 53. Qc1 {+7.08/29 5.1s}
Ne5 {0.00/44 4.2s} 54. Qc7+ {+6.95/29 2.7s} Nf7 {0.00/42 2.6s}
55. Qc3 {+6.95/35 4.7s} Nd6 {0.00/42 8.3s} 56. Ne3 {+7.45/27 7.0s}
Ee5 {0.00/43 3.5s} 57. Kh2 {+7.21/27 5.6s} Nf7 {0.00/46 5.9s}
58. Nf1 {+7.22/27 2.1s} Eb5 {0.00/38 4.0s} 59. Qc4 {+7.14/28 5.5s}
Eb4 {+0.08/34 1.8s} 60. Qxb4 {+6.61/36 5.8s} axb4 {+0.08/43 2.9s}
61. Nd2 {+6.61/35 2.4s} Ne5 {+0.08/43 3.7s} 62. Kg3 {+6.49/38 3.2s}
Nd3 {+0.32/36 1.8s} 63. Kg4 {+5.71/37 7.7s} Kf7 {+0.08/42 12s}
64. f4 {+5.71/30 4.5s} f5+ {+0.40/35 2.3s} 65. Kf3 {+5.71/24 1.7s}
Nc5 {+0.08/45 11s} 66. Ke3 {+4.22/36 11s} b3 {+0.08/45 2.3s}
67. Nb1 {+3.52/33 5.9s} Ke7 {+0.60/36 5.1s} 68. g3 {+3.52/30 3.1s}
b2 {+0.08/38 3.5s} 69. Kd4 {+3.52/35 2.0s} Kd6 {+0.08/38 11s}
70. Na3 {+3.52/40 2.2s} Na4 {+0.08/39 3.3s} 71. Nc4+ {+3.52/37 7.0s}
Kd7 {+0.08/47 6.2s} 72. Nd2 {+3.41/45 8.6s} Kc6 {+0.08/49 1.6s}
73. Kd3 {+3.41/42 2.3s} Kb5 {+0.08/49 11s} 74. Kc2 {+3.52/35 6.7s}
Kb4 {+0.08/54 5.4s} 75. Nb1 {0.00/44 8.3s} Kc4 {+0.08/56 4.0s}
76. Nd2+ {0.00/45 3.4s} Kb4 {+0.08/47 4.3s} 77. Nb1 {0.00/46 2.5s}
Kc5 {+0.08/48 2.5s} 78. Kb3 {+3.41/38 2.3s} Nb6 {+0.08/48 9.3s}
79. Kxb2 {+2.04/33 10s} Kd4 {+0.08/50 5.0s} 80. Kb3 {+1.31/30 6.4s}
Nc4 {+0.08/44 4.5s} 81. Ka4 {+2.44/32 5.0s} Nd6 {+0.08/45 2.1s}
82. Na3 {+1.07/33 3.7s} Ke4 {+0.08/44 8.5s} 83. Ka5 {+0.62/33 6.3s}
Kf3 {+2.00/30 1.5s} 84. Kb6 {+0.31/34 5.0s} Kxg3 {+3.17/36 1.9s}
85. Kc6 {0.00/38 5.0s} Nf7 {+3.90/34 1.6s} 86. Kd5 {0.00/29 1.3s}
Kxf4 {+4.70/31 1.8s} 87. Ke6 {0.00/44 8.7s} Nh8 {+5.17/35 2.2s}
88. Nb1 {0.00/30 2.0s} Kg4 {+6.94/29 2.1s} 89. Ke5 {0.00/36 2.1s}
Kxh4 {+8.75/27 2.2s} 90. Kf4 {0.00/43 3.1s} Nf7 {+11.42/28 3.1s}
91. Nc3 {0.00/42 2.4s} Nh6 {+15.23/26 3.4s} 92. Nd5 {-4.07/29 13s}
g5+ {+23.20/26 3.1s} 93. Kf3 {-6.27/24 7.2s} Ng4 {+49.56/30 7.0s}
94. Ne7 {-6.19/24 5.0s} Ne5+ {+53.83/33 11s} 95. Ke3 {-8.37/23 5.0s}
Kg4 {+58.40/25 4.9s} 96. Nd5 {-8.70/22 5.0s} Kg3 {+59.04/24 2.2s}
97. Kd4 {-9.73/22 5.0s} Nf7 {+62.27/24 2.0s}
98. Ne7 {-10.90/22 5.0s, Black wins by adjudication} 0-1
See move 18, for instance: the corresponding evals are +13.00 and -0.50, quite the difference. I haven't been able to reproduce this locally though. For the FEN:
1rb1ke1r/p4pp1/2n1p3/1n2P1E1/1PpP4/5N2/2BN1PPP/Q4RK1[h] w kc - 0 1
Fairy-SF 10.4 gave me an eval of around +0.7. I'm not sure why there was such a huge difference in play. Now that I think of it, maybe it was from an older version of FairySF and could have already been fixed.
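When reproducing such a position it helps to check which pieces are still in hand. The following is a minimal sketch that reads the bracketed holdings from a FEN like the one above. It assumes the usual convention that uppercase letters belong to White and lowercase to Black; `Holdings` and `parseHoldings` are hypothetical names for illustration, not part of Fairy-Stockfish.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Pieces held in hand by each side, keyed by lowercase piece letter.
struct Holdings {
    std::map<char, int> white;
    std::map<char, int> black;
};

// Extract the bracketed holdings from a FEN string.
// Assumption: uppercase letters are White's pieces, lowercase are Black's.
Holdings parseHoldings(const std::string& fen) {
    Holdings h;
    const std::size_t open = fen.find('[');
    const std::size_t close = fen.find(']');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return h;  // no holdings section present
    for (std::size_t i = open + 1; i < close; ++i) {
        const unsigned char c = static_cast<unsigned char>(fen[i]);
        if (!std::isalpha(c))
            continue;  // skip anything that is not a piece letter
        const char piece = static_cast<char>(std::tolower(c));
        if (std::isupper(c))
            ++h.white[piece];
        else
            ++h.black[piece];
    }
    return h;
}
```

For the FEN above, this reports no white pieces in hand and a single black hawk (`h`).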
[Event "?"]
[Site "?"]
[Date "2020.05.03"]
[Round "9"]
[White "SeirawanSF"]
[Black "FairySF"]
[Result "1-0"]
[GameDuration "00:14:04"]
[GameEndTime "2020-05-03T08:03:27.581 Central Daylight Time"]
[GameStartTime "2020-05-03T07:49:22.867 Central Daylight Time"]
[PlyCount "76"]
[Termination "adjudication"]
[TimeControl "300+5"]
[Variant "seirawan"]
1. d4 {book} Nf6 {book} 2. c4 {+0.46/26 21s} d5 {-0.50/25 29s}
3. Nf3 {+0.60/25 12s} c6 {-0.47/21 3.5s} 4. Bf4/E {+0.51/23 5.9s}
Bf5/H {-0.68/23 9.2s} 5. e3 {+0.53/28 17s} e6 {-0.55/23 15s}
6. c5 {+0.60/24 3.6s} Be7 {-0.52/21 3.1s} 7. Nc3 {+0.59/25 6.8s}
O-O {-0.66/21 5.3s} 8. a3 {+0.63/27 17s} Nbd7 {-0.63/25 7.6s}
9. h3 {+0.79/24 6.3s} Ne4 {-0.68/25 8.2s} 10. Bd3/H {+0.79/26 7.9s}
a5 {-0.89/26 33s} 11. Na4 {+1.08/24 8.8s} Bh4 {-0.61/24 6.7s}
12. g3 {+0.87/29 39s} Be7 {-1.00/28 45s} 13. g4 {+0.76/28 12s}
Bg6 {-0.84/23 3.1s} 14. Hh2 {+0.82/28 5.5s} f6 {-0.60/26 13s}
15. Bc7 {+1.13/24 5.4s} Qe8 {-0.49/24 3.8s} 16. Nh4 {+1.05/25 16s}
f5 {-1.36/29 43s} 17. Nxg6 {+1.25/24 2.2s} Qxg6 {-1.55/24 5.2s}
18. Ee2 {+1.70/26 45s} Ha7 {-1.71/24 40s} 19. Rg1 {+2.02/24 13s}
Bh4 {+0.14/22 2.6s} 20. gxf5 {+2.65/23 4.2s} Qxf5 {-1.11/28 14s}
21. Hf4 {+2.65/27 9.7s} Kh8 {+1.31/25 3.6s} 22. Nb6 {+2.84/29 12s}
e5 {+2.36/23 4.6s} 23. Bxe4 {+8.12/25 7.0s} dxe4 {+0.74/26 15s}
24. Nxa8 {+8.41/26 2.9s} exf4 {+3.29/26 5.0s} 25. Exf4 {+8.58/26 3.3s}
Bxf2+ {+3.85/26 13s} 26. Kxf2 {+8.88/28 4.1s} Qe6 {+0.64/27 30s}
27. Exf8+ {+9.37/26 4.6s} Nxf8 {0.00/28 11s} 28. Qg4 {+9.70/25 5.3s}
Qxg4 {0.00/28 3.3s} 29. Rxg4 {+11.28/25 4.0s} Hb5 {0.00/31 3.9s}
30. Rxe4 {+12.07/25 3.7s} Hd3+ {0.00/31 3.9s} 31. Kf3 {+12.64/26 4.3s}
Kg8 {0.00/32 4.1s} 32. Bd6 {+14.55/26 6.1s} Ng6 {0.00/31 3.6s}
33. Nc7 {+16.11/26 11s} Hc4 {0.00/33 4.0s} 34. Re8+ {+19.36/25 5.8s}
Kf7 {-4.59/26 16s} 35. Ke4 {+21.20/26 6.6s} b5 {-10.98/28 34s}
36. Rg1 {+23.17/25 6.2s} Hd2+ {-13.74/27 13s} 37. Kd3 {+24.58/26 4.6s}
Hc4+ {-13.99/26 12s} 38. Kc2 {+28.75/28 13s}
Kf6 {-14.19/25 9.8s, White wins by adjudication} 1-0
I haven't tested this game yet but it was quite strange play from Fairy-SF.
Thanks, I will check the games. Edit: I had a quick look, and in both games it obviously does not take into account that it can no longer gate a piece. This is known: the initial introduction of a penalty for not being able to gate a piece failed to give an improvement (perhaps because it happens too rarely), but I can test such ideas again. Edit2: I noticed a sign error in the patch I tested back then, so maybe the fixed version will pass.
For reference, I found again the tuning result http://www.variantfishtest.org:6543/tests/view/5dc09f166e23db1ffe4a26e9 from the end of last year, which also pointed to a slight reduction of the hawk and elephant values. The subsequent test failed, so I will now test the piece value changes separately.
The piece value tweaks failed in testing, see http://www.variantfishtest.org:6543/tests/view/5ecd5e2f6e23db36d55f2c43 and http://www.variantfishtest.org:6543/tests/view/5ecd5eaa6e23db36d55f2c46.
Surprisingly, the penalty for ungateable pieces also failed (http://www.variantfishtest.org:6543/tests/view/5ecd66b56e23db36d55f2c49), although this should normally be a no-brainer. Either I have a bug in the implementation, or this situation simply does not occur often enough to be relevant.
Hi, the penalty for ungateable pieces is not relevant in Seirawan Chess. The only case where it could matter is when one side forgets to gate a piece and can no longer do so (it is as if that side had lost a piece). In that case there is no need for a penalty: losing an Elephant (Chancellor) or Hawk (Archbishop) costs roughly 800 centipawns or more, which is a sufficient handicap that does not need further refinement.
This penalty could be tested in variants like Musketeer Chess, where the pieces can be gated only on predefined squares and the choice of gate matters in many situations.
Zied
You misunderstood, it does not consider yet that pieces that do not have any gates left are lost (i.e., it might evaluate a position as +10 because it has an elephant in hand that can no longer be gated), only after the patch I tested it does, but even that one does not pass.
I understand now. In my humble opinion it is indeed a useless patch. Situations in which a side loses the right to gate a piece are extremely rare, which is why the results are not significant. On the other hand, evaluating the "handicap" of having a piece that is not yet gated is much more difficult than it seems.
For example, it depends on whether the piece can leap. In a closed position where it is surrounded by its own pieces, a leaper can escape, which helps restore better coordination among the classic pieces; with such pieces, the sooner you gate them the more you can improve your position (because these pieces are also good attackers). On the other hand, pieces that can be blocked, or that have limited forward leaping moves, are probably best gated once as many classic pieces as possible have been developed. An example is the cannon in Musketeer Chess, which is better gated on the squares behind the king or a kingside rook. Gated there, it is a useful defender; it is also a deadly piece in endgames, as it can mate a bare king alone, and it acts like a roller coaster when attacking the opponent's king in close-range combat!
Thanks ubdip for testing the patch. Well, I'm still not sure what to think about it. S-chess is not a widely played game yet, so we probably need more info to make judgements :).
Hi, I think the problem is the following: Seirawan Chess is a game where the additional pieces are dropped onto the square vacated when a piece on the 1st or 8th rank makes its first move. The right question to ask is: when an additional piece is dropped, apart from bringing a lot of power onto the board, what does it bring to the position?
Sometimes dropping such pieces early in the game creates more problems, as the board becomes overcrowded and piece coordination is hampered. So when introducing such a piece, the questions are: does it create immediate threats? Does it allow a capture (or even a mate) by protecting the piece that left the gate and allowed the new piece to enter the game?
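The gating rule described here can be sketched as simple bookkeeping. This is a simplified model under stated assumptions: each side starts with eight gates (one per back-rank file), a gate closes for good once its original piece moves or is captured, and castling subtleties are ignored. The class `Gates` and its members are hypothetical names, not the engine's API.

```cpp
#include <bitset>
#include <cstddef>

// Gating rights for one side: bit f is set while the piece that started on
// file f of the back rank has neither moved nor been captured.
class Gates {
public:
    Gates() { open_.set(); }  // all eight back-rank squares start as gates

    // Call when the original piece on this file (0-7) moves or is captured.
    void close(int file) { open_.reset(static_cast<std::size_t>(file)); }

    bool isOpen(int file) const { return open_.test(static_cast<std::size_t>(file)); }

    int remaining() const { return static_cast<int>(open_.count()); }

    // Number of in-hand pieces that can never enter the board because
    // fewer gates remain than pieces are held.
    int stranded(int inHand) const {
        const int deficit = inHand - remaining();
        return deficit > 0 ? deficit : 0;
    }

private:
    std::bitset<8> open_;
};
```

With one gate left and both the hawk and the elephant still in hand, `stranded(2)` returns 1: one of the two pieces is effectively lost, which is the case the evaluation discussed in this thread does not yet account for.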
Since the piece value tweak failed and the evaluation issue with ungateable pieces is also recorded as issue #160, I will close this for now. Test submissions on fishtest with patches to address these inaccuracies in imbalance evaluation are of course still very welcome.
@ianfab: This should gain some Elo. I modified your test patch. The penalty does not weight the pieces by their full value; especially in the early game phases the weight is lowered to support tactical play. Tested locally with 3s+0.03s and 30s+0.3s. I forgot my password at variant-fishtest.
The code could be improved to reflect the individual piece values with weight factors for midgame and endgame and then be tuned.
diff --git a/src/evaluate.cpp b/src/evaluate.cpp
index ec4af341..72296359 100644
--- a/src/evaluate.cpp
+++ b/src/evaluate.cpp
@@ -467,6 +467,10 @@ namespace {
std::max(PieceValue[EG][pos.promoted_piece_type(pt)] - PieceValue[EG][pt], VALUE_ZERO)) / 4 * pos.count_in_hand(Us, pt);
if (pos.enclosing_drop())
mobility[Us] += make_score(500, 500) * popcount(b);
+
+ // Reduce score if there is a deficit of gates
+ if (pos.seirawan_gating() && !pos.piece_drops() && pos.count_in_hand(Us, ALL_PIECES) > popcount(pos.gates(Us)))
+ score -= make_score(200, 900) / pos.count_in_hand(Us, ALL_PIECES) * (pos.count_in_hand(Us, ALL_PIECES) - popcount(pos.gates(Us)));
}
return score;
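To make the arithmetic of the penalty term in the patch above concrete, here is a standalone sketch. It assumes that the engine's score type divides and multiplies its middlegame and endgame components independently, using integer division; the `Score` struct and `gateDeficitPenalty` are simplified stand-ins for illustration, not the engine's own definitions. The variant checks (`seirawan_gating()`, `piece_drops()`) are left out.

```cpp
// Simplified stand-in for the engine's score: middlegame and endgame parts.
struct Score {
    int mg;
    int eg;
};

// Penalty subtracted for a side holding `inHand` pieces with only `gates`
// squares left to gate them on.
Score gateDeficitPenalty(int inHand, int gates) {
    if (inHand <= gates)
        return {0, 0};  // every held piece can still enter the board
    const int deficit = inHand - gates;
    // Mirrors make_score(200, 900) / inHand * deficit, component by component.
    return {200 / inHand * deficit, 900 / inHand * deficit};
}
```

Under these assumptions, one held piece with no gate costs (200, 900), two held pieces with one gate cost (100, 450), and two held pieces with no gates cost (200, 900). The total is therefore capped at (200, 900) however many pieces are stranded, and the middlegame part stays well below a full piece value.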
Found improvement at 3s+0.03s (and Black has the advantage with the seirawan.epd book):
cutechess-cli-1.2.0 -variant seirawan -engine cmd=./stockfish-gat1d proto=uci -engine cmd=./stockfish-7b8b83e proto=uci -games 2 -rounds 500 -repeat -recover -concurrency 2 -each tc=3+.03 -openings file=ianfab/seirawan-ianfab.epd format=epd order=random -pgnout FairySF-gat1-3s-A01f.pgn -epdout positions-A01f.fen
...
Score of Fairy-Stockfish 040920 LB 64 BMI2 vs Fairy-Stockfish 040920 LB 64 BMI2: 477 - 387 - 136 [0.545] 1000
... Fairy-Stockfish 040920 LB 64 BMI2 playing White: 225 - 208 - 67 [0.517] 500
... Fairy-Stockfish 040920 LB 64 BMI2 playing Black: 252 - 179 - 69 [0.573] 500
... White vs Black: 404 - 460 - 136 [0.472] 1000
Elo difference: 31.4 +/- 20.1, LOS: 99.9 %, DrawRatio: 13.6 %
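As a sanity check, the score fraction and Elo difference in this output can be recomputed from the win/loss/draw counts. The sketch below assumes the usual logistic Elo model; the function names are illustrative and not taken from cutechess-cli.

```cpp
#include <cmath>

// Fraction of points scored by the first engine (a draw counts as half).
double scoreFraction(int wins, int losses, int draws) {
    return (wins + 0.5 * draws) / (wins + losses + draws);
}

// Elo difference implied by a score fraction under the logistic model.
// Only meaningful for 0 < score < 1.
double eloDifference(double score) {
    return -400.0 * std::log10(1.0 / score - 1.0);
}

// Share of games that ended in a draw.
double drawRatio(int wins, int losses, int draws) {
    return static_cast<double>(draws) / (wins + losses + draws);
}
```

For 477 - 387 - 136 this gives a score of 0.545, an Elo difference of about 31.4, and a draw ratio of 13.6 %, matching the reported figures.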
Result for 30s+0.3s:
cutechess-cli-1.2.0 -variant seirawan -engine cmd=./stockfish-gat1d proto=uci -engine cmd=./stockfish-7b8b83e proto=uci -games 2 -rounds 500 -repeat -recover -concurrency 2 -each tc=30+.3 -openings file=ianfab/seirawan-ianfab.epd format=epd order=random -pgnout FairySF-gat1-30s-A01g.pgn -epdout positions-A01g.fen
...
Score of Fairy-Stockfish 040920 LB 64 BMI2 vs Fairy-Stockfish 040920 LB 64 BMI2: 388 - 341 - 271 [0.523] 1000
... Fairy-Stockfish 040920 LB 64 BMI2 playing White: 163 - 199 - 138 [0.464] 500
... Fairy-Stockfish 040920 LB 64 BMI2 playing Black: 225 - 142 - 133 [0.583] 500
... White vs Black: 305 - 424 - 271 [0.441] 1000
Elo difference: 16.3 +/- 18.4, LOS: 95.9 %, DrawRatio: 27.1 %
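The LOS (likelihood of superiority) values in both runs are consistent with the common normal approximation that uses only decisive games. The sketch below assumes that formula; whether cutechess-cli computes LOS in exactly this way is not confirmed here, and the function name is illustrative.

```cpp
#include <cmath>

// Likelihood of superiority from decisive games only. Draws are ignored
// because they do not favour either side in this approximation.
// Requires wins + losses > 0.
double likelihoodOfSuperiority(int wins, int losses) {
    const double z = (wins - losses) / std::sqrt(2.0 * (wins + losses));
    return 0.5 * (1.0 + std::erf(z));
}
```

With 477 wins and 387 losses this yields roughly 99.9 %, and with 388 wins and 341 losses roughly 95.9 %, in line with the two results above. The longer time control thus gives weaker evidence for the patch, which is one reason to rerun it on fishtest.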
@alwey Thanks, sounds promising. Feel free to create a new account on fishtest in order to test it there (I could also reset the password and send the new password to the email address attached to your account, but since there is no "change password" functionality for users, you would be stuck with that password. Or I could send you the existing password, which would however imply me seeing it.).
@ianfab: Thank you! I received my password. The 30s+0.3s results should be available in about 2 hours (13:10 CEST).
Running some games between seirawan-sf and fairy-sf, it seems that fairy-sf often overvalues the hawk/elephant and trades it (disadvantageously) for two pieces, or sometimes two pieces and a pawn. Almost every time the pieces ended up winning; I am not sure whether this is because the pieces are objectively stronger or because seirawan-sf is stronger.
Hardware: 28cores/16gb Hash
Example games:
That was 5 losses from a set of 36 games, so I wouldn't say it is an uncommon issue or way to get outplayed. I didn't check the whole set due to some PGN-parser issues with WinBoard, so there might be a few more. I'm still not sure what the correct evaluation of those positions is, but it was a bit surprising to see fairy-sf lose in a similar style so many times.
Another surprising game had Fairy-SF evaluating knight and 3 pawns vs knight and 3 pawns as +2 (when it clearly had no chance of winning); it later lost that endgame. It also evaluated queen vs elephant as +13 (!) when seirawan-sf had the position as approximately equal. In another game it simply didn't gate its elephant at all, playing the whole game down an elephant :D. Not sure if these were some glitches in cutechess or not.
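One way to locate such disagreements is to compare the engines' move comments in the PGN that cutechess writes, which have the form `eval/depth time` (for example `+13.00/23 4.8s`). The sketch below assumes that each engine reports its score from its own side, so two engines that agree produce values of opposite sign. `MoveInfo`, `parseComment` and `disagreement` are hypothetical helper names.

```cpp
#include <cmath>
#include <regex>
#include <string>

// Evaluation (in pawns) and search depth taken from one move comment.
struct MoveInfo {
    double eval;
    int depth;
};

// Parse comments of the form "+13.00/23 4.8s". Returns false for anything
// else, such as "book" or a mate score.
bool parseComment(const std::string& comment, MoveInfo& out) {
    static const std::regex pattern(R"(^\s*([+-]?\d+(?:\.\d+)?)/(\d+)\b)");
    std::smatch m;
    if (!std::regex_search(comment, m, pattern))
        return false;
    out = MoveInfo{std::stod(m[1].str()), std::stoi(m[2].str())};
    return true;
}

// Gap between two consecutive evaluations, assuming each engine scores the
// position from its own point of view.
double disagreement(const MoveInfo& white, const MoveInfo& black) {
    return std::fabs(white.eval + black.eval);
}
```

Applied to a pair of consecutive comments such as `+13.00/23 4.8s` and `-0.50/25 12s`, the gap is 12.5 pawns. Flagging the first ply where the gap exceeds some threshold gives a candidate position to export as FEN and report.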