Xiangqi perft speed question

maksimKorzh commented 3 years ago

@ianfab Hi Fabien

I was trying to compare the speed of passing a perft test for xiangqi between fairy stockfish and my engine wukong. My engine is written in pure javascript and runs via nodejs but the perft speed seems to be about equal to fairy stockfish written in C++, how on earth is that possible?

Unfortunately I can't provide anything specific since in fairy stockfish the time spent for perft is not printed anfter executing: go perft 5

But here's output of my engine:

` Wukong Xiangqi - UCI mode - v1.0

position startpos

9 r n b a k a b n r 8 . . . . . . . . . 7 . c . . . . . c . 6 p . p . p . p . p 5 . . . . . . . . . 4 . . . . . . . . . 3 P . P . P . P . P 2 . C . . . . . C . 1 . . . . . . . . . 0 R N B A K A B N R

a b c d e f g h i

side: r sixty: 0 hash key: -1866463866 king squares: [e0, e9]

perft 5 Performance test:

move 1: a3a4 nodes: 3449924 move 2: c3c4 nodes: 3272375 move 3: e3e4 nodes: 3271869 move 4: g3g4 nodes: 3272375 move 5: i3i4 nodes: 3449924 move 6: b2a2 nodes: 2624742 move 7: b2c2 nodes: 2708656 move 8: b2d2 nodes: 3426077 move 9: b2e2 nodes: 2379130 move 10: b2f2 nodes: 3203353 move 11: b2g2 nodes: 2322021 move 12: b2b3 nodes: 2686053 move 13: b2b4 nodes: 3540406 move 14: b2b5 nodes: 3360869 move 15: b2b6 nodes: 2505848 move 16: b2b9 nodes: 2224970 move 17: b2b1 nodes: 3549371 move 18: h2g2 nodes: 2708656 move 19: h2f2 nodes: 3426077 move 20: h2e2 nodes: 2379130 move 21: h2d2 nodes: 3203353 move 22: h2c2 nodes: 2322021 move 23: h2i2 nodes: 2624742 move 24: h2h3 nodes: 2686053 move 25: h2h4 nodes: 3540406 move 26: h2h5 nodes: 3360869 move 27: h2h6 nodes: 2505848 move 28: h2h9 nodes: 2224970 move 29: h2h1 nodes: 3549371 move 30: a0a1 nodes: 4549668 move 31: a0a2 nodes: 2965482 move 32: b0a2 nodes: 2931107 move 33: b0c2 nodes: 2743590 move 34: c0a2 nodes: 2944727 move 35: c0e2 nodes: 2641550 move 36: d0e1 nodes: 3288025 move 37: e0e1 nodes: 3405238 move 38: f0e1 nodes: 3288025 move 39: g0e2 nodes: 2641550 move 40: g0i2 nodes: 2944727 move 41: h0g2 nodes: 2743590 move 42: h0i2 nodes: 2931107 move 43: i0i1 nodes: 4549668 move 44: i0i2 nodes: 2965482

Depth: 5 Nodes: 133312995 Time: 54453 ms `

Note the spent time in ms. The time spent by fairy stockfish for this position is about equal (at least visually) The number of nodes traversed is ovbiously the same.

I'm not criticizing anything at any point but feel truly confused by this fact.

So next I decided to run perft test for classical chess variant and that surprised me even more - it's times and times slower compared to stockfish. Classical stockfish goes perft depth 6 on my laptop almost instantly, while fairy stockfish spends the time which is close to what my js engine for classical chess does.

Could you please kindly explain the reason behind this significant slowdown?

I'm going to try to port my js array based xiangqi move generator to C++ and try to connect it to stockfish to see how well it would play and at the moment I'm trying to figure out how likely it might be making some sense. I tried bitboards for classical chess in the past successfully but really want to avoid them for xiangqi because of magic bitboards for cannons and similar complications.

P.S. sorry for not labeling the issue, it should be "xiangqi" and "question" but I don't know how to do that.

ianfab commented 3 years ago

The main reason for this slowdown is the overhead due to the generalizations for the support of many piece types and rules. Specialized engines for specific games contain optimizations that rely on particular assumptions about the rules of the game, e.g., in the case of Stockfish the code assumes that all non-pawn pieces have symmetrical moves, i.e., that a piece moving from A to B can also move from B to A, which is broken by piece types like the horse.

To not overly complicate the code with a lot of special cases, I removed several of such optimizations, and at the same time needed to add additional logic for special rules of some variants. This significantly slows down the move generation, and particularly the move validation, since the checking for attacks on the king is very expensive when not being able to exploit specific assumptions about the rules.

This is a known issue, but I would like to mitigate it, see https://github.com/ianfab/Fairy-Stockfish/issues/253, and already implemented some improvements recently for games with very moderate changes compared to chess. Since the profiling and optimization is very time intensive, I unfortunately can not work that much on this topic. Contributions on this are of course very welcome.

These slowdowns cause a performance loss of around 100-200 Elo for most variants compared to a specialized implementation. Although this is big, the playing strength that can be gained by evaluation and search improvements usually are much bigger, and in contrast to the speed improvements do not have an inherent limit, which is why I accept this drawback at the benefit of being able to use the strong search and evaluation for a large number of variants. However, I am of course open for suggestions how to improve this.

If you want to time the perft results, you can e.g. use the bench command for that, such as: ./stockfish bench xiangqi 16 1 4 default perft

maksimKorzh commented 3 years ago

@ianfab Thank you so much Fabien Your answer makes perfect sense. Thank you for bringing the light)

fairy-stockfish / Fairy-Stockfish

Xiangqi perft speed question #278