SebLague / Chess-Challenge

Create your own tiny chess bot!
https://www.youtube.com/watch?v=Ne40a5LkK6A
MIT License

[CRITICAL PERFORMANCE ISSUE] Move List Allocation #188

Closed TheBlackPlague closed 1 year ago

TheBlackPlague commented 1 year ago

Chess Programming Background

I'm the lead developer of the C# StockNemo Chess Engine (~ 3.2K elo & TCEC Swiss Participant), as well as the lead developer of the (in-development) C++ rewrite StockDory (> 3.3K elo & TCEC Swiss Participant). Neither of those engines is a derivative or copy of Stockfish (Stockfish was an inspiration for the name).

Summary

As was expressed on Discord, move generation is far too slow in the current state of the API. Among other things, it performs a heap allocation (an array sized for 218 moves) for every traversed node in the tree. Other issues could also be discussed, but their fixes aren't as simple. For this problem, by contrast, the fix is simple: use C#'s stackalloc feature to allocate move lists on the stack (considerably faster allocation, and it doesn't trigger the Garbage Collector). For the majority of developers, the code would change...

From:

Move[] moves = board.GetLegalMoves();

To:

Span<Move> moves = stackalloc Move[218];
board.GetLegalMoves(ref moves);

Now, GetLegalMoves would accept a Span<Move> reference argument, filling that stack-allocated span instead of allocating an array on the heap. As can be seen, this adds near-zero complexity and is not hard to understand for most developers. It is understood that this project is aimed at the novice side of the chess engine development community; however, it's important that correct practices are pushed forward rather than misconceptions.
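To illustrate where this change sits in a typical engine, here is a minimal negamax-style sketch using the proposed overload. Board, Move, MakeMove and UndoMove are the existing API types and calls; Evaluate is a placeholder the bot author would supply, the method names here are illustrative, and the exact final signature is whatever ends up in the API docs:

using System;
using ChessChallenge.API;

class ExampleBot
{
    // Placeholder evaluation; a real bot supplies its own.
    static int Evaluate(Board board) => 0;

    static int Search(Board board, int depth, int alpha, int beta)
    {
        if (depth == 0) return Evaluate(board);

        // 218 is an upper bound on the number of legal moves in any chess
        // position, so a fixed-size stack buffer is always large enough.
        Span<Move> moves = stackalloc Move[218];
        board.GetLegalMoves(ref moves); // assumed to fill and re-slice the span; no heap allocation

        foreach (Move move in moves)
        {
            board.MakeMove(move);
            int score = -Search(board, depth - 1, -beta, -alpha);
            board.UndoMove(move);

            if (score >= beta) return beta;   // beta cutoff
            if (score > alpha) alpha = score;
        }
        return alpha;
    }
}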

Not to mention, the speed-up due to this would be quite insane. See PR: https://github.com/SebLague/Chess-Challenge/pull/190 (includes benchmarks)

It is a real problem, not just a disadvantage that applies fairly to everyone:

So far, though, I've argued this on principle. However, it's important to back it up with a concrete case where it is a real problem:

Consider the FEN:

r2q1rk1/bppb1pp1/p2p2np/2PPp3/1P2P1n1/P3BN2/2Q1BPPP/RN3RK1 w - - 2 15

The above FEN is a position where, after one move, a certain phenomenon becomes possible: a capture train. This is a series of noisy positions (positions with capture moves available) coming one after another. In simpler terms, a position is reached where one side can capture a piece, then the other side can capture a piece on the next turn, and so forth, for numerous moves without a gap in between. Eventually a quiet position is reached, where no more capture moves are available. In the above FEN, after that one move, the capture train is at least 17 ply deep.

Oh, cool. But how is that relevant?

A popular search feature that almost every strong chess engine (pretty much every strong alpha-beta chess engine) has is called Quiescence Search (QSearch). It deals with the Horizon Effect, which Sebastian discussed in one of his videos (if my memory serves me right).

A capture train is also what popularly triggers a QSearch explosion, where QSearch goes extremely deep to ensure it misses nothing over the horizon. With the current move generation speed (heap allocation and all), even a depth-1 alpha-beta search that uses QSearch to handle the horizon effect does not finish within 800 ms (on my i9-11900H).
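For anyone unfamiliar with it, a bare-bones quiescence search looks roughly like the sketch below. Evaluate is again a placeholder, QSearch is an illustrative name, and GetLegalMoves(true) is the captures-only form of the API call. In a capture train this function keeps recursing node after node, and with the current API every one of those calls heap-allocates a fresh Move[]:

// Bare-bones quiescence search: only capture moves are searched, until a
// quiet position (no captures left) is reached.
static int QSearch(Board board, int alpha, int beta)
{
    int standPat = Evaluate(board);       // static evaluation of the current position
    if (standPat >= beta) return beta;
    if (standPat > alpha) alpha = standPat;

    // With the current API this allocates a new Move[] on the heap at
    // every node of the capture train.
    foreach (Move capture in board.GetLegalMoves(true))
    {
        board.MakeMove(capture);
        int score = -QSearch(board, -beta, -alpha);
        board.UndoMove(capture);

        if (score >= beta) return beta;
        if (score > alpha) alpha = score;
    }
    return alpha;
}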

Surely, this is a serious issue and should be resolved.

To add (screenshot):

Over 16 GB of heap allocation in just a single game.
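For a rough sense of how that adds up (the per-move size here is an assumption for illustration, not a measured figure): an array sized for 218 moves costs a few kilobytes per node once the Move struct and array header are counted, say roughly 218 × 16 B ≈ 3.5 KB. At even a few million nodes searched over the course of a game, that is already on the order of 10 GB of transient garbage for the collector to churn through.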

w1wwwwww commented 1 year ago

see #109

AugsEU commented 1 year ago

Is the challenge not to work within the confines of the program, flawed as it may be? Let's be honest, none of the entries are going to beat Stockfish, so there's no demand for this to be as performant as possible. I'd be wary of a major change like this now because people have already started their projects.

TheBlackPlague commented 1 year ago

see #109

I'm the one who brought up the issue in the first place; Pedro decided to create an issue for it. At the time, people believed this wasn't a problem. This issue shows that it is.

TheBlackPlague commented 1 year ago

Is the challenge not to work within the confines of the program, flawed as it may be? Let's be honest, none of the entries are going to beat Stockfish, so there's no demand for this to be as performant as possible. I'd be wary of a major change like this now because people have already started their projects.

I completely agree that beating Stockfish isn't the purpose. But as you can see, it can't handle a QSearch Explosion. That should be the bare minimum in my honest opinion.

AugsEU commented 1 year ago

It's only an "issue" if you are trying to make a "serious" engine, but that's not possible with just 1024 tokens anyway. If the base program has issues then I think it's up to you to work around them entirely within the confines of MyBot.cs; that is the challenge.

I feel that if we start digging into flaws and optimisations of the base program like this, it will never end and participants will have to deal with endless changes. You could even argue that making this in C# is a "serious issue" and that the whole thing needs to be made in C++.

I think it's best if we just stick with the current base program, warts and all (unless it's a UI bug or something like that).

TheBlackPlague commented 1 year ago

It's only an "issue" if you are trying to make a "serious" engine, but that's not possible with just 1024 tokens anyway. If the base program has issues then I think it's up to you to work around them entirely within the confines of MyBot.cs; that is the challenge.

I feel that if we start digging into flaws and optimisations of the base program like this, it will never end and participants will have to deal with endless changes. You could even argue that making this in C# is a "serious issue" and that the whole thing needs to be made in C++.

I think it's best if we just stick with the current base program, warts and all (unless it's a UI bug or something like that).

I cannot, in good faith, agree with you on this. I've expressed why this is a serious problem, made a PR to fix it, and hopefully Seb accepts it.

Eldriitch commented 1 year ago

It's only an "issue" if you are trying to make a "serious" engine, but that's not possible with just 1024 tokens anyway. If the base program has issues then I think it's up to you to work around them entirely within the confines of MyBot.cs; that is the challenge.

I feel that if we start digging into flaws and optimisations of the base program like this, it will never end and participants will have to deal with endless changes. You could even argue that making this in C# is a "serious issue" and that the whole thing needs to be made in C++.

I think it's best if we just stick with the current base program, warts and all (unless it's a UI bug or something like that).

I don't entirely agree. This issue doesn't present an interesting challenge to the programmer; it just makes some reasonable approaches unviable for reasons outside their control. If we had the space to implement this stuff ourselves, it wouldn't be so much of a problem, but we simply don't.

AugsEU commented 1 year ago

Imo if an approach doesn't work then use a different one. As they say in fighting games:

~buff~ ~nerf~ ~patch~ adapt

Eldriitch commented 1 year ago

Imo if an approach doesn't work then use a different one. As they say in fighting games:

~buff~ ~nerf~ ~patch~ adapt

If the point were pure competition it wouldn't matter so much, but the goal here is creativity and diversity in submissions. Alpha-beta with quiescence is a fundamental method on which you can build lots of very interesting things.

And frankly, nobody loses out by this being fixed. If your strategy doesn't involve quiescence and similar tree-search things, your code isn't going to get worse.

mcthouacbb commented 1 year ago

@TheBlackPlague Can you provide a list of search optimizations applied in the 16 GB quiescence search mentioned?

TheBlackPlague commented 1 year ago

@TheBlackPlague Can you provide a list of search optimizations applied in the 16 GB quiescence search mentioned?

Due to the competition being, well, competitive, I cannot. I shared everything that would be relevant and that I could share (while still maintaining a competitive edge). The PR fixes the issue, preventing this from happening, so there's that.

SebLague commented 1 year ago

I have added board.GetLegalMovesNonAlloc as an alternate way to get moves in V1.13 (see docs for more info).
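(For anyone reading later: usage presumably mirrors the pattern proposed above, though the docs should be checked for the exact signature. Something along these lines:)

Span<Move> moves = stackalloc Move[218];
board.GetLegalMovesNonAlloc(ref moves); // fills the span in place instead of allocating a Move[]
// assuming the span is re-sliced, moves.Length now reflects the number of legal moves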

JoelvdBrink commented 1 year ago

Hi, I'm just getting started with C# so I'm quite new to all this, but I do also experience the problem of the quiescence search taking a lot more time than it should, resulting in me having to write an extra depth limit just for the quiescence search as a duct-tape fix for now. I understand how this problem is created, but what is stack allocation and how does it solve the search explosion? I am curious to learn how it works, because I don't think it is as simple as replacing...

Move[] moves = board.GetLegalMoves();

with

Span<Move> moves = stackalloc Move[218];
board.GetLegalMoves(ref moves);

I tried Googling the answer, but that didn't get me far.