cooijmanstim / mc-chess

MCTS chess engine, just for kicks
Apache License 2.0

How does chess MCTS work? #2

Open yhyu13 opened 6 years ago

yhyu13 commented 6 years ago

@cooijmanstim

Hi, I'm a Python programmer who started an AlphaGo Zero replication project, and I'd like to apply a similar design to learn chess from scratch.

In Go, MCTS expands to 19x19+1 (pass) child nodes, and the role shifts to the opponent's side (converging to a min-max tree eventually). The average number of legal moves in Go at any moment is about 150-200. I've heard that there are over 6,000 possible moves in chess, and it's pretty hard to think about a search tree that expands to 6,000 child nodes.

Unfortunately, I'm not a chess player. Forgive my ignorance; I want to understand the high-level design of your MCTS.

cooijmanstim commented 6 years ago

Hi @yhyu13, as you might expect, MCTS for chess works quite similarly to MCTS for Go. The search space of chess should be much less complicated than that of Go, with about 40 moves per ply on average. Another difference is that chess is a very precise game -- small differences between two configurations can lead to wildly different outcomes -- and Minimax search is considered more suitable for such games than MCTS. Nevertheless it would be interesting to see how a neural-net-equipped MCTS agent like AlphaGo would do at chess.

My MCTS implementation builds a hash table that maps game states to child states and parent states, with meta information for each node such as win rate and number of times explored. This is basically equivalent to an MCTS tree, except it's an MCTS graph, where each node can have multiple parents and cycles are possible. Also, by using a hash table, the memory use is constant. This is nice because my main plan with this was to persist the hash table in order to "learn" between games, by keeping meta-information from every rollout ever explored. This doesn't seem to be very effective though, possibly due to hash collisions. One thing I wanted to explore but haven't is the use of semantic hashing to get a better distribution of hashes, and more sensible hash collisions.
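The hash-table-as-graph idea can be sketched in a few lines. This is a minimal Python illustration of the data structure described above, not the actual C++ implementation from mcts.cpp; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    """Per-state metadata: visit count, accumulated wins, and links."""
    visits: int = 0
    wins: float = 0.0
    children: set = field(default_factory=set)  # hashes of successor states
    parents: set = field(default_factory=set)   # hashes of predecessor states


class MCTSGraph:
    """Maps state hashes to statistics. Transpositions (the same position
    reached via different move orders) share a single entry, so this is a
    graph rather than a tree, and memory use stays bounded by the table."""

    def __init__(self):
        self.table = {}

    def node(self, state_hash):
        return self.table.setdefault(state_hash, NodeStats())

    def link(self, parent_hash, child_hash):
        self.node(parent_hash).children.add(child_hash)
        self.node(child_hash).parents.add(parent_hash)

    def update(self, state_hash, outcome):
        """Record one rollout result passing through this state."""
        n = self.node(state_hash)
        n.visits += 1
        n.wins += outcome
```

Persisting `table` between games would give the "learning" behavior mentioned above, with the same caveat about hash collisions.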

The MCTS logic is in mcts.cpp. MCTSAgent from mcts_agent.cpp implements a turn-based agent that calls upon mcts.cpp to sample, ponder and make decisions.

yhyu13 commented 6 years ago

@cooijmanstim

Thanks for explaining your work very well!

The neural network requires a fixed output dimension, and I guessed this is what makes the AlphaGo approach hard to apply to chess. The action space of Go is 362 (19x19 + pass), but a naive action space for chess is something like 64^16 (a square for each of the 16 pieces, without knowing the rules of chess), since the AlphaGo approach is to learn from scratch. That was my initial impression of using a neural net for chess. But I was wrong.

So in order for a neural network to work with a search tree, the net can predict (1) which chess piece to move in the current state and (2) where it should go. Thus the action space reduces to a 16x1 vector concatenated with a 64x1 vector, an 80x1 vector in total. But in chess you can't tell the difference between two knights, or two rooks, so this idea isn't so good either.

The last idea is to expand to all possible chess moves and prune the illegal moves on the fly, padding with zeros when generating data for the neural net.
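Masking illegal moves over a fixed action space might look like the following sketch: the network emits a logit for every slot in the fixed space, and the softmax is taken over the legal subset only (illegal entries get probability zero). The function name and shapes here are illustrative assumptions, not anyone's actual code:

```python
import math


def masked_policy(logits, legal_indices):
    """Softmax over a fixed action space, with illegal moves masked out.

    logits: one raw score per slot in the fixed action space.
    legal_indices: indices of the moves that are legal in this position.
    Returns a probability list where illegal moves have probability 0
    and the legal moves sum to 1.
    """
    legal = set(legal_indices)
    m = max(logits[i] for i in legal)  # subtract max for numerical stability
    exp = [math.exp(x - m) if i in legal else 0.0
           for i, x in enumerate(logits)]
    z = sum(exp)
    return [e / z for e in exp]
```

The same mask can be used on the training side: target probabilities for illegal slots are simply zero, which is the "pad zeros" step above.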

Those are my main difficulties in training a neural net to learn chess from self-play. But the good news is that a chess game lasts about 30 moves (compared to roughly 200 moves in Go), so the chess tree must be relatively shorter and thinner than Go's.

I'm curious how many searches you make per move. DeepMind uses 1,600 simulations, and my experiments show (depending on the implementation) that the number of actually expanded leaf nodes is about 800-1400 per move in Go.

cooijmanstim commented 6 years ago

I honestly don't remember how many samples I get before making a move. Without smart rollout policies and early evaluation (both of which DeepMind uses neural nets for), 1600 wouldn't be nearly enough. I would expect to need millions of samples.

You raise a good point about the action space being hard to represent because it changes drastically from one configuration to the next. Whereas in Go, each move consists of putting down a stone in a given location, in Chess there are several kinds of pieces with different behaviors, and different actions you can take with them (e.g. pawns can push or double-push or capture, kings can move or capture or castle). What you could do is let the model output a probability distribution over source squares (i.e. where it wants to move from) and target squares (i.e. where it wants to move to). The latter distribution would have to be conditioned on the former, as the distribution over target squares will look very different depending on which source square was chosen.
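The factorized source/target idea could be sketched like this. Here `target_logits_fn` stands in for a hypothetical network head that produces 64 target-square logits conditioned on the chosen source square; everything in this sketch is an assumption about how one might wire it up, not a description of an existing implementation:

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exp = [math.exp(x - m) for x in xs]
    z = sum(exp)
    return [e / z for e in exp]


def sample_move(source_logits, target_logits_fn, rng=random):
    """Sample a move as a (source, target) square pair.

    First pick a source square from a 64-way distribution, then pick a
    target square from a second 64-way distribution that is conditioned
    on the chosen source -- the key point being that p(target) must
    depend on which source square was selected.
    """
    src_probs = softmax(source_logits)               # 64 entries
    src = rng.choices(range(64), weights=src_probs)[0]
    tgt_probs = softmax(target_logits_fn(src))       # conditioned on src
    tgt = rng.choices(range(64), weights=tgt_probs)[0]
    return src, tgt
```

In a real network the conditioning would come from feeding the chosen source square (or its embedding) into the target head, rather than from a Python callback.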

Knowing both the source and target squares leaves very little ambiguity about what kind of move is being performed. It disambiguates pawn push vs double-push vs capture, king move vs capture vs castle, etc. The only remaining ambiguity is pawn promotion, where the player can choose to promote the pawn to a knight, bishop, rook or queen. However, in the vast majority of cases the choice is queen, and a chess engine that can only promote to queen should do pretty well, all else being equal.

The fact that the positions of the pieces on the board don't capture the entire game state is interesting/annoying as well. The extra information you would need to feed into the network is the en-passant square (if any) and the castling rights (if any).
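One plausible way to feed that extra state into a network is to append it to the board encoding as a small feature vector: four castling-rights bits plus a 64-way one-hot for the en-passant square. This is a hypothetical encoding sketch, assuming squares are indexed 0-63; the function and key names are made up for illustration:

```python
def extra_state_features(castling_rights, ep_square):
    """Encode the non-board part of a chess state.

    castling_rights: dict with boolean entries for "WK", "WQ", "BK", "BQ"
        (white/black, kingside/queenside).
    ep_square: index 0-63 of the en-passant square, or None.
    Returns 4 + 64 = 68 floats to append to the board representation.
    """
    bits = [float(castling_rights.get(k, False))
            for k in ("WK", "WQ", "BK", "BQ")]
    ep = [0.0] * 64
    if ep_square is not None:
        ep[ep_square] = 1.0  # one-hot; all zeros when no en passant
    return bits + ep
```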