Open bkestelman opened 1 year ago
Another option is to skip representing moves entirely and just specify the state after a move. Some pros & cons:
Pros:
Cons:
I'd like to try both approaches. The state-only approach actually sounds intuitively simpler to me, but if we want to start with the tried and true state-move pair that's fine too.
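To make the state-only idea concrete, here is a minimal sketch for TicTacToe. It is not taken from the repo: the `Board` type and the `successors` and `is_legal_transition` helpers are hypothetical names chosen for illustration. A "move" is never stored; a transition counts as legal if the new board is among the boards reachable from the old one.

```python
from typing import List, Tuple

# Hypothetical representation: 9 cells in row-major order, each "X", "O" or " ".
Board = Tuple[str, ...]


def successors(board: Board, player: str) -> List[Board]:
    """Return every board reachable by `player` placing one mark on an empty cell."""
    return [
        board[:i] + (player,) + board[i + 1:]
        for i, cell in enumerate(board)
        if cell == " "
    ]


def is_legal_transition(before: Board, after: Board, player: str) -> bool:
    """A transition is legal if `after` is one of the successors of `before`."""
    return after in successors(before, player)
```

With this layout, listing all options for a position is just `successors(board, player)`, at the cost of building a full board per option.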
Let's start with the traditional state-move pair approach. See my example implementation in TicTacToe: 2d646a9f92dd93cb74235b1bc882e5d716d65d54
We need to choose a way to represent moves and to check whether they're legal.
The representation should take into account that we will frequently want to list all possible moves for a position (so we can assign a probability to each move).
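As a rough illustration of listing moves and assigning probabilities, the sketch below uses TicTacToe with a move represented as a cell index 0..8. The function names and the idea of a per-cell `scores` list (for example, raw outputs of some model) are assumptions for this example, not part of the existing code. Illegal cells are dropped and a softmax is taken over the legal ones only.

```python
import math
from typing import Dict, List, Sequence


def legal_moves(board: Sequence[str]) -> List[int]:
    """Indices of empty cells; each index is one legal move."""
    return [i for i, cell in enumerate(board) if cell == " "]


def move_probabilities(board: Sequence[str], scores: Sequence[float]) -> Dict[int, float]:
    """Softmax over the scores of legal moves only; illegal moves get no entry."""
    moves = legal_moves(board)
    if not moves:
        return {}
    top = max(scores[m] for m in moves)  # subtract the max for numerical stability
    weights = {m: math.exp(scores[m] - top) for m in moves}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

Whatever move representation we pick, it should make the `legal_moves` step cheap, since it runs for every position we evaluate.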
The AlphaZero paper (p13) represents a move as the initial position of a piece, followed by a one-hot vector of 73 possible relative moves from the initial square (56 queen-style moves, 8 knight moves and 9 underpromotions). The other option is just to use the final absolute position instead of a relative movement. That would have the advantage of using only 64 values per starting square instead of 73, but may require slightly more work to check if the move is legal.
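The two encodings can be compared with a small sketch. Everything here is an assumption for illustration: squares are numbered 0..63 as `file + 8 * rank`, the ordering of directions and planes is arbitrary (the paper does not fix one that we rely on), and the 9 underpromotion planes (64..72) are left out for brevity.

```python
from typing import Optional

# Assumed ordering of the 8 queen directions and 8 knight jumps, as (dfile, drank).
QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def relative_plane(dfile: int, drank: int) -> Optional[int]:
    """Plane 0..55 for queen-style moves, 56..63 for knight jumps, None otherwise."""
    if (dfile, drank) in KNIGHT_JUMPS:
        return 56 + KNIGHT_JUMPS.index((dfile, drank))
    if (dfile, drank) == (0, 0):
        return None
    if dfile == 0 or drank == 0 or abs(dfile) == abs(drank):
        direction = QUEEN_DIRS.index((sign(dfile), sign(drank)))
        distance = max(abs(dfile), abs(drank))  # 1..7 on an 8x8 board
        return direction * 7 + (distance - 1)
    return None  # not a queen-style or knight displacement


def encode_relative(from_sq: int, to_sq: int) -> Optional[int]:
    """Index into a 64 * 73 vector: starting square, then relative-move plane."""
    dfile = to_sq % 8 - from_sq % 8
    drank = to_sq // 8 - from_sq // 8
    plane = relative_plane(dfile, drank)
    return None if plane is None else from_sq * 73 + plane


def encode_absolute(from_sq: int, to_sq: int) -> int:
    """Index into a 64 * 64 vector: starting square, then destination square."""
    return from_sq * 64 + to_sq
```

One difference this makes visible: `encode_relative` returns `None` for displacements no piece can make, so some geometric filtering comes for free, while `encode_absolute` accepts any pair of squares and leaves all legality checks to the game logic.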