david-carteau / cerebrum

The Cerebrum library

How to get started with NNUE #2

Closed MadBonzz closed 3 months ago

MadBonzz commented 5 months ago

Hey,

I want to get started with building my own NNUE for my chess engine project, but I couldn't find any resources that explain the concepts and implementation clearly. Could you suggest some resources and help me kick-start the process of building my own NNUE? Sorry for posting this as an issue on your repo, but I really wanted to connect with someone who has built their own NNUE so I can get started with building mine.

david-carteau commented 5 months ago

Hello, I apologize for the delay in my response, but I will be happy to help you! Since I am unfamiliar with your background, it is difficult to know how best to help. If my answer is not what you were expecting, please give me more information about your difficulties.

NNUE stands for "Efficiently Updatable Neural Network". Forget the "Efficiently Updatable" part for a moment: it is just a standard neural network with multiple fully connected layers! The first layer deals with a (special) representation of the chess board, while the last one produces an evaluation of the position.

First layer of the Neural Network

Traditionally, in machine learning, a simple way to improve training speed and convergence is to transform your inputs into a sequence of '0's and '1's (i.e. a one-hot encoding of the input features). This keeps inputs in a bounded domain (between 0 and 1) and lets the model learn a weight associated with each one-hot encoded feature.

In chess, a board position can be represented as such a sequence: there are 6 piece types, 2 colors, and 64 squares, i.e. 6 × 2 × 64 = 768 possible combinations. A more advanced one-hot encoding scheme is to index the position of all non-king pieces (5 × 2 × 64 = 640) for each position of the king (64), which leads to 640 × 64 = 40960 possible combinations. You can then add other features, always represented by a '0' or a '1', such as "side to move", etc.
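To make this concrete, here is a minimal sketch in Python of the simpler 768-feature encoding. The helper names (`feature_index`, `encode_board`) are hypothetical, not taken from the Cerebrum code base:

```python
import numpy as np

def feature_index(color: int, piece_type: int, square: int) -> int:
    # color in {0, 1}, piece_type in {0..5}, square in {0..63}
    return (color * 6 + piece_type) * 64 + square

def encode_board(pieces) -> np.ndarray:
    # 'pieces' is an iterable of (color, piece_type, square) tuples
    x = np.zeros(768, dtype=np.float32)
    for color, piece_type, square in pieces:
        x[feature_index(color, piece_type, square)] = 1.0
    return x

# Example: a white king on e1 (square 4) and a black king on e8 (square 60),
# with the king encoded as piece_type 5
x = encode_board([(0, 5, 4), (1, 5, 60)])
assert x.sum() == 2
```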

Another common network architecture in chess is to give the inputs a sort of 'Y' shape (i.e. two "legs"): one sequence of '0's and '1's is dedicated to White's point of view, and another to Black's point of view (you can also decide to have one sequence for the side to move and the other for its opponent). Both sequences are then concatenated (i.e. 768 + 768 = 1536 inputs) before being connected to the first hidden layer.
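One possible sketch of this two-perspective input, reusing the hypothetical `encode_board()` helper above. Swapping colors and mirroring the board vertically for Black's view is one common convention, not necessarily the only one:

```python
import numpy as np

def mirror_square(square: int) -> int:
    return square ^ 56  # vertical flip: a1 <-> a8, e1 <-> e8, etc.

def encode_both_perspectives(pieces) -> np.ndarray:
    white_view = encode_board(pieces)
    black_view = encode_board(
        [(1 - color, piece_type, mirror_square(square))
         for color, piece_type, square in pieces]
    )
    return np.concatenate([white_view, black_view])  # shape (1536,)
```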

Illustration (source: https://www.chessprogramming.org/Stockfish_NNUE):

[Diagram: Stockfish NNUE layers]

Other layers of the Neural Network

The rest of the network is classic. The main challenges here are to find the best architecture (number of layers, number of neurons per layer) and to decide whether to introduce things like non-linearity between layers (e.g. ReLU), clipping of weights and/or layer outputs, quantization of weights, etc. Start simple and then iterate!
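As a starting point, here is one possible (deliberately small) architecture in PyTorch with a clipped ReLU between layers. The layer sizes are purely illustrative, not the Cerebrum or Stockfish topology:

```python
import torch
import torch.nn as nn

class ClippedReLU(nn.Module):
    # Clamp activations to [0, 1]; clipping makes later quantization easier
    def forward(self, x):
        return torch.clamp(x, 0.0, 1.0)

class SimpleNNUE(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1536, 256), ClippedReLU(),
            nn.Linear(256, 32), ClippedReLU(),
            nn.Linear(32, 1),  # scalar evaluation of the position
        )

    def forward(self, x):
        return self.net(x)

model = SimpleNNUE()
eval_score = model(torch.rand(1, 1536))  # random input, just to check shapes
```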

The "Efficiently Updatable" part

This concept was introduced to speed up inference time (i.e. when you actually evaluate a position with the network inside your engine). Imagine that you have to feed the network with a (one-hot encoded) chess position. Each '0' and '1' will be multiplied by one specific weight, and the sum (*) will be computed to get the input to the first neuron of the first hidden layer. Then you'll have to do the same for the second neuron of the first hidden layer, and so on... That's a lot of work, and you have to do it for each position encountered in your tree search!
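For a sense of the cost, the naive first-layer computation is a full matrix-vector product repeated for every position: with 768 inputs and 256 first-layer neurons, that is roughly 200k multiply-adds per evaluation. A plain sketch (shapes are illustrative):

```python
import numpy as np

W1 = np.random.randn(256, 768).astype(np.float32)  # first-layer weights
b1 = np.random.randn(256).astype(np.float32)       # first-layer biases

def first_layer_naive(x: np.ndarray) -> np.ndarray:
    # x is the 768-dimensional one-hot input; this is the sum marked (*)
    return W1 @ x + b1
```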

The lucky thing in chess is that applying a move is basically moving one piece (sometimes capturing another piece, castling, etc., but let's consider only the simple case of moving a piece). This means computing 1) an index at which to remove the '1' from the sequence of 768 '0's and '1's, corresponding to the piece leaving its square, and 2) another index at which to add a '1', corresponding to the piece arriving on its destination square. To update the input of the first neuron of the first hidden layer, you then subtract the weight associated with the first index from the sum marked above with a (*), and add the weight associated with the second index. You don't have to recompute all the '1's with their associated weights for the other pieces! To do so, you just have to maintain, in an array, the current sum associated with each neuron of the first hidden layer (this array is often called the "accumulator").
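A sketch of the accumulator idea, reusing the illustrative `W1`/`b1` from the previous snippet: refresh the sums once, then patch them with one column subtraction and one column addition per quiet move, instead of recomputing the full product.

```python
import numpy as np

# Full refresh: done once (e.g. at the root of the search)
def refresh_accumulator(x: np.ndarray) -> np.ndarray:
    return W1 @ x + b1  # same result as first_layer_naive()

# Incremental update: a quiet move removes one feature and adds another;
# both indices come from the 768-feature encoding shown earlier
def update_accumulator(acc: np.ndarray, removed_index: int, added_index: int) -> np.ndarray:
    acc -= W1[:, removed_index]  # piece leaves its square
    acc += W1[:, added_index]    # piece arrives on its destination square
    return acc
```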

Links

Here are some pointers that I have in mind, and that may also help you:

Kind regards, David