Custom loss function

According to the original paper, the loss function combines mean squared error (MSE) on the value prediction with cross-entropy (CE) on the move probability distribution, plus L2 regularization on the parameter vector theta. They have a parameter c that controls the strength of the regularization, but the MSE and CE terms are weighted equally.
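
For reference, this is how I read the loss in the paper. Notation is the paper's as far as I recall it: z is the game outcome, v the predicted value, pi the search probabilities from MCTS, p the predicted move probabilities, and c the L2 coefficient.

$$
l = (z - v)^2 - \boldsymbol{\pi}^{\top} \log \mathbf{p} + c \, \lVert \theta \rVert^2
$$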

We should probably start with this loss as written, perhaps adding a parameter to control the ratio between the MSE and CE terms?
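
A minimal, framework-free sketch of what that could look like for a single training example, just to make the arithmetic concrete. The function name `alphazero_loss` and the `value_weight` / `policy_weight` / `eps` parameters are my own assumptions, not anything from the paper or this repo; with both weights at 1.0 it should reduce to the paper's unweighted form. A real implementation would presumably use the tensor ops of whatever framework we settle on.

```python
import math


def alphazero_loss(v, z, p, pi, theta, c=1e-4,
                   value_weight=1.0, policy_weight=1.0, eps=1e-12):
    """Combined loss for one example (illustrative sketch only).

    v      -- predicted value (float)
    z      -- actual game outcome (float)
    p      -- predicted move probabilities (list of floats)
    pi     -- target move probabilities from search (list of floats)
    theta  -- flat list of model parameters
    c      -- L2 regularization coefficient
    value_weight, policy_weight -- hypothetical knobs for the MSE/CE ratio
    eps    -- small constant to avoid log(0); an assumption, not from the paper
    """
    # Squared error between the game outcome and the predicted value.
    mse = (z - v) ** 2
    # Cross-entropy between the search probabilities and the predicted policy.
    ce = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    # L2 penalty on the parameters.
    l2 = c * sum(w * w for w in theta)
    return value_weight * mse + policy_weight * ce + l2
```

Calling `alphazero_loss(v, z, p, pi, theta)` with the defaults gives the equally weighted version; passing, say, `value_weight=0.5` would halve the contribution of the value term relative to the policy term.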

AlphaZeroIncubator / AlphaZero

Custom loss function #1