AlphaZeroIncubator / AlphaZero

Our implementation of AlphaZero for simple games such as Tic-Tac-Toe and Connect4.

Custom loss function #1

guidopetri closed 4 years ago

guidopetri commented 4 years ago

According to the original paper, the loss function is a combination of MSE for the value estimate (predicted value vs. game outcome) and cross-entropy for the move probability distribution (network policy vs. MCTS search probabilities), together with L2 regularization on the parameter vector theta. They have a c parameter for controlling the regularization strength, but the MSE and CE losses are weighted equally.
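For reference, if I'm reading the paper right, the per-example loss can be written as below. Here z is the game outcome, v the predicted value, π the MCTS search probabilities, p the network's move probabilities, θ the parameters, and c the regularization constant:

```latex
\ell(\theta) = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c \,\lVert \theta \rVert^{2}
```

The first term is the squared error (averaged over a batch it becomes the MSE), the second is the cross-entropy, and the third is the L2 penalty.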

We should probably start with something like this, maybe with an extra weight to control the ratio between the MSE and CE terms?
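A rough, framework-free sketch of what that could look like for a single training example. The function name `alphazero_loss` and the `value_weight` / `policy_weight` knobs are hypothetical (the paper weights both terms equally, so they default to 1.0), and the default for `c` is just a placeholder. A real version would presumably be written against whatever tensor library the project ends up using.

```python
import math


def alphazero_loss(z, v, pi, p, theta, c=1e-4,
                   value_weight=1.0, policy_weight=1.0, eps=1e-12):
    """Combined loss for one training example (illustrative sketch only).

    z     -- game outcome from the current player's perspective
    v     -- value predicted by the network
    pi    -- target move probabilities from the search
    p     -- move probabilities predicted by the network
    theta -- flat iterable of network parameters
    c     -- L2 regularization strength (placeholder default)
    value_weight, policy_weight -- hypothetical knobs for the MSE/CE ratio
    """
    # Squared error on the value; averaged over a batch this is the MSE.
    value_loss = (z - v) ** 2

    # Cross-entropy between the search target and the predicted policy.
    # eps guards against log(0) for moves the network assigns zero mass.
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))

    # L2 penalty on the parameters.
    l2_penalty = c * sum(w * w for w in theta)

    return value_weight * value_loss + policy_weight * policy_loss + l2_penalty
```

With both weights left at 1.0 this reduces to the equally weighted form described above; changing either one shifts the balance between the value and policy terms without touching the regularization.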

guidopetri commented 4 years ago

Closed by #22.