Closed: TheodoreEhrenborg closed this issue 5 years ago
Addressed (4) by adding dropout with rate 0.5.
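For reference, here's roughly what that looks like -- a minimal sketch assuming a Keras-style model; the layer sizes are placeholders, not the real Neural_Nash architecture:

```python
# Sketch only: layer sizes are placeholders, not the actual Neural_Nash network.
from tensorflow.keras import layers, models

def build_network(input_size, num_players):
    model = models.Sequential([
        layers.Dense(128, activation="relu", input_shape=(input_size,)),
        layers.Dropout(0.5),  # randomly zero half the units during training
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_players, activation="softmax"),  # predicted win probability per player
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```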
Addressed (3)
Addressed (2) by adding an extra option to do_training. Now I just hope these fixes work.
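For the record, a rough idea of what that option might look like -- the parameter name max_tokens and the attributes saved_positions / tokens_left are placeholders, not necessarily what the real code uses:

```python
def do_training(self, max_tokens=None):
    """Train the network on saved positions.

    If max_tokens is given, only train on positions with at most that
    many tokens left, so learning can proceed in stages from endgames
    up to full games.
    """
    positions = self.saved_positions
    if max_tokens is not None:
        positions = [p for p in positions if p.tokens_left <= max_tokens]
    self.fit_network(positions)  # placeholder for the existing training step
```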
I am currently running Neural_Nash.do_training() with stages of learning as in (2).
Neural_Nash is doing fine.
The problem is that Neural_Nash has already learned that the easiest way to be consistent is to assume that Player 4 (or maybe it's Player 3) will always win. As a result, the new information -- that players win when they have a lot of points -- isn't being learned very well. It may even be completely swamped by the simpler heuristic: Player 4 will win.
Here's how to address this problem:
(1) Before training, wipe the neural network once to completely eliminate the idea that Player 4 is any good.
(2) Train in stages: first learn to evaluate positions with one token left, then two tokens, and so on, slowly working up to full games. This also means I don't have to waste time running aux_stochastic when Neural_Nash has to make a move in a game with too many tokens. (A rough sketch of (1) and (2) follows below.)
(3) Although Neural_Nash doesn't need to predict moves in games with zero tokens, the Game class should still save those positions, because Neural_Nash needs every hint it can get in order to learn correctly.
(4) Isn't there some setting for the neural network (like dropout, which randomly zeroes half the units during training) to prevent overfitting? This network needs to be able to generalize.
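A rough sketch of how (1) and (2) could fit together, reusing the hypothetical build_network and max_tokens names from the sketches above (neither is confirmed to match the actual code):

```python
# Hypothetical curriculum loop for steps (1) and (2).
def staged_training(nash, max_tokens_in_game):
    # Step (1): rebuild the network from scratch so the
    # "Player 4 always wins" heuristic is forgotten.
    nash.network = build_network(nash.input_size, nash.num_players)
    # Step (2): train on endgames first, then gradually longer games.
    for tokens in range(1, max_tokens_in_game + 1):
        nash.do_training(max_tokens=tokens)
```

The idea behind starting from one-token endgames is that the network first sees positions whose outcome is determined directly by the points, before it ever sees full games where the "Player 4 wins" shortcut could dominate.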