The main reason why the model failed to improve was the lack of ability to tune hyperparameters. I have added the option to set params like batch size, learning rate, and gamma. From increasing the batch size from 32 to 512, I have already seen a drastic improvement in convergence, which is to be expected.
gamma is also a hyperparameter that is commonly tuned when using MaskablePPO, so tuning it could likely further boost performance/convergence. I have not done that yet, but at least the end-user will be able to :)
As batch size 512 was found to work better, I have changed the default batch size to 512 here: 0eb25ed62de97c43a24c50b3d7100304254102bb
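For reference, a minimal sketch of how these hyperparameters can be passed to MaskablePPO from sb3-contrib (this is not the repo's actual training script; the sapai-gym import and constructor below are assumptions and may differ from the real API):

```python
from sb3_contrib import MaskablePPO
from sapai_gym import SuperAutoPetsEnv  # assumed import path for the sapai-gym env

# Assumed constructor; the real env may require extra arguments
# (e.g. an opponent generator). MaskablePPO also expects the env to
# expose action masks, e.g. via an action_masks() method or an
# ActionMasker wrapper.
env = SuperAutoPetsEnv()

model = MaskablePPO(
    "MlpPolicy",
    env,
    batch_size=512,      # raised from 32; larger batches converged noticeably better
    learning_rate=3e-4,  # SB3 default, now user-configurable
    gamma=0.99,          # discount factor, another knob commonly tuned for MaskablePPO
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```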
Closing this issue for now.
If any devs/users are experiencing issues with convergence, it is probably better to open a new discussion.
After training for a while, that is, finetuning the current best model(s), we find that the performance plateaus at a maximum of around 20-25 wins-ratio reward.
We should experiment with different values for hyperparameters such as batch size and learning rate. Also, sapai-gym does not currently support freezing. Hence, if we add that, the model should likely improve, as freezing is an important aspect of the game, especially for scaling.
Underneath is a plot of the training history, which displays as one continuous training run, but where some crashes (and restarts) have occurred (which could explain some of the spikes and sudden drops).