andreped / super-ml-pets

🐢 AI for Super Auto Pets
MIT License

Model not improving #25

Closed andreped closed 2 years ago

andreped commented 2 years ago

After training for a while (that is, fine-tuning the current best model(s)), we find that performance plateaus at a win-ratio reward of around 20-25.

We should experiment with different values for hyperparameters such as batch size and learning rate.

Also, sapai-gym does not currently support freezing. If we add that, the model should likely improve, as freezing is an important aspect of the game, especially for scaling.

Below is a plot of the training history. It is displayed as one continuous training run, but some crashes (and restarts) occurred along the way, which could explain some of the spikes and sudden drops.

[Figure: training_not_improving (plot of the training history)]

andreped commented 2 years ago

The main reason the model failed to improve was the lack of any way to tune hyperparameters.

I have added options to set parameters such as batch size, learning rate, and gamma. After increasing the batch size from 32 to 512, I have already seen a drastic improvement in convergence, which is to be expected.
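For illustration, a minimal sketch of how such options could be exposed on the command line and collected into keyword arguments for MaskablePPO. The flag names and the helper functions `build_parser` and `ppo_kwargs` are hypothetical and not taken from this repository. Only the batch size default of 512 comes from this thread; the learning rate and gamma defaults shown are the stock Stable-Baselines3 PPO defaults, used here as an assumption.

```python
import argparse


def build_parser():
    # Hypothetical flags; the repository's actual option names may differ.
    parser = argparse.ArgumentParser(description="Training hyperparameters (sketch)")
    parser.add_argument("--batch_size", type=int, default=512)       # default from this thread
    parser.add_argument("--learning_rate", type=float, default=3e-4)  # assumed: SB3 PPO default
    parser.add_argument("--gamma", type=float, default=0.99)          # assumed: SB3 PPO default
    return parser


def ppo_kwargs(args):
    # Basic sanity checks before the values reach the model constructor.
    if args.batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if args.learning_rate <= 0.0:
        raise ValueError("learning_rate must be positive")
    if not 0.0 < args.gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    return {
        "batch_size": args.batch_size,
        "learning_rate": args.learning_rate,
        "gamma": args.gamma,
    }


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--batch_size", "512", "--gamma", "0.95"])
kwargs = ppo_kwargs(args)

# With sb3_contrib installed and an environment `env` available, these could
# be forwarded as: MaskablePPO("MlpPolicy", env, **kwargs)
```

The three keys match constructor arguments of MaskablePPO in sb3_contrib, so the dictionary can be unpacked directly into the constructor.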

gamma is also a hyperparameter that is commonly tuned when using MaskablePPO, so tuning it could likely further boost performance/convergence. I have not done that yet, but at least the end user will now be able to :)
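As a rough intuition for why gamma matters, here is a small sketch of how the discount factor weights future rewards. The reward sequence is made up for illustration and is not this project's actual reward signal; `discounted_return` and `effective_horizon` are hypothetical helper names.

```python
def discounted_return(rewards, gamma):
    # Sum of gamma**t * reward_t, accumulated from the last step backwards.
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    # Common rule of thumb: rewards beyond roughly 1 / (1 - gamma) steps
    # contribute little to the discounted return.
    return 1.0 / (1.0 - gamma)


# Made-up episode: a single reward of 1 arriving only at the tenth step.
late_reward = [0.0] * 9 + [1.0]

for gamma in (0.9, 0.99):
    value = discounted_return(late_reward, gamma)
    print(f"gamma={gamma}: return={value:.3f}, horizon~{effective_horizon(gamma):.0f} steps")
```

A lower gamma shrinks the value of rewards that arrive late, which is one reason it is worth tuning when the payoff of an action (for example, scaling a team over several turns) shows up many steps afterwards.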

As a batch size of 512 was found to work better, I have changed the default batch size to 512 here: 0eb25ed62de97c43a24c50b3d7100304254102bb

Closing this issue for now.

It is probably better to open a new discussion if any devs/users experience issues with convergence.