The agent trained with mean loss performed better after 300,000 games than the agent trained with sum loss. The mean loss agent knew how to block and win in the center, while the agent trained with sum_loss only knew stackattack.
It looked like the agent trained on sum_loss was still improving, while the agent trained on mean_loss no longer was, but that could simply be because the agent trained on mean_loss didn't have strong enough opponents.
We conclude that the mean_loss method gives better convergence and will continue with this loss method.
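For reference, here is a minimal sketch of what the two accumulation methods mean, assuming a PyTorch-style setup where per-sample losses are collected into a tensor before the backward pass. The function name and shapes are illustrative, not our actual training code:

```python
import torch

def accumulate_loss(per_sample_losses: torch.Tensor, method: str = "mean") -> torch.Tensor:
    """Combine per-sample losses into one scalar before backward().

    "sum" makes the gradient magnitude grow with the number of samples,
    while "mean" keeps it independent of the batch size, so the two
    methods interact differently with the learning rate.
    """
    if method == "sum":
        return per_sample_losses.sum()
    return per_sample_losses.mean()

# Illustrative usage in a training step (the loss values are stand-ins):
per_sample_losses = torch.rand(64, requires_grad=True)
loss = accumulate_loss(per_sample_losses, method="mean")
loss.backward()
```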
These are the combinations of architectures and training strategies that we want to train next (a rough config sketch follows the list). We start out with:

- Average Joe, small architecture (to see if bad performance is due to our extreme reward system), accumulate sum, lr=10e-3
- Average Joe, small architecture (to see if bad performance is due to our extreme reward system), accumulate mean, lr=10e-3
Decide which accumulation method to use going forward.
Then:

- Basic reward, mini architecture
- Basic reward, small architecture
- Reward(?), mini architecture, play against minimax agent
- Exotic reward, mini or small, play against minimax

Old:

- Reward(?), small architecture, play against minimax agent
- Basic reward, large architecture, play against minimax agent
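Purely as an illustration, the planned runs above could be tracked as a small config list. The field names and values here are made up for the sketch, not taken from our code:

```python
# Hypothetical bookkeeping for the planned runs; names are illustrative only.
planned_runs = [
    {"reward": "average_joe", "architecture": "small", "accumulate": "sum",  "lr": 10e-3},
    {"reward": "average_joe", "architecture": "small", "accumulate": "mean", "lr": 10e-3},
    # after deciding on the accumulation method:
    {"reward": "basic",  "architecture": "mini"},
    {"reward": "basic",  "architecture": "small"},
    {"reward": "tbd",    "architecture": "mini",          "opponent": "minimax"},
    {"reward": "exotic", "architecture": "mini_or_small", "opponent": "minimax"},
]
```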
Important note
We have introduced other hyperparameters that we want to explore, but our remaining time for the project is limited. Suggestion: try out some of our new hyperparameters (for instance #28, #30 and #35) against the model in #29, but maybe just for 100k games. Perhaps this could be done on my PC?