RasmusBrostroem / ConnectFourRL


Further training #38

Closed (jbirkesteen closed this issue 1 year ago)

jbirkesteen commented 2 years ago

batch_size

We want to experiment with varying batch_size values (1, 10, 20) on the setups below:

Minimax

This will be a half-and-half split between the agent and minimax. Use AverageJoe with the small architecture, 10k episodes per generation, and 30 generations.
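A rough sketch of how the runs could be laid out, just to pin down the numbers. Everything named here (`ExperimentConfig`, `opponent_for_episode`, the `"half_agent_half_minimax"` label) is hypothetical and not taken from the repo, and alternating opponents per episode is only one possible reading of the half-and-half split:

```python
from dataclasses import dataclass

BATCH_SIZES = (1, 10, 20)  # values listed in this issue


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for one training run's settings."""
    batch_size: int
    opponent_setup: str = "half_agent_half_minimax"  # assumed label
    episodes_per_generation: int = 10_000
    generations: int = 30

    @property
    def total_episodes(self) -> int:
        return self.episodes_per_generation * self.generations


def opponent_for_episode(episode_index: int) -> str:
    """Assumed split: even episodes face minimax, odd episodes face the agent."""
    return "minimax" if episode_index % 2 == 0 else "agent"


# One run per batch size, all sharing the same episode budget.
configs = [ExperimentConfig(batch_size=b) for b in BATCH_SIZES]
```

With these settings each run covers 10,000 x 30 = 300,000 episodes, so the three batch sizes together come to 900,000 episodes.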

Double agent

Save the agent each time we update it, and load the saved copy as its own opponent. Benchmark against minimax or some saved model (AverageJoe, for instance). I might get this to work on my PC.
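The save-then-reload loop could look roughly like the sketch below. It is only an illustration: `Agent` is a stand-in class with a placeholder `weights` dict, the JSON checkpoint and the file name `agent_latest.json` are assumptions, and the real project would save its actual network parameters instead.

```python
import json
import tempfile
from pathlib import Path


class Agent:
    """Stand-in for the real agent; `weights` is a placeholder for network parameters."""

    def __init__(self):
        self.weights = {"w": 0.0}
        self.updates = 0

    def update(self):
        # Placeholder for a real training step.
        self.weights["w"] += 0.1
        self.updates += 1


def save_agent(agent, path):
    state = {"weights": agent.weights, "updates": agent.updates}
    Path(path).write_text(json.dumps(state))


def load_agent(path):
    state = json.loads(Path(path).read_text())
    agent = Agent()
    agent.weights = state["weights"]
    agent.updates = state["updates"]
    return agent


def self_play_training(n_updates, checkpoint_dir):
    path = Path(checkpoint_dir) / "agent_latest.json"  # assumed file name
    agent = Agent()
    save_agent(agent, path)
    opponent = load_agent(path)
    for _ in range(n_updates):
        # ... play episodes of `agent` against `opponent` here ...
        agent.update()
        save_agent(agent, path)      # save after every update
        opponent = load_agent(path)  # reload the saved copy as the opponent
    return agent, opponent


with tempfile.TemporaryDirectory() as tmp:
    agent, opponent = self_play_training(n_updates=3, checkpoint_dir=tmp)
```

Loading from disk gives the opponent its own copy of the parameters, so later updates to the learner do not change the opponent until the next reload.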

jbirkesteen commented 2 years ago

If you have the time: make a new reward system with 100 for a win and -25 for a loss. Set it up to play only against the minimax algorithm.
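A minimal sketch of the reward mapping, to make the proposal concrete. The function name `final_reward` and the outcome strings are hypothetical, and the draw value of 0 is an assumption since it is not specified above:

```python
WIN_REWARD = 100.0
LOSS_REWARD = -25.0
DRAW_REWARD = 0.0  # assumption: not specified in this issue


def final_reward(outcome: str) -> float:
    """Map a finished game's outcome to the agent's terminal reward."""
    rewards = {"win": WIN_REWARD, "loss": LOSS_REWARD, "draw": DRAW_REWARD}
    if outcome not in rewards:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return rewards[outcome]
```

The asymmetry means a win is worth four times what a loss costs, so an agent that wins only one game in four against minimax would still see a positive average reward.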