RasmusBrostroem / ConnectFourRL


TD-backgammon inspired agent #81

Open jbirkesteen opened 1 year ago

jbirkesteen commented 1 year ago

Implement a new agent using the TD($\lambda$) algorithm, inspired by TD-Gammon, Tesauro's solution for backgammon. The agent extends the existing player classes but differs from them in several key respects.
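As a rough illustration of the TD($\lambda$) update with eligibility traces, here is a minimal sketch. It assumes a linear value function and a reward that only arrives at the end of the game; TD-Gammon itself used a neural network, so this is a simplification. The function name `td_lambda_episode` and its argument format are hypothetical and not taken from the repository.

```python
def td_lambda_episode(features, final_reward, weights,
                      alpha=0.1, lam=0.7, gamma=1.0):
    """Update `weights` in place from one finished game (linear value function).

    features: list of feature vectors, one per visited state (assumed format).
    final_reward: outcome observed after the last state, e.g. 1 for a win.
    """
    def value(x):
        return sum(w * xi for w, xi in zip(weights, x))

    traces = [0.0] * len(weights)
    for t, x in enumerate(features):
        # For a linear value function the gradient w.r.t. the weights is x,
        # so the trace decays by gamma * lam and accumulates x.
        traces = [gamma * lam * e + xi for e, xi in zip(traces, x)]
        if t + 1 < len(features):
            # Non-terminal step: bootstrap from the next state's value.
            target = gamma * value(features[t + 1])
        else:
            # Terminal step: use the actual game outcome.
            target = final_reward
        delta = target - value(x)
        for i, e in enumerate(traces):
            weights[i] += alpha * delta * e
    return weights
```

With `lam=1.0` the final outcome is credited to every state visited in the game, while `lam=0.0` only updates the value of the most recent state at each step.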

State representation

We represent game states using binary input nodes showing which spots are occupied and by whom. We create an input vector where:
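The sketch below shows one binary encoding of this kind. It assumes a standard 6x7 board stored as nested lists with 0 for empty and 1 or 2 for the players, and it uses one block of 42 inputs per player. These conventions and the name `encode_board` are assumptions for illustration and may not match the layout we end up using.

```python
ROWS, COLS = 6, 7  # standard Connect Four board size

def encode_board(board, current_player):
    """Flatten a board into a binary vector of length 2 * ROWS * COLS.

    board: ROWS lists of COLS cells; 0 = empty, 1 or 2 = player (assumed).
    The first 42 entries mark the current player's pieces,
    the last 42 entries mark the opponent's pieces.
    """
    opponent = 2 if current_player == 1 else 1
    own = [1 if cell == current_player else 0 for row in board for cell in row]
    other = [1 if cell == opponent else 0 for row in board for cell in row]
    return own + other

# Usage: one piece for each player in the bottom row.
board = [[0] * COLS for _ in range(ROWS)]
board[5][3] = 1
board[5][4] = 2
vec = encode_board(board, current_player=1)
```

Encoding from the current player's perspective means the same network can evaluate positions for either side, which is convenient for self-play.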

Self-play methodologies

We discussed several ways of implementing self-play:

We will start with simple self-play and can discuss later whether to try the others.

Needed for training

jbirkesteen commented 11 months ago

Next steps for experimenting: try a larger network (around 7-8k parameters) and train on more games.
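To get a feel for what 7-8k parameters means, here is a small helper for counting the weights and biases of a fully connected network. The 84-input, single-hidden-layer, single-output shape is only an assumption for illustration; the actual architecture may differ.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Assumed shape: 84 binary inputs, one hidden layer, one value output.
for hidden in (85, 90):
    print(hidden, mlp_param_count([84, hidden, 1]))
# prints:
# 85 7311
# 90 7741
```

Under these assumptions, a hidden layer of roughly 85 to 90 units lands in the 7-8k range.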

We can experiment with some of the hyperparameters once we've read a bit more of the book.