RasmusBrostroem / ConnectFourRL


Minmax #23

Closed RasmusBrostroem closed 1 year ago

RasmusBrostroem commented 2 years ago

We need a semi-minimax algorithm that can look x steps ahead (for example 3 or 5) and determine the best move. If no best move can be determined within those 3 to 5 moves, it just chooses a random legal action.
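
A minimal sketch of what such a depth-limited search could look like, assuming a 6x7 board stored as a list of lists with 0 for empty cells and 1 / -1 for the two players. All names here (`legal_moves`, `drop`, `winner`, `minimax`, `choose_move`) are hypothetical and chosen for illustration; this is not the code in the repository. The search only scores forced wins and losses inside the lookahead, and ties between equally good moves are broken at random, which covers the "random legal action" fallback.

```python
import random

ROWS, COLS = 6, 7  # assumed board size; 0 = empty, 1 / -1 = the two players


def legal_moves(board):
    """Columns whose top cell (row 0) is still empty."""
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board, col, player):
    """Return a copy of the board with player's piece dropped into col."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):  # scan from the bottom row upwards
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError("column is full")


def winner(board):
    """Return 1 or -1 if that player has four in a row, else 0."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r][c]
            if p == 0:
                continue
            # Check right, down, and both downward diagonals from (r, c).
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS
                       and board[rr][cc] == p for rr, cc in cells):
                    return p
    return 0


def minimax(board, depth, player, me):
    """Score from me's view: +1 forced win, -1 forced loss, 0 undecided."""
    w = winner(board)
    if w != 0:
        return 1 if w == me else -1
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return 0  # lookahead exhausted or board full: no decision
    scores = [minimax(drop(board, c, player), depth - 1, -player, me)
              for c in moves]
    return max(scores) if player == me else min(scores)


def choose_move(board, player, depth=3, rng=random):
    """Pick the best-scoring column; equally scored columns are chosen at random."""
    moves = legal_moves(board)
    scores = {c: minimax(drop(board, c, player), depth - 1, -player, player)
              for c in moves}
    best = max(scores.values())
    return rng.choice([c for c in moves if scores[c] == best])
```

With this setup, a player that finds no forced win or loss within `depth` plies sees every move scored 0 and therefore plays a uniformly random legal column, while immediate wins and necessary blocks are still found.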

This should give the policy gradient agent an opponent that it can beat if it plays well, but that is hard to beat, so the agent has to learn difficult strategies.

jbirkesteen commented 1 year ago

This was originally implemented in December 2021 and has also been refactored in #70. The minimax player is currently part of players.py.