tictactoe compete() plays 1000 almost identical games

ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction

MIT License


Hi there, thanks for providing this great RL resource!

I have a comment / suggestion for the tictactoe.py code:

The tictactoe.py code uses a compete() function to test whether the AI players are sufficiently well trained: if they play well enough, every game should end in a tie. With the default settings in the code, all 1000 games do end in a tie.

However, this says little about whether the AI has actually learned to play the game well.

Why? Because epsilon is zero for both players, both follow the learned Q-table greedily and therefore make identical choices in every state where one move has a strictly higher Q-value than the others. This is the case for the first six turns. The only variation comes after six turns, when three moves have equal Q-values and one of them is chosen at random.
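To make the tie-breaking argument concrete, here is a minimal sketch of greedy selection with random tie-breaking, assuming (as described above) that equally valued moves are chosen uniformly at random. `choose_greedy` is a hypothetical helper for illustration, not the function used in tictactoe.py. With epsilon = 0, a move with a strictly higher value is picked every time, so games can only diverge at ties.

```python
import random
from typing import Hashable, Sequence, Tuple


def choose_greedy(scored_moves: Sequence[Tuple[float, Hashable]],
                  rng: random.Random) -> Hashable:
    """Hypothetical helper: pick a highest-valued move, breaking ties at random.

    With epsilon = 0 this tie-break is the only source of randomness.
    """
    best = max(value for value, _ in scored_moves)
    tied = [move for value, move in scored_moves if value == best]
    return rng.choice(tied)


if __name__ == "__main__":
    rng = random.Random(0)
    # One dominant move: the same choice is made in every game.
    print({choose_greedy([(0.9, 4), (0.5, 0), (0.5, 8)], rng) for _ in range(1000)})  # {4}
    # Three equally valued moves: the only point where games can differ.
    print(sorted({choose_greedy([(0.5, 2), (0.5, 6), (0.5, 7)], rng) for _ in range(1000)}))  # [2, 6, 7]
```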

I think a more informative test would be to let one player follow the Q-table greedily while the other selects moves at random.
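The proposed evaluation could look roughly like the sketch below. It uses its own board representation rather than the classes in tictactoe.py, and since no trained table is at hand, a memoized minimax player (`perfect_player`) stands in for a well-trained greedy agent. All names (`compete_vs_random`, `perfect_player`, `random_player`, `play`) are hypothetical. A well-trained agent should never lose to the random opponent, and because the random player wanders into many states that greedy self-play never visits, losses here are a meaningful signal.

```python
import random
from functools import lru_cache
from typing import Tuple

Board = Tuple[int, ...]  # 9 cells: 0 empty, 1 for X, -1 for O
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: Board) -> int:
    """Return 1 if X has three in a row, -1 if O does, else 0."""
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def legal_moves(board: Board):
    return [i for i, cell in enumerate(board) if cell == 0]


def place(board: Board, i: int, symbol: int) -> Board:
    return board[:i] + (symbol,) + board[i + 1:]


@lru_cache(maxsize=None)
def minimax(board: Board, symbol: int) -> int:
    """Game value from X's perspective (1, 0, -1) with `symbol` to move."""
    w = winner(board)
    if w != 0 or not legal_moves(board):
        return w
    values = [minimax(place(board, i, symbol), -symbol) for i in legal_moves(board)]
    return max(values) if symbol == 1 else min(values)


def perfect_player(board: Board, symbol: int, rng: random.Random) -> int:
    """Stand-in for a trained greedy agent: an optimal move, ties broken at random."""
    scored = [(symbol * minimax(place(board, i, symbol), -symbol), i)
              for i in legal_moves(board)]
    best = max(v for v, _ in scored)
    return rng.choice([i for v, i in scored if v == best])


def random_player(board: Board, symbol: int, rng: random.Random) -> int:
    return rng.choice(legal_moves(board))


def play(x_player, o_player, rng: random.Random) -> int:
    """Play one game and return the winner (1 for X, -1 for O, 0 for a tie)."""
    board: Board = (0,) * 9
    symbol = 1
    while winner(board) == 0 and legal_moves(board):
        player = x_player if symbol == 1 else o_player
        board = place(board, player(board, symbol, rng), symbol)
        symbol = -symbol
    return winner(board)


def compete_vs_random(agent, games: int = 1000, seed: int = 0):
    """Agent vs. a uniform random opponent, alternating who moves first.

    Returns (agent_wins, ties, agent_losses).
    """
    rng = random.Random(seed)
    wins = ties = losses = 0
    for g in range(games):
        if g % 2 == 0:
            result = play(agent, random_player, rng)    # agent plays X
        else:
            result = -play(random_player, agent, rng)   # agent plays O
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    return wins, ties, losses
```

Alternating who moves first checks the agent from both sides of the board, since the states it must handle as O differ from those it sees as X.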

Regards, Gertjan


tictactoe compete() plays 1000 almost identical games #145