Some explanation of tictactoe is required

ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction

MIT License


Is your code one of the exercises from Chapter 1? Mine is a port of the tic-tac-toe Lisp example to Haskell (https://github.com/mohanr/Reinforcement-Learning-An-Introduction-by-Richard-S.-Sutton-and-Andrew-G.-Barto/blob/master/tictactoe.hs).

What result from `run` or `runs` would show that learning is happening? My code returns a value between 40 and 50 for each batch of 100 games.

I mean this section.

        ;; 40 batches of 100 games; print the average outcome of each batch.
        (defun run ()
          (loop repeat 40 do
            (print (/ (loop repeat 100 sum (game t))
                      100.0))))
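
In Python terms, I read that loop roughly as the sketch below. The `game` function here is a hypothetical stand-in, not the Lisp `(game t)`: I am assuming it returns 1 for a win, 0.5 for a draw and 0 for a loss, and it just picks an outcome at random so the snippet runs on its own. Unlike the Lisp version, it returns the batch averages as a list instead of printing each one.

```python
import random


def game(rng):
    """Hypothetical stand-in for the Lisp (game t).

    Assumption: the outcome is scored 1.0 (win), 0.5 (draw) or 0.0 (loss).
    Here the outcome is random; no learning takes place.
    """
    return rng.choice([0.0, 0.5, 1.0])


def run(batches=40, games_per_batch=100, seed=0):
    """Average the game outcome over each batch, as the Lisp `run` does."""
    rng = random.Random(seed)
    averages = []
    for _ in range(batches):
        total = sum(game(rng) for _ in range(games_per_batch))
        averages.append(total / games_per_batch)
    return averages


averages = run()
print(averages[:5])
```

If the player were learning, I would expect the later entries of `averages` to be higher than the earlier ones.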

My results are like this.

        Played 100 times 42.0  0.42
        Played 100 times 43.0  0.43
        Played 100 times 42.5  0.425
        Played 100 times 40.5  0.405
        Played 100 times 43.0  0.43
        Played 100 times 43.0  0.43
        Played 100 times 42.0  0.42
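
One rough way to check for a trend, as a sketch only: compare the mean of the last few batches with the mean of the first few. The `trend` helper and the `window` parameter are my own names for illustration, not something from the book's code, and the numbers are the seven batch averages listed above.

```python
from statistics import mean

# Batch averages copied from the output above.
batch_averages = [0.42, 0.43, 0.425, 0.405, 0.43, 0.43, 0.42]


def trend(averages, window=3):
    """Return the mean of the last `window` batches minus the mean of the first."""
    if len(averages) < 2 * window:
        raise ValueError("need at least 2 * window batches")
    return mean(averages[-window:]) - mean(averages[:window])


delta = trend(batch_averages)
print(f"late minus early average: {delta:+.4f}")
```

A clearly positive difference would suggest the player is improving. For the seven batches above the difference is close to zero, which is why I am unsure whether any learning is taking place.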

Did you create the example with a view to actually proving that the RL algorithm learns?

Some explanation of tictactoe is required #33