ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction
MIT License

Some explanation of tictactoe is required #33

Closed: mohanr closed this issue 7 years ago

mohanr commented 7 years ago

Is your code one of the exercises from Chap. 1? Mine is a port of the tic-tac-toe Lisp example to Haskell (https://github.com/mohanr/Reinforcement-Learning-An-Introduction-by-Richard-S.-Sutton-and-Andrew-G.-Barto/blob/master/tictactoe.hs).

Which result from 'run' or 'runs' shows that learning is actually happening? My code returns a value between 40 and 50.

I mean this part of the Lisp code:

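        ;; Repeat 40 times: play 100 games, sum the values returned by (game t),
        ;; and print their average.
        (defun run ()
          (loop repeat 40
                do (print (/ (loop repeat 100 sum (game t))
                             100.0))))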
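Since the repository itself is written in Python, here is a hypothetical Python rendering of that Lisp `run` loop for comparison. It assumes `game` is any zero-argument function that plays one training game and returns a numeric outcome; `run`, `blocks` and `games_per_block` are illustrative names, not code from this repository.

```python
import random
from typing import Callable, List


def run(game: Callable[[], float], blocks: int = 40,
        games_per_block: int = 100) -> List[float]:
    """Play `blocks` batches of `games_per_block` games; print and return each batch mean."""
    averages = []
    for _ in range(blocks):
        # Equivalent of (loop repeat 100 sum (game t)): total the per-game outcomes.
        total = sum(game() for _ in range(games_per_block))
        averages.append(total / games_per_block)
        print(averages[-1])
    return averages


if __name__ == "__main__":
    # Stand-in game for illustration only (assumed scoring): win 1.0, draw 0.5, loss 0.0.
    run(lambda: random.choice([1.0, 0.5, 0.0]))
```

Assuming `game` returns a higher value when the learning player does better, a learner that is improving should show batch averages that drift upward over the 40 batches instead of staying flat.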

My results look like this:

        Played 100 times 42.0  0.42
        Played 100 times 43.0  0.43
        Played 100 times 42.5  0.425
        Played 100 times 40.5  0.405
        Played 100 times 43.0  0.43
        Played 100 times 43.0  0.43
        Played 100 times 42.0  0.42
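One rough way to put an average like 0.42 in context is to compare it with what purely random play would score. The sketch below computes the exact probabilities of an X win, an O win and a draw when both players pick uniformly among legal moves, with X moving first. The board encoding and the names `LINES`, `winner` and `outcome_probs` are illustrative assumptions, not code from this repository or the Lisp original.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Tuple

# The eight winning lines on a 3x3 board; cells are indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> str:
    """Return 'X' or 'O' if that player has three in a row, else ''."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return ''


@lru_cache(maxsize=None)
def outcome_probs(board: str, to_move: str) -> Tuple[Fraction, Fraction, Fraction]:
    """Exact (P(X wins), P(O wins), P(draw)) when both sides move uniformly at random."""
    w = winner(board)
    if w == 'X':
        return (Fraction(1), Fraction(0), Fraction(0))
    if w == 'O':
        return (Fraction(0), Fraction(1), Fraction(0))
    empty = [i for i, cell in enumerate(board) if cell == '.']
    if not empty:
        return (Fraction(0), Fraction(0), Fraction(1))  # full board, no winner
    nxt = 'O' if to_move == 'X' else 'X'
    x = o = d = Fraction(0)
    for i in empty:
        # Every legal move is equally likely, so average over the children.
        px, po, pd = outcome_probs(board[:i] + to_move + board[i + 1:], nxt)
        x, o, d = x + px, o + po, d + pd
    n = len(empty)
    return (x / n, o / n, d / n)


x_win, o_win, draw = outcome_probs('.' * 9, 'X')
print(f"X wins {float(x_win):.4f}, O wins {float(o_win):.4f}, draw {float(draw):.4f}")
```

Which of these figures is the right baseline depends on which side the learner plays and on what `game` returns for a win, a draw and a loss, which the thread does not state. Treat the comparison as a sanity check: batch averages that sit near the random-play figure and stay flat across all 40 batches would suggest little is being learned.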

Did you create the example specifically to show that the RL algorithm actually learns?

ShangtongZhang commented 7 years ago

Is your code one of the exercises from Chap. 1?

No, I just wrote it. It's not intended as an answer to any exercise.

Which result from 'run' or 'runs' shows that learning is actually happening?

To be honest, I don't know. I didn't refer to the Lisp code because I don't know Lisp.
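One way to make learning visible, instead of reading training averages that still include exploratory moves, is to freeze the agent and measure how often its greedy policy beats a random opponent, before and after training. The sketch below is a minimal tabular learner in the spirit of the book's Chapter 1 tic-tac-toe example: it plays X against a uniformly random O, values unseen positions at 0.5, treats draws and losses as equally bad, and after each greedy move nudges the value of the position reached by its previous move toward the value of the new position. It is not the code from this repository or from the Lisp original; the names, alpha = 0.1, epsilon = 0.1 and the game counts are assumptions chosen for illustration.

```python
import random
from typing import Dict, Optional

# Eight winning lines on a 3x3 board; cells are indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def terminal_value(board: str) -> Optional[float]:
    """1.0 if X (the learner) has won, 0.0 for an O win or a draw, None if play continues."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return 1.0 if board[a] == 'X' else 0.0
    return 0.0 if '.' not in board else None


class Learner:
    """Tabular state-value learner playing X against a uniformly random O."""

    def __init__(self, alpha: float = 0.1, epsilon: float = 0.1) -> None:
        self.alpha, self.epsilon = alpha, epsilon
        self.values: Dict[str, float] = {}  # non-terminal positions; unseen ones default to 0.5

    def value(self, board: str) -> float:
        v = terminal_value(board)
        return v if v is not None else self.values.get(board, 0.5)

    def update(self, state: str, target: float) -> None:
        # Move V(state) a fraction alpha of the way toward the later position's value.
        v = self.value(state)
        self.values[state] = v + self.alpha * (target - v)


def play(learner: Learner, rng: random.Random, train: bool) -> float:
    """Play one game; return 1.0 if the learner (X) wins, else 0.0."""
    board, prev = '.' * 9, None  # prev: position after the learner's previous move
    while True:
        children = [board[:i] + 'X' + board[i + 1:] for i, c in enumerate(board) if c == '.']
        if train and rng.random() < learner.epsilon:
            board = rng.choice(children)  # exploratory move: no update
        else:
            board = max(children, key=learner.value)  # greedy move
            if train and prev is not None:
                learner.update(prev, learner.value(board))
        result = terminal_value(board)
        if result is not None:
            return result
        prev = board
        empty = [i for i, c in enumerate(board) if c == '.']
        i = rng.choice(empty)
        board = board[:i] + 'O' + board[i + 1:]  # random opponent move
        result = terminal_value(board)
        if result is not None:
            if train:
                learner.update(prev, result)  # simplification: back up the loss or draw
            return result


def win_rate(learner: Learner, rng: random.Random, games: int) -> float:
    """Fraction of games the frozen greedy policy wins (no exploration, no updates)."""
    return sum(play(learner, rng, train=False) for _ in range(games)) / games


rng = random.Random(0)
learner = Learner()
before = win_rate(learner, rng, 2000)
for _ in range(20000):
    play(learner, rng, train=True)
after = win_rate(learner, rng, 2000)
print(f"greedy win rate vs random O: before {before:.3f}, after {after:.3f}")
```

If the agent is learning, `after` should be clearly higher than `before`; a flat result would point to a problem in the update rule or in how the evaluation is set up.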

Did you create the example specifically to show that the RL algorithm actually learns?

Yes, there is a terminal interface, so one can play against the program.
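A terminal interface like this mainly has to turn a key press into a board cell. The following is a hypothetical sketch, not the repository's code: it assumes a 3x3 board stored as a 9-character string with '.' for empty cells, and a keyboard layout in which q/w/e, a/s/d and z/x/c mirror the three rows. `KEYS` and `parse_move` are illustrative names.

```python
from typing import Optional

# Hypothetical layout: each key sits where its cell sits on the 3x3 grid.
KEYS = "qweasdzxc"


def parse_move(key: str, board: str) -> Optional[int]:
    """Map a key press to a cell index 0..8, or None if the key is unknown or the cell is taken."""
    key = key.strip().lower()
    if len(key) != 1 or key not in KEYS:
        return None
    cell = KEYS.index(key)
    return cell if board[cell] == '.' else None
```

A game loop would call `parse_move(input("Your move: "), board)` until it returns a cell index, place the human's mark there, and then let the agent reply with its greedy move.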