LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions to Reinforcement Learning: An Introduction
MIT License

Add ex8.8 code and plot #58

Closed: burmecia closed this 4 years ago.

burmecia commented 4 years ago

My solution has a similar shape to the book's figure, but the start-state value under the greedy policy is different. I am not sure where it goes wrong; perhaps in the reward calculation? However, my results are similar to all the other results I found online (see the reference implementations below), so please take my solution as one reference among several rather than as definitively correct.
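Because the reward calculation is the suspected culprit, here is a minimal sketch of the trajectory-sampling task as the book describes it: two actions per state, each leading to one of `b` equally likely successor states, a 0.1 probability of terminating on every transition, and rewards drawn from a Gaussian with mean 0 and variance 1. This is not the code from this PR. It rests on several assumptions: each (state, action, successor) triple gets its own fixed N(0, 1) reward, the transition to the terminal state yields reward 0, and the start-state value is computed exactly by iterative policy evaluation of the greedy policy instead of by averaging simulated episodes. The names `Task`, `expected_update`, `greedy`, `greedy_start_value`, `run_uniform`, and `run_on_policy` are hypothetical.

```python
import random

TERMINATION_PROB = 0.1  # every transition ends the episode with probability 0.1
N_ACTIONS = 2           # two actions available in every state


class Task:
    """Hypothetical random task: each (state, action) pair has b equally likely
    successor states, and each (state, action, branch) a fixed N(0, 1) reward."""

    def __init__(self, n_states, b, seed=0):
        rng = random.Random(seed)
        self.n_states = n_states
        self.b = b
        self.next_states = [[[rng.randrange(n_states) for _ in range(b)]
                             for _ in range(N_ACTIONS)] for _ in range(n_states)]
        self.rewards = [[[rng.gauss(0.0, 1.0) for _ in range(b)]
                         for _ in range(N_ACTIONS)] for _ in range(n_states)]


def expected_update(task, q, s, a):
    """Expected (not sampled) update target for Q(s, a). Assumption: the
    transition to the terminal state yields reward 0 and has value 0."""
    total = sum(r + max(q[s2])
                for s2, r in zip(task.next_states[s][a], task.rewards[s][a]))
    return (1 - TERMINATION_PROB) * total / task.b


def greedy(q, s):
    # Ties are broken toward action 0 for simplicity.
    return max(range(N_ACTIONS), key=lambda a: q[s][a])


def greedy_start_value(task, q, start=0, tol=1e-10):
    """Exact value of `start` under the policy greedy w.r.t. q, computed by
    iterative policy evaluation instead of averaging simulated episodes."""
    policy = [greedy(q, s) for s in range(task.n_states)]
    v = [0.0] * task.n_states
    while True:
        delta = 0.0
        for s in range(task.n_states):
            a = policy[s]
            new = (1 - TERMINATION_PROB) * sum(
                r + v[s2] for s2, r in zip(task.next_states[s][a], task.rewards[s][a])
            ) / task.b
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v[start]


def run_uniform(task, n_updates):
    """Cycle through all state-action pairs, applying expected updates in place."""
    q = [[0.0] * N_ACTIONS for _ in range(task.n_states)]
    pairs = [(s, a) for s in range(task.n_states) for a in range(N_ACTIONS)]
    for i in range(n_updates):
        s, a = pairs[i % len(pairs)]
        q[s][a] = expected_update(task, q, s, a)
    return q


def run_on_policy(task, n_updates, epsilon=0.1, start=0, seed=0):
    """Apply expected updates along simulated epsilon-greedy episodes."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(task.n_states)]
    s = start
    for _ in range(n_updates):
        a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else greedy(q, s)
        q[s][a] = expected_update(task, q, s, a)
        if rng.random() < TERMINATION_PROB:
            s = start  # episode ended: restart from the start state
        else:
            s = rng.choice(task.next_states[s][a])
    return q


if __name__ == "__main__":
    task = Task(n_states=1000, b=1, seed=0)
    for name, q in [("uniform", run_uniform(task, 20000)),
                    ("on-policy", run_on_policy(task, 20000))]:
        print(name, greedy_start_value(task, q))
```

Evaluating the greedy policy exactly removes Monte Carlo noise from the start-state value, and the terminal-transition reward sits in one place (`expected_update` and `greedy_start_value`). That makes it easy to switch between conventions (for example, reward 0 versus a sampled Gaussian on termination) and check whether the choice explains an offset from the book's curves.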

Other reference implementations:

  1. https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/blob/master/chapter08/trajectory_sampling.py
  2. https://github.com/JuliaReinforcementLearning/ReinforcementLearningAnIntroduction.jl/blob/b5c718a891a4b3db4fae177b8b33ca506df1ecea/notebooks/Chapter08_Trajectory_Sampling.ipynb
  3. https://github.com/enakai00/rl_book_solutions/blob/master/Chapter08/Exercise_8_8_Solution.ipynb

(Plot: ex8_8)