LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions to Reinforcement Learning: An Introduction
MIT License
2.04k stars, 465 forks

Exercise 2.3 #87

Closed: ShaowuChen closed this issue 3 years ago.

ShaowuChen commented 3 years ago

It may be $Q_{k+1} = Q_k + \alpha_k (R_k - Q_k)$ instead of $Q_{k+1} = Q_k + \alpha_{k+1} (R_{k+1} - Q_k)$, because the logic is: at $t = k$, choose action $A_t$, receive reward $R_k$, then update $Q_{k+1}$.
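To make the proposed indexing concrete, here is a minimal sketch of the incremental update under the convention in the comment: the $k$-th reward $R_k$ and step size $\alpha_k$ turn $Q_k$ into $Q_{k+1}$. The function name `incremental_estimates`, its parameters, and the default sample-average step size $\alpha_k = 1/k$ are illustrative assumptions, not code from the repository.

```python
from typing import Callable, List, Sequence


def incremental_estimates(
    rewards: Sequence[float],
    step_size: Callable[[int], float] = lambda k: 1.0 / k,  # assumed default: alpha_k = 1/k
    q1: float = 0.0,
) -> List[float]:
    """Return [Q_1, Q_2, ..., Q_{n+1}] for rewards R_1, ..., R_n.

    Hypothetical helper for illustration. The k-th reward R_k (k = 1, 2, ...)
    is observed after the k-th selection and, together with alpha_k,
    updates Q_k into Q_{k+1}.
    """
    q = [q1]  # q[0] holds Q_1, the initial estimate
    for k, r_k in enumerate(rewards, start=1):
        q_k = q[-1]
        alpha_k = step_size(k)
        # Q_{k+1} = Q_k + alpha_k * (R_k - Q_k)
        q.append(q_k + alpha_k * (r_k - q_k))
    return q


if __name__ == "__main__":
    print(incremental_estimates([1.0, 2.0, 6.0]))
```

With $\alpha_k = 1/k$ the first step has $\alpha_1 = 1$, so the initial $Q_1$ is discarded and each $Q_{k+1}$ equals the sample mean of $R_1, \dots, R_k$; for the rewards 1, 2, 6 the final estimate is 3.0. Passing a constant `step_size` instead gives the exponential recency-weighted average.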

LyWangPX commented 3 years ago

Thanks for pointing it out. It has been added to the error log and will be updated in the next version.