Closed ShaowuChen closed 3 years ago
It should probably be Q_{k+1} = Q_k + \alpha_k (R_k - Q_k) instead of Q_{k+1} = Q_k + \alpha_{k+1} (R_{k+1} - Q_k), because the logic is: at t = k, choose action A_t, get reward R_k, then update to Q_{k+1}.
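To make the indexing concrete, here is a minimal sketch of the update with the proposed subscripts. The function names and the step size \alpha_k = 1/k are assumptions chosen for illustration and are not taken from the book; with that step size, Q_{k+1} is the average of R_1, ..., R_k.

```python
def incremental_update(q_k, reward_k, alpha_k):
    """One step of Q_{k+1} = Q_k + alpha_k * (R_k - Q_k)."""
    return q_k + alpha_k * (reward_k - q_k)


def run_updates(rewards, q_1=0.0):
    """Apply the update for k = 1, 2, ... and return [Q_1, Q_2, ...].

    rewards[k - 1] plays the role of R_k, the reward received at step k.
    The step size alpha_k = 1 / k is an illustrative assumption.
    """
    q = q_1
    history = [q]
    for k, reward_k in enumerate(rewards, start=1):
        alpha_k = 1.0 / k  # assumed sample-average step size
        q = incremental_update(q, reward_k, alpha_k)  # q is now Q_{k+1}
        history.append(q)
    return history


if __name__ == "__main__":
    # R_1 = 1, R_2 = 3, R_3 = 5
    print(run_updates([1.0, 3.0, 5.0]))
```

Here the reward observed at step k is used together with \alpha_k to produce Q_{k+1}, so only quantities already available at step k appear on the right-hand side.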
Thanks for pointing it out. It has been added to the error log and will be updated in the next version.