LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction
MIT License

question about exercise 5.13 #75

Open 315930399 opened 3 years ago

315930399 commented 3 years ago

I can't follow the proof of per-decision importance sampling in Section 5.9. As I understand it, rho(t:t+k-1)*R(t+k) depends on S(t), A(t), ..., S(t+k-1), A(t+k-1), while rho(t+k:T-1) depends on S(t+k), A(t+k), ..., S(T-1), A(T-1). Since these two groups of random variables are not independent of each other, rho(t:t+k-1)*R(t+k) and rho(t+k:T-1) should not be independent either, so I don't see how the expectation of their product can be split into a product of expectations. Hoping for your reply, thanks.
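
For anyone who wants to test the claim numerically before working through the algebra, here is a minimal sketch that computes both sides exactly by enumerating every trajectory. Everything in it is an assumption made up for illustration and not taken from the book or this repository: the two-state MDP, the policies `PI` and `B`, the dynamics `P`, the `reward` function, the fixed horizon `T = 3`, and the choice t = 0. It compares E_b[rho(0:T-1)*R(k)] with E_b[rho(0:k-1)*R(k)] for each k.

```python
import itertools
import math

STATES = (0, 1)
ACTIONS = (0, 1)
T = 3  # fixed episode length (hypothetical)

# Hypothetical target policy pi and behavior policy b, indexed as POLICY[s][a].
PI = {0: (0.9, 0.1), 1: (0.2, 0.8)}
B = {0: (0.5, 0.5), 1: (0.3, 0.7)}

# Hypothetical dynamics: P[(s, a)][s_next] is the transition probability.
P = {
    (0, 0): (0.7, 0.3),
    (0, 1): (0.4, 0.6),
    (1, 0): (0.1, 0.9),
    (1, 1): (0.5, 0.5),
}


def reward(s, a, s_next):
    # Arbitrary reward that depends on the whole transition, including s_next.
    return 1.0 + 2.0 * s + 3.0 * a - 1.5 * s_next


def expectations(k, s0=0):
    """Return (E_b[rho(0:T-1) * R(k)], E_b[rho(0:k-1) * R(k)]) for 1 <= k <= T."""
    full = 0.0
    truncated = 0.0
    for actions in itertools.product(ACTIONS, repeat=T):
        for next_states in itertools.product(STATES, repeat=T):
            states = (s0,) + next_states
            prob = 1.0  # probability of this trajectory under b
            ratios = []  # per-step ratios pi(a|s) / b(a|s)
            for j in range(T):
                s, a, s_next = states[j], actions[j], states[j + 1]
                prob *= B[s][a] * P[(s, a)][s_next]
                ratios.append(PI[s][a] / B[s][a])
            r_k = reward(states[k - 1], actions[k - 1], states[k])
            full += prob * math.prod(ratios) * r_k
            truncated += prob * math.prod(ratios[:k]) * r_k
    return full, truncated


for k in range(1, T + 1):
    full, truncated = expectations(k)
    print(k, full, truncated)
```

In this toy setup the two numbers agree for every k up to floating-point error, even though the two factors are clearly not independent. That suggests the proof does not need independence: conditioned on everything up to and including S(t+k), each later ratio pi(A|S)/b(A|S) has expectation sum_a b(a|S) * pi(a|S)/b(a|S) = 1 under b, so the tail rho(t+k:T-1) averages out to 1 whatever happened earlier.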