Exercise 10.5 - Githubissues

I also found the wording of this question confusing. My best guess is that it is asking, "How would the differential TD(0) algorithm differ from tabular TD(0)?" Like you, I came up with the update formula for the weight vector. Equation (10.10) gives us the TD error, assuming we have the average-reward estimate R_bar. From there, I think the only thing you're missing to complete the differential TD(0) algorithm is the update for R_bar, which also uses the TD error.
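
Written out, those two pieces would look roughly like this. I'm going from memory of the book's notation, so please check the subscripts against your copy. Here beta is the step size for the average-reward estimate:

```latex
\delta_t = R_{t+1} - \bar{R}_t + \hat{v}(S_{t+1}, \mathbf{w}_t) - \hat{v}(S_t, \mathbf{w}_t)

\bar{R}_{t+1} = \bar{R}_t + \beta \, \delta_t
```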

In tabular TD(0), we have a single line that updates V(S). For differential TD(0), I think we need to expand that into three lines: compute the TD error, update R_bar, and update the weight vector.
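
As a rough sketch of that three-line update, here is one step in Python, assuming a linear value estimate v_hat(s, w) = w · x(s). The function name `differential_td0_step`, the feature vectors `x` and `x_next`, and the step sizes `alpha` and `beta` are my own illustrative choices, not something taken from the book or this repo.

```python
import numpy as np


def differential_td0_step(w, r_bar, x, r, x_next, alpha=0.1, beta=0.01):
    """One differential semi-gradient TD(0) update (illustrative sketch).

    Assumes a linear estimate v_hat(s, w) = w @ x(s), so the gradient of
    v_hat with respect to w is just the feature vector x.
    """
    # 1) TD error, with the average-reward estimate subtracted from the reward
    delta = r - r_bar + w @ x_next - w @ x
    # 2) Update the average-reward estimate using the TD error
    r_bar = r_bar + beta * delta
    # 3) Semi-gradient update of the weight vector
    w = w + alpha * delta * x
    return w, r_bar, delta


# Hypothetical two-state example with one-hot features
w = np.zeros(2)
r_bar = 0.0
s0, s1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w, r_bar, delta = differential_td0_step(w, r_bar, s0, 1.0, s1, alpha=0.5, beta=0.1)
```

With one-hot features this reduces to a tabular update of V(S), which makes it easy to compare against tabular TD(0): the only differences are the `- r_bar` term in the TD error and the extra line that updates `r_bar`.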

Let me know if you think that sounds reasonable.

Also, since you have done a lot of work to produce these solutions, you might want to see whether Rich Sutton would honor his offer to provide the book solutions if you email him your answers :) He says he will on his site: http://incompleteideas.net/book/solutions.html Your answers have been invaluable as I work through the textbook, and I'd also be curious to know how close they are to the book solutions.

LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Exercise 10.5 #93