Ex 3.5 - Githubissues

franzoni315 commented 4 years ago

To be more precise in the solution, I think s should belong to S (not S+), since the dynamics do not make sense for the terminal state: there are no actions to take from it and no possible next states.

LyWangPX commented 4 years ago

Well, you are right. But shouldn't it be s' belongs to S under the first summation? I think that would be more precise.

franzoni315 commented 4 years ago

Hmm, actually, under the first summation I would pick s' belongs to S+, since the next state might be the terminal state, and pick s belongs to S, for the reason given in my previous post.
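Written out, this is roughly what I have in mind. It is only a sketch, assuming the exercise asks for the episodic version of the normalization condition on the dynamics function p, with R the set of rewards and A(s) the actions available in s as in the book's notation:

```latex
\sum_{s' \in \mathcal{S}^{+}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) = 1,
\qquad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}(s).
```

The next state s' ranges over S+ because an episode can end on this step, while the conditioning state s ranges over S only, because no action is ever taken from the terminal state.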

I just noticed there is some confusion about the definition of S+. The textbook says:

In episodic tasks we sometimes need to distinguish the set of all nonterminal states, denoted S, from the set of all states plus the terminal state, denoted S+.

However, in the solution, S+ = {Non-terminal states}. The textbook actually calls this set S, while S+ is S plus the terminal state, i.e., all possible states.

Does it make sense?
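To make the S versus S+ distinction concrete, here is a small numerical check. It is a minimal sketch on a made-up episodic MDP: the state names, actions, rewards, and probabilities are all hypothetical and come from neither the book nor the solution.

```python
# Hypothetical toy episodic MDP; every name and number here is made up for illustration.
S = {"A", "B"}                  # nonterminal states (the book's S)
TERMINAL = "T"
S_plus = S | {TERMINAL}         # all states including the terminal one (the book's S+)
ACTIONS = {"A": ["go"], "B": ["go", "stop"]}

# p[(s, a)] maps (s_next, r) -> probability.
# Entries exist only for nonterminal s: no action is taken from TERMINAL.
p = {
    ("A", "go"): {("B", 0.0): 1.0},
    ("B", "go"): {("A", 0.0): 0.5, (TERMINAL, 1.0): 0.5},
    ("B", "stop"): {(TERMINAL, 0.0): 1.0},
}

def total_prob(s, a, next_states):
    """Sum p(s', r | s, a) over s' in next_states and over all rewards r."""
    return sum(
        prob
        for (s_next, _reward), prob in p[(s, a)].items()
        if s_next in next_states
    )

# Summing s' over S+ should give 1 for every nonterminal s and available a.
sums_over_S_plus = {(s, a): total_prob(s, a, S_plus) for s in S for a in ACTIONS[s]}

# Summing s' over S only drops the probability mass that goes to the terminal state.
sums_over_S = {(s, a): total_prob(s, a, S) for s in S for a in ACTIONS[s]}
```

In this toy example the sums over S+ are all 1, while the sum over S for ("B", "go") is only 0.5, which is why I would put S+ under the first summation and keep s in S.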

LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Ex 3.5 #65