LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction
MIT License

Solution to chapter 2 #68

Closed ZhangNanXi closed 4 years ago

ZhangNanXi commented 4 years ago

I haven't figured out the solution for Ex2.1. My understanding is that it requires analyzing the relationship between epsilon and the total expected reward, but I don't know how to analyze it.

todbeibrot commented 3 years ago

The answer should be 0.75. First we decide whether to act greedily or randomly. With probability 1 - epsilon = 1 - 0.5 = 0.5 we act greedily. But even when we pick randomly, there is a 50% chance (because there are only two possible actions) that we choose the greedy action anyway. So P(greedy action) = 1 - epsilon + epsilon / #(possible actions) = 1 - 0.5 + 0.5 / 2 = 0.75.
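
If it helps, here is a rough Python sketch of that calculation, plus a quick simulation to sanity-check it. The function names (`greedy_action_probability`, `simulate_greedy_fraction`) are made up for illustration. It assumes there is exactly one greedy action and that the random branch picks uniformly among all actions, including the greedy one.

```python
import random


def greedy_action_probability(epsilon, n_actions):
    """Closed form: P(greedy) = 1 - epsilon + epsilon / n_actions.

    Assumes a single greedy action and uniform random exploration
    over all n_actions (the greedy one included).
    """
    return 1 - epsilon + epsilon / n_actions


def simulate_greedy_fraction(epsilon, n_actions, n_trials=200_000, seed=0):
    """Estimate the same probability by sampling epsilon-greedy choices."""
    rng = random.Random(seed)
    greedy_action = 0  # hypothetical: treat action 0 as the greedy one
    hits = 0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick any action uniformly at random.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the greedy action.
            action = greedy_action
        if action == greedy_action:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print(greedy_action_probability(0.5, 2))
    print(simulate_greedy_fraction(0.5, 2))
```

The simulated fraction should land close to the closed-form value for the Ex2.1 setting (epsilon = 0.5, two actions), and you can change `n_actions` to see how the epsilon / #(possible actions) term shrinks as the number of actions grows.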