ZhangNanXi closed this 4 years ago
The answer should be 0.75. First we decide whether to act greedily or randomly. With probability 1 - epsilon = 1 - 0.5 = 0.5 we act greedily. But even when we pick randomly, there is a 50% chance (because there are only two possible actions) that we choose the greedy action anyway. So P(act greedily) = 1 - epsilon + epsilon / #(possible actions) = 1 - 0.5 + 0.5 / 2 = 0.75.
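If it helps, here is a quick sanity check of that arithmetic: a minimal sketch that computes the closed-form probability and compares it with a simulated epsilon-greedy choice. The function names (`greedy_probability`, `simulate`) and the convention that action 0 is the greedy one are my own assumptions for illustration, not something from the exercise.

```python
import random


def greedy_probability(epsilon, n_actions):
    """Closed form: exploit with prob. 1 - epsilon, plus the chance
    that a uniform random pick happens to land on the greedy action."""
    return 1 - epsilon + epsilon / n_actions


def simulate(epsilon, n_actions, trials=100_000, seed=0):
    """Estimate how often an epsilon-greedy rule picks the greedy action.

    Assumption for illustration: action 0 is the greedy action.
    """
    rng = random.Random(seed)
    greedy_action = 0
    hits = 0
    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: pick uniformly among all actions (greedy one included).
            action = rng.randrange(n_actions)
        else:
            # Exploit: take the greedy action.
            action = greedy_action
        hits += action == greedy_action
    return hits / trials


if __name__ == "__main__":
    print("closed form:", greedy_probability(0.5, 2))
    print("simulated:  ", simulate(0.5, 2))
```

The simulated frequency should land close to the closed-form value; the exact number depends on the seed and the number of trials.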
I haven't figured out the solution for Ex2.1. My understanding is that it requires analyzing the relationship between epsilon and the total expected reward, but I don't know how to analyze it.