a little confuse about chapter5/blackjack.py

ChenHuaYou commented 5 years ago

hi,As the following algorithm,I didn't see the corresponding sentence in the code "Unless the pair S t , A t appears in S 0 , A 0 , S 1 , A 1 . . . , S t 1 , A t 1 :"

Initialize: pi(s) belongs to A(s) (arbitrarily), for all s belongs to S Q(s, a) belongs to R (arbitrarily), for all s belongs to S, a belongs to A(s) Returns(s, a)<-empty list, for all s belongs to S, a belongs to A(s) Loop forever (for each episode): Choose S 0 2 S, A 0 2 A(S 0 ) randomly such that all pairs have probability > 0 Generate an episode from S 0 , A 0 , following pi: S 0 , A 0 , R 1 , . . . , S T 1 , A T 1 , R T G<-0 Loop for each step of episode, t = T 1, T 2, . . . , 0: G=lambda*G + R t+1 Unless the pair S t , A t appears in S 0 , A 0 , S 1 , A 1 . . . , S t 1 , A t 1 : Append G to Returns(S t , A t ) Q(S t , A t ) = average(Returns(S t , A t )) pi(S t )=argmax a Q(S t , a)

ShangtongZhang commented 4 years ago

Thanks for pointing this out. It looks an issue with every visit MC and first visit MC. By ignoring that sentence, every visit MC is implemented instead of first visit MC. Maybe fix it in the future.

ShangtongZhang commented 4 years ago

Fixed, thanks!

ShangtongZhang / reinforcement-learning-an-introduction

a little confuse about chapter5/blackjack.py #122