dennybritz / reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
http://www.wildml.com/2016/10/learning-reinforcement-learning/
MIT License
20.52k stars, 6.03k forks

Some confusions about `BlackjackEnv` #12

Closed by DeeperCS 8 years ago

DeeperCS commented 8 years ago

According to the game rules explained at the beginning of the `BlackjackEnv` class in `lib/envs/blackjack.py`, the observation is returned as follows:

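```python
def _get_obs(self):
    # (player's current total, dealer's face-up card, whether the player holds a usable ace)
    return (sum_hand(self.player), self.dealer[0], usable_ace(self.player))
```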

Why is `self.dealer[0]` returned as the dealer's total points? I think it only represents the value of the first card the dealer was dealt.
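To make the distinction concrete, here is a small standalone sketch. It assumes the helpers behave like the ones in OpenAI Gym's blackjack environment (ace counted as 1, face cards as 10, one ace promoted to 11 when that does not bust); the exact code in `lib/envs/blackjack.py` may differ slightly. `get_obs` is a hypothetical free-function version of `_get_obs`.

```python
# Assumption: helpers modeled on OpenAI Gym's blackjack env.
# Card values: ace = 1, face cards = 10.

def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Best total of the hand, counting one ace as 11 when it is usable."""
    if usable_ace(hand):
        return sum(hand) + 10
    return sum(hand)


def get_obs(player, dealer):
    """Hypothetical stand-in for _get_obs: (player total, dealer's face-up card, usable ace)."""
    return (sum_hand(player), dealer[0], usable_ace(player))


player = [10, 10]
dealer = [10, 6, 4]  # dealer has hit once; only dealer[0] is face up

print(get_obs(player, dealer))  # (20, 10, False)
print(sum_hand(dealer))         # 20, the dealer's hidden total
```

The observation's second entry is the single visible card, while `sum_hand(dealer)` is the full total the agent never sees.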

Maybe this is also the reason for the odd observation output shown below:

```
Player Score: 20 (Usable Ace:False), Dealer Score: 10
Taking action:Stick
score(self.player), score(self.dealer) 20 20
Player Score: 20 (Usable Ace:False), Dealer Score: 10
Game end. Reward:0.0
```

The dealer's displayed score stays at 10 the whole time, even though the dealer was actually hitting (and finished on 20), which is why the game ended in a draw.
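The draw in the log comes from what happens after "Stick": the dealer plays out the hand and the two final scores are compared, while the observation keeps reporting only `dealer[0]`. The sketch below assumes Gym-style rules (dealer hits while below 17, a bust scores 0, reward is +1/0/-1). `stick` and the card sequence are hypothetical, chosen as one possible game consistent with the log.

```python
# Assumption: Gym-style blackjack rules; `stick` is a hypothetical helper.

def usable_ace(hand):
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)


def score(hand):
    """0 for a bust hand, otherwise the hand's best total."""
    total = sum_hand(hand)
    return 0 if total > 21 else total


def stick(player, dealer, draw):
    """Resolve a 'stick': the dealer hits below 17, then hands are compared.

    `draw` is any zero-argument callable returning the next card value.
    Returns (reward, final dealer hand).
    """
    dealer = list(dealer)
    while sum_hand(dealer) < 17:
        dealer.append(draw())
    p, d = score(player), score(dealer)
    reward = float(p > d) - float(p < d)
    return reward, dealer


# Player on 20, dealer showing 10 with a hidden 3; the next card is a 7.
cards = iter([7])
reward, final_dealer = stick([10, 10], [10, 3], lambda: next(cards))
print(final_dealer, sum_hand(final_dealer), reward)  # [10, 3, 7] 20 0.0
print(final_dealer[0])  # 10, still what the observation reports
```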

DeeperCS commented 8 years ago

I think I've got it: the player can only see the dealer's one face-up card, and that is what goes into the observation. Still, printing the dealer's actual total might be more intuitive for someone who is not familiar enough with the game rules. (。・_・。)
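One way to get that more intuitive printout without changing what the agent observes is a debug-only flag on the status line. This is a minimal sketch under the same Gym-style helper assumptions; `describe` and `reveal_dealer` are hypothetical names, and the format string only mirrors the log lines above, not the repository's actual rendering code.

```python
# Assumption: Gym-style helpers; `describe` / `reveal_dealer` are hypothetical.

def usable_ace(hand):
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    return sum(hand) + 10 if usable_ace(hand) else sum(hand)


def describe(player, dealer, reveal_dealer=False):
    """Format a status line like the one in the log.

    With reveal_dealer=False only the face-up card is shown, matching what
    the agent observes; with True the dealer's full total is printed instead
    (a debugging aid, not part of the observation).
    """
    dealer_score = sum_hand(dealer) if reveal_dealer else dealer[0]
    return "Player Score: {} (Usable Ace:{}), Dealer Score: {}".format(
        sum_hand(player), usable_ace(player), dealer_score)


print(describe([10, 10], [10, 3, 7]))
# Player Score: 20 (Usable Ace:False), Dealer Score: 10
print(describe([10, 10], [10, 3, 7], reveal_dealer=True))
# Player Score: 20 (Usable Ace:False), Dealer Score: 20
```

Keeping the flag off by default preserves the partial observability of the game, which the maintainer's reply identifies as intentional.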

dennybritz commented 8 years ago

Yeah, I took the code directly from OpenAI Gym and only modified it slightly. I think the dealer's score is intentionally not printed because it isn't "visible" to any real player. I agree it's a bit confusing, though. Closing this for now!