ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction
MIT License

Unclear point for the code in Blackjack example #155

Open eatam opened 2 years ago

eatam commented 2 years ago
        # If the player's sum is larger than 21, he may hold one or two aces.
        if player_sum > 21:
            assert player_sum == 22
            # last card must be ace
            player_sum -= 10
        else:
            usable_ace_player |= (1 == card)

Question: in the "if" branch of the code above, where the last card must be an ace, why do you not set "usable_ace_player |= (1 == card)"?

tommasomarzi commented 1 year ago

The description of Example 5.1 in the book states:

If the player holds an ace that he could count as 11 without going bust, then the ace is said to be usable.

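Therefore, if the condition player_sum > 21 is satisfied, the player must have held a sum of 11 and then drawn an ace that was first counted as 11 (otherwise we would already be outside of the while loop). This can happen in two ways: either the player held a single ace counted as 11 and drew a second ace (first case), or the player held non-ace cards summing to 11 and drew an ace (second case).

In both cases the sum is equal to 22, and the newly drawn ace (the second ace in the first case) is not usable, since counting it as 11 makes the player go bust. It is therefore correct not to set "usable_ace_player |= (1 == card)" in this branch. Notice that in the first case the first ace has already set usable_ace_player = True, since that ace is actually usable.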
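
To make the two cases concrete, here is a minimal sketch that replays the logic of the snippet on a fixed sequence of cards. The helper names `card_value` and `initial_player_hand`, and the convention that an ace (card id 1) is first counted as 11, are assumptions for illustration and not necessarily the repository's exact code.

```python
def card_value(card_id):
    # Assumption: an ace (id 1) is first counted as 11; other cards count at face value.
    return 11 if card_id == 1 else card_id


def initial_player_hand(cards):
    """Replay the 'hit while the sum is below 12' loop on a fixed list of card ids.

    Hypothetical helper: card ids are 1 (ace) or 2..10.
    Returns (player_sum, usable_ace_player).
    """
    player_sum = 0
    usable_ace_player = False
    for card in cards:
        if player_sum >= 12:
            break  # the loop in the snippet only runs while the sum is below 12
        player_sum += card_value(card)
        if player_sum > 21:
            # Only reachable as 11 + ace (counted as 11) = 22.
            assert player_sum == 22
            # The last card must be an ace; count it as 1 instead of 11.
            player_sum -= 10
        else:
            usable_ace_player |= (1 == card)
    return player_sum, usable_ace_player


first_case = initial_player_hand([1, 1])      # ace, then a second ace
second_case = initial_player_hand([5, 6, 1])  # non-ace cards summing to 11, then an ace
print(first_case, second_case)
```

Under these assumptions both hands end at a sum of 12. Only the first reports a usable ace, and that flag was set by the first ace, before the "if" branch was ever entered.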