Fix usable_ace_player bug, fix indentation error, set POLICY_PLAYER dty…

        # initialize cards of player
        cards = []
        while player_sum < 12:
            # if sum of player is less than 12, always hit
            card = get_card()
            cards.append(card)
            player_sum += card_value(card)
        usable_ace_player = (1 in cards)

        # Always use an ace as 11, unless there are two.
        # If the player's sum is larger than 21, he must hold two aces.
        if player_sum > 21:
            assert player_sum == 22
            # use one Ace as 1 rather than 11
            player_sum -= 10

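After initialization, the player's cards may be [2, 9, 1]. The first two cards sum to 11, so the player hits and draws an ace valued as 11, which brings player_sum to 22. The code then counts that ace as 1, leaving player_sum at 12. However, usable_ace_player is True, because it only checks whether an ace appears anywhere in cards. It should be False, since the only ace is already counted as 1. This case also shows that a sum above 21 does not require two aces, contrary to the comment in the snippet.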
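A minimal sketch of one possible fix, under stated assumptions: it marks an ace as usable only when adding it did not push the sum past 21, i.e. while it can still be counted as 11. `get_card` and `card_value` below are hypothetical stand-ins (an ace is drawn as 1 and valued 11, as the snippet's "22 minus 10" arithmetic implies; face cards are assumed to count as 10), and `init_player` with its `draw` parameter is a hypothetical wrapper added so a fixed card sequence can be replayed. It is not presented as the code merged in this PR.

```python
import random
from typing import Callable, List, Tuple


def get_card() -> int:
    # Hypothetical helper: 1 = ace, 2..10 as-is, face cards (11..13) count as 10.
    return min(random.randint(1, 13), 10)


def card_value(card: int) -> int:
    # Hypothetical helper: an ace is initially valued as 11.
    return 11 if card == 1 else card


def init_player(draw: Callable[[], int] = get_card) -> Tuple[int, bool, List[int]]:
    """Hit until the player's sum is at least 12.

    Returns (player_sum, usable_ace_player, cards).
    """
    player_sum = 0
    usable_ace_player = False
    cards: List[int] = []
    while player_sum < 12:
        # if sum of player is less than 12, always hit
        card = draw()
        cards.append(card)
        player_sum += card_value(card)
        if player_sum > 21:
            # The sum was at most 11 before this card, so exceeding 21 means
            # 11 + ace = 22: one ace (e.g. [2, 9, 1]) or two aces ([1, 1]).
            assert player_sum == 22 and card == 1
            # Count the ace just drawn as 1; it is therefore not usable.
            player_sum -= 10
        else:
            # An ace that still counts as 11 is usable.
            usable_ace_player = usable_ace_player or card == 1
    return player_sum, usable_ace_player, cards
```

Replaying the failing deal, `seq = iter([2, 9, 1])` followed by `init_player(lambda: next(seq))` yields a sum of 12 with `usable_ace_player` False, while a deal of two aces still reports one usable ace (11 + 1 = 12).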

ShangtongZhang / reinforcement-learning-an-introduction #115