SirRender00 / texasholdem

A pure python package for Texas Hold 'Em poker
https://github.com/SirRender00/texasholdem/
MIT License

How to make a copy for simulation #196

Open LucasColas opened 1 year ago

LucasColas commented 1 year ago

Hello, I would like to know how I can make a copy of a game (with the same players, the same pots, etc.). I need it for Monte Carlo Tree Search, where the algorithm has to run simulations. I can't use deepcopy because the game object contains generators.
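
For anyone reproducing this, here is a minimal sketch of why deepcopy trips over generators. The Table class is a hypothetical stand-in for a game object that keeps a live generator as an attribute; it is not part of texasholdem.

```python
import copy


class Table:
    """Toy stand-in for a game object that stores a live generator."""

    def __init__(self, num_players):
        self.num_players = num_players
        # A generator kept as an attribute, e.g. a running hand iterator.
        self.turns = (seat for seat in range(num_players))


table = Table(num_players=3)

try:
    copy.deepcopy(table)
    deepcopy_failed = False
except TypeError as err:
    # The standard library cannot pickle or deep-copy generator objects.
    deepcopy_failed = True
    print(f"deepcopy failed: {err}")
```

Any object that holds a generator somewhere in its attribute tree hits the same error, which is why copying attribute by attribute does not help on its own.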

SirRender00 commented 1 year ago

Good point. I do not believe there is a simple way to do this right now. There are also some good questions about copying the Deck in this use case: most likely we'll want to reshuffle the Deck in each copy so that the Monte Carlo simulation does not learn the hidden state of the Deck.
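
To illustrate the reshuffling idea, here is a small sketch using plain card strings rather than the package's Deck class. The function name reshuffled_copy and the seed parameter are assumptions made for the example.

```python
import random


def reshuffled_copy(remaining_cards, seed=None):
    """Return a new list with the undealt cards in a fresh random order."""
    rng = random.Random(seed)
    cards = list(remaining_cards)  # copy so the original deck is untouched
    rng.shuffle(cards)
    return cards


original_deck = ["As", "Kd", "7h", "2c", "Td", "9s"]
simulated_deck = reshuffled_copy(original_deck, seed=1)
```

Each simulation then draws from its own ordering of the same undealt cards, so repeated rollouts cannot infer which card really comes next.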

LucasColas commented 1 year ago

OK, I tried to copy every attribute but there are still some issues, often related to the generators.

LucasColas commented 1 year ago

Hello,

I tried another way to make a copy.

I copy the cards of each board. However, I run into an issue during hand evaluation: prime_product_from_rankbits returns a number that doesn't exist in the lookup table.
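
One possible cause, which is an assumption and not confirmed in this thread, is that the copied state contains a duplicated card. The sketch below is a standalone reimplementation of the prime-product idea (one prime per rank, following the Cactus Kev convention), not the package's own code. It shows how a duplicate yields a product that a table of five-distinct-rank hands would not contain.

```python
# Primes assigned to ranks 2, 3, ..., K, A (index 0 = deuce, 12 = ace).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]


def prime_product_from_rankbits(rankbits):
    """Multiply the primes of every rank whose bit is set in `rankbits`."""
    product = 1
    for rank, prime in enumerate(PRIMES):
        if rankbits & (1 << rank):
            product *= prime
    return product


def rankbits_of(ranks):
    """OR together one bit per rank index."""
    bits = 0
    for rank in ranks:
        bits |= 1 << rank
    return bits


# Five distinct ranks (T, J, Q, K, A) give a product with five prime factors.
broadway = prime_product_from_rankbits(rankbits_of([8, 9, 10, 11, 12]))

# A duplicated rank collapses to four bits, so the product has only four
# prime factors and cannot match any five-distinct-rank entry.
corrupted = prime_product_from_rankbits(rankbits_of([8, 9, 10, 11, 11]))
```

If this is what happens, checking that hands plus board contain no repeated cards before evaluating should catch it.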

LucasColas commented 1 year ago

I tried to make a copy using hand_history. Here is my code (taken from the _import_history(history: History) method):

# Import paths below are assumed from the texasholdem package layout.
from texasholdem.agents import random_agent
from texasholdem.card.deck import Deck
from texasholdem.game.game import TexasHoldEm
from texasholdem.gui.text_gui import TextGUI

# `history` is the History object of the hand being replayed.
num_players = len(history.prehand.player_chips)
game = TexasHoldEm(
    buyin=1,
    big_blind=history.prehand.big_blind,
    small_blind=history.prehand.small_blind,
    max_players=num_players,
)

gui = TextGUI(game=game)

# button placed right before 0
game.btn_loc = num_players - 1

# read chips
for i in game.player_iter(0):
    game.players[i].chips = history.prehand.player_chips[i]

# stack deck
deck = Deck()
if history.settle:
    deck.cards = list(history.settle.new_cards)

# player actions in a stack
player_actions = []
for bet_round in (history.river, history.turn, history.flop, history.preflop):
    if bet_round:
        deck.cards = bet_round.new_cards + deck.cards
        for action in reversed(bet_round.actions):
            player_actions.insert(
                0, (action.player_id, action.action_type, action.total)
            )

# start hand (deck will deal)
game.start_hand()

# give players old cards
for i in game.player_iter():
    game.hands[i] = history.prehand.player_cards[i]

# swap decks
game._deck = deck

while game.is_hand_running():
    gui.display_state()
    gui.wait_until_prompted()
    try:
        player_id, action_type, total = player_actions.pop(0)
        game.take_action(action_type=action_type, total=total)
    except Exception:
        # no recorded action left (or it was rejected): play a random move
        action, total = random_agent(game)
        game.take_action(action_type=action, total=total)

    gui.display_action()

gui.display_win()

It works pretty well, except that I can't get the same sb_loc, bb_loc, and btn_loc. I can overwrite them with the values from my hand, but the next current player will still be different.

LucasColas commented 1 year ago

I think this issue is related to the _prehand method (from TexasHoldEm), because of this:

self.btn_loc = active_players[0]
self.sb_loc = active_players[1]

# heads up edge case => sb = btn
if len(active_players) == 2:
    self.sb_loc = self.btn_loc

self.bb_loc = next(self.in_pot_iter(self.sb_loc + 1))

And this:

self._player_post(self.sb_loc, self.small_blind)
self._player_post(self.bb_loc, self.big_blind)
self.last_raise = 0

# action to left of BB
self.current_player = next(self.in_pot_iter(loc=self.bb_loc + 1))
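
To make the effect of these two snippets easier to see, here is a standalone sketch of the same position logic. The function assign_positions is hypothetical, and it assumes every active player is still in the pot and that the list starts at the button.

```python
def assign_positions(active_players):
    """Return (btn, sb, bb, first_to_act) seats for one hand.

    `active_players` lists seat ids in clockwise order, starting at the button.
    """
    n = len(active_players)
    if n < 2:
        raise ValueError("need at least two active players")

    btn = active_players[0]
    sb = active_players[1]
    # Heads-up edge case: the button also posts the small blind.
    if n == 2:
        sb = btn

    sb_index = active_players.index(sb)
    bb = active_players[(sb_index + 1) % n]
    bb_index = active_players.index(bb)
    # Action starts to the left of the big blind.
    first_to_act = active_players[(bb_index + 1) % n]
    return btn, sb, bb, first_to_act
```

Because all four values are derived from the button, setting sb_loc and bb_loc by hand after start_hand() does not change who acts first unless current_player is recomputed as well.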

SirRender00 commented 1 year ago

I think a full __copy__ dunder method will be very similar to the _import_history method for sure. So definitely a good start.

As for the discrepancies with the button/sb/bb locations, this is probably because the hand history attached to the game object does not have canonical player IDs (e.g. when we export, we make the button player id 0). So this line game.btn_loc = num_players - 1 may not be right.
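
As a rough illustration of what "canonical" means here, the sketch below rotates seat numbers so that the button becomes ID 0. Both helpers are hypothetical, and they assume every seat is occupied; the package's export may treat empty seats differently.

```python
def canonical_ids(btn_loc, num_players):
    """Map each seat to a canonical id where the button becomes id 0."""
    return {seat: (seat - btn_loc) % num_players for seat in range(num_players)}


def to_seat(canonical_id, btn_loc, num_players):
    """Inverse mapping: recover the original seat from a canonical id."""
    return (canonical_id + btn_loc) % num_players
```

For example, with six players and the button on seat 4, seat 4 maps to 0 and seat 5 maps to 1. Replaying a history recorded with live seat numbers as if it were canonical would shift every position by the button's offset.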

I am planning to take a look at this when I have some time in the coming weeks. Thanks for taking a look, and feel free to continue to experiment.

LucasColas commented 1 year ago

Here's another thing I tried:

def generate_game(history, blinds, gui=False):

    num_players = len(history.prehand.player_chips)
    game = TexasHoldEm(
        buyin=1,
        big_blind=history.prehand.big_blind,
        small_blind=history.prehand.small_blind,
        max_players=num_players,
    )

    gui = TextGUI(game=game)

    # button placed right before 0
    game.btn_loc = num_players - 1

    # stack deck
    deck = Deck()
    if history.settle:
        deck.cards = list(history.settle.new_cards)

    # player actions in a stack
    player_actions = []
    for bet_round in (history.river, history.turn, history.flop, history.preflop):
        if bet_round:
            deck.cards = bet_round.new_cards + deck.cards
            for action in reversed(bet_round.actions):
                player_actions.insert(
                    0, (action.player_id, action.action_type, action.total)
                )

    # start hand (deck will deal)
    game.start_hand()

    # give players old cards
    for i in game.player_iter():
        game.hands[i] = history.prehand.player_cards[i]

    game.pots = [Pot()]

    # read chips
    for i in game.player_iter(0):
        game.players[i].chips = history.prehand.player_chips[i]
        game.players[i].state = PlayerState.IN
        game.players[i].last_pot = 0

    game.btn_loc = history.prehand.btn_loc
    game.sb_loc = blinds[0]
    game.bb_loc = blinds[1]
    game._player_post(game.sb_loc, history.prehand.small_blind)
    game._player_post(game.bb_loc, history.prehand.big_blind)
    game.current_player = next(game.in_pot_iter(loc=game.bb_loc + 1))
    print("current_player : ", game.current_player)
    print("pot iter : ", next(game.in_pot_iter(loc=game.bb_loc + 1)))

    # swap decks
    game._deck = deck

    while game.is_hand_running():
        print("current_player : ", game.current_player)
        gui.display_state()
        gui.wait_until_prompted()
        try:
            print("game current player : ", game.current_player)
            player_id, action_type, total = player_actions.pop(0)
            game.current_player = player_id
            game.take_action(action_type=action_type, total=total)
            # print("current_player : ", game.current_player)
            print("player iter", next(game.player_iter(game.current_player)))
        except Exception as e:
            print(e)
            print("random action")
            action, total = random_agent(game)
            game.take_action(action_type=action, total=total)

        gui.display_action()

    gui.display_win()

That's pretty similar to the previous code. It seems next(game.player_iter(game.current_player)) gives the next current player, but that player is different from the current player in the history.

SirRender00 commented 1 year ago

^ This pull request should do it. I'll make a prerelease soon so you can try it out and let me know whether it does the trick.

LucasColas commented 1 year ago

Thank you for your effort.

SirRender00 commented 1 year ago

@LucasColas Okay, 0.10-alpha.0 is the prerelease for this if you want to upgrade to it and try it out.

LucasColas commented 1 year ago

The players' cards are not the same in the copy?

LucasColas commented 1 year ago

I think we should add the option to copy the cards of one or several players.

SirRender00 commented 1 year ago

Yes, it seems the hands are not being copied properly. Gonna take a look at this.

SirRender00 commented 1 year ago

@LucasColas Okay, just released 0.10-alpha.1, which should copy the hands correctly.

LucasColas commented 1 year ago

OK, thank you. I'll try.

LucasColas commented 1 year ago

It seems to be working.

LucasColas commented 1 year ago

Do you think it's possible to copy the cards of selected players only?

SirRender00 commented 1 year ago

What are you trying to do exactly? The intent might be out of the realm of the TexasHoldEm object. This is possible by messing with the hands attribute and the _deck attribute.

LucasColas commented 1 year ago

Let's say I want to keep the cards of player 3, but the other players should not keep the same cards. My copy should then return the same game except that players 1, 2, 4, ..., n would have different cards. The board cards and player 3's cards should be the same in this example.

I think it's better to do this because in Monte Carlo Tree Search you're not supposed to know the opponents' cards (unless they reveal them).
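
Here is a rough sketch of what I mean, written with plain card strings instead of the package's Card and Deck classes. The function determinize and its parameters are hypothetical names for illustration, not an existing API.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
FULL_DECK = [rank + suit for rank in RANKS for suit in SUITS]


def determinize(hands, board, observer, seed=None):
    """Keep the observer's hand and the board; redeal every other hand.

    Returns (new_hands, new_deck) where new_deck holds the undealt cards.
    """
    rng = random.Random(seed)
    known = set(hands[observer]) | set(board)
    unseen = [card for card in FULL_DECK if card not in known]
    rng.shuffle(unseen)

    new_hands = {}
    for player_id, cards in hands.items():
        if player_id == observer:
            new_hands[player_id] = list(cards)
        else:
            # Draw as many fresh cards as the player originally held.
            new_hands[player_id] = [unseen.pop() for _ in cards]
    return new_hands, unseen


hands = {1: ["As", "Kd"], 2: ["7h", "7c"], 3: ["Qs", "Js"]}
board = ["Ts", "9s", "2d"]
sim_hands, sim_deck = determinize(hands, board, observer=3, seed=0)
```

Player 3 keeps the same hand, the board is unchanged, and everyone else gets cards drawn from what player 3 cannot see, so no card appears twice across hands, board, and deck.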

LucasColas commented 1 year ago

Here's a possible draft: https://github.com/SirRender00/texasholdem/pull/203