datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License

State Representation of Limit holdem / Leduc #267

Closed DavidRSeWell closed 2 years ago

DavidRSeWell commented 2 years ago

Hello, it seems that the player to act (i.e., SB / BB) is not taken into account in the state representation, so it looks like the representation cannot differentiate certain states. For example, take a hand where the SB calls, the BB checks back, and they see a flop; the BB then checks. How would the state be differentiated for the SB and the BB here? Both positions see the same number of chips in the pot. The NFSP paper (https://arxiv.org/pdf/1603.01121.pdf), for example, uses the player as part of the state. Am I missing something?
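
To make concrete the kind of change I mean, here is a minimal sketch that appends a one-hot of the acting player's id to the observation vector, so two positions with identical chip counts get different states. This assumes `rlcard.make`, `env.reset()` returning `(state, player_id)`, and `env.num_players` behave as in recent RLCard releases; the helper `augment_with_position` is hypothetical, not part of the library:

```python
import numpy as np
import rlcard

def augment_with_position(state, player_id, num_players=2):
    # One-hot of the acting player, appended to the flat observation.
    position = np.zeros(num_players, dtype=np.float32)
    position[player_id] = 1.0
    state = dict(state)  # shallow copy; leave the env's original dict untouched
    state['obs'] = np.concatenate([state['obs'], position])
    return state

env = rlcard.make('leduc-holdem')
state, player_id = env.reset()
state = augment_with_position(state, player_id, env.num_players)
```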

alexx-ftw commented 2 years ago

If I am not mistaken, the current implementation for DQN or NFSP is very basic: it just learns which combinations of player cards + community cards are good, and how much to bet based on that. Anything else, like previous bets from other players in previous hands, is not considered.
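
As a quick sanity check on what the default features actually contain, one could inspect the extracted state directly. A small sketch, assuming `rlcard.make('limit-holdem')` and the `'obs'` / `'legal_actions'` keys of the state dict as in recent RLCard versions:

```python
import rlcard

env = rlcard.make('limit-holdem')
state, player_id = env.reset()

# Inspect which features the default state encodes.
print(state.keys())            # keys of the extracted state dict
print(state['obs'].shape)      # flat feature vector fed to DQN/NFSP
print(state['legal_actions'])  # legal actions for the current player
```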

I am currently working on an NLH bot. If you are interested you can PM me on Slack, where I also replied to your message.

DavidRSeWell commented 2 years ago

Ok, I see. That makes sense. I didn't realize that was the intent. Thanks, I will talk on Slack.

daochenzha commented 2 years ago

@befeltingu Thanks for the feedback. As @alexx-ftw mentioned, the current state features are just a basic example of how one can design features with the rlcard package. I expect that the performance can be improved significantly with better state and action features. One possible direction is to follow Figure 3 of the AlphaHoldem paper.

In RLCard, we have actually carefully designed state/action features for the game of DouDizhu, and we observe that they can reach human-level performance. So I expect better features can also boost the performance of the Hold'em games. The feature designs of DouDizhu may also be helpful (see Table 4 of the DouZero paper). We could discuss more on Slack if you are interested.
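
As an illustration of what richer card features can look like, one common pattern in this spirit is to encode each card group as a binary suit-by-rank plane and stack the planes into a tensor. The exact layout used in the AlphaHoldem paper may differ, and the card-string format (`'SA'`, `'HT'`, ...) is assumed here, so treat this as a sketch rather than the paper's method:

```python
import numpy as np

RANKS = 'A23456789TJQK'
SUITS = 'CDHS'

def cards_to_plane(cards):
    """Encode a list of card strings (e.g. ['SA', 'HT']) as a 4x13 binary plane."""
    plane = np.zeros((4, 13), dtype=np.float32)
    for card in cards:
        suit, rank = card[0], card[1]
        plane[SUITS.index(suit), RANKS.index(rank)] = 1.0
    return plane

def encode_card_groups(hole, flop, turn, river):
    """Stack one plane per card group into a (4, 4, 13) tensor."""
    return np.stack([cards_to_plane(g) for g in (hole, flop, turn, river)])
```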

DavidRSeWell commented 2 years ago

@daochenzha Ok, great, thanks for the feedback. Somehow I had not seen the AlphaHoldem paper; I will take a look. I have been doing some experiments with a Kuhn poker game (which was easy to build with this framework) to keep things really simple. In that case, just adding the player position was enough to fully represent any state, which I think would be the case with Leduc as well. Anyway, thanks for the references. I will reach out on Slack once I get to more complex scenarios.