datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License

Confusion about state representation--obs[53] in No-Limit Texas Hold'em #260

Closed ArshartCloud closed 1 year ago

ArshartCloud commented 2 years ago

In the obs description, it says:

53 | Chips that all players have put in

However, line 70 of nolimitholdem.py reads:

```python
obs[53] = float(max(all_chips))
```

It uses max rather than sum, which means the highest number of chips any single player has put in the pot (however, if the opponent's chips were lower than yours, the game would have ended, so this can only be the opponent's chips). Is that right?

daochenzha commented 2 years ago

@ArshartCloud You are right. The state features are usually very important in training agents. The wrapper here is just an example, which is not necessarily the best. You can customize the env wrapper to do better feature engineering.
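A minimal sketch of such a customization, assuming you post-process the observation yourself: replace the max-based feature in obs[53] with the total number of chips in the pot. The 54-dimensional observation size follows the No-Limit Hold'em wrapper discussed in this thread; the `encode_pot_feature` helper and the `total_stack` normalization constant are hypothetical, not part of rlcard's API.

```python
import numpy as np

def encode_pot_feature(all_chips, total_stack=200.0):
    """Return a normalized pot-size feature based on the sum of all bets.

    `all_chips` is the per-player list of chips put in the pot, as in the
    snippet above; `total_stack` is an illustrative normalization constant.
    """
    return float(sum(all_chips)) / total_stack

# Example: two players have put 10 and 40 chips into the pot.
obs = np.zeros(54)
obs[53] = encode_pot_feature([10, 40])  # 0.25, rather than the max-based 40
```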

cuijiaxun commented 1 year ago

I have a follow-up question about the observation space of Limit Texas Hold'em: it looks like the observation only contains the disclosed cards, without distinguishing private cards from community cards. Isn't this representation a bit problematic when inferring other players' strategies?

daochenzha commented 1 year ago

@cuijiaxun Yeah, that is true. The state space of Texas Hold'em is not carefully designed. I expect the agent would be much stronger with tuned state features, like what we have done for the DouDizhu game. The state representation of AlphaHoldem could be borrowed here.
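A rough sketch of one way to separate the two kinds of cards, not AlphaHoldem's actual representation nor rlcard's current encoder: put private (hole) cards and community cards into distinct 52-dimensional one-hot vectors so the agent can tell them apart. The card-string format (`'SA'` for the ace of spades) and the index layout are assumptions for illustration.

```python
import numpy as np

SUITS = 'CDHS'
RANKS = 'A23456789TJQK'

def card_to_index(card):
    """Map a card string such as 'HT' (ten of hearts) to an index in [0, 51]."""
    suit, rank = card[0], card[1]
    return SUITS.index(suit) * 13 + RANKS.index(rank)

def encode_cards(hole_cards, community_cards):
    """Return a (2, 52) array: row 0 = private cards, row 1 = community cards."""
    planes = np.zeros((2, 52), dtype=np.float32)
    for card in hole_cards:
        planes[0, card_to_index(card)] = 1.0
    for card in community_cards:
        planes[1, card_to_index(card)] = 1.0
    return planes

# Example: hole cards A♠ K♠ with a flop of 2♥ 7♥ T♥.
card_obs = encode_cards(['SA', 'SK'], ['H2', 'H7', 'HT'])
```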