datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License
2.91k stars 630 forks source link

Total number of states #101

Open lssr opened 4 years ago

lssr commented 4 years ago

What is the total number of states for each game?

daochenzha commented 4 years ago

@lssr Calculating this number is very challenging. An estimation is here https://github.com/datamllab/rlcard#available-environments Some are from other sources, and some are estimated by us. However, correctness is not guaranteed. The total number of states should be InfoSet Number * InfoSet Size

jkterry1 commented 4 years ago

Do you have the number for gin rummy?

billh0420 commented 4 years ago

@justinkterry

For the initial state: 52 choose 11 = 60403728840

For the second state: (52 choose 10) * 42 = 664441017240

jkterry1 commented 4 years ago

@daochenzha can you add this to your table?

daochenzha commented 4 years ago

@billh0420 I feel like it is not that easy. There are two aspects we may need to consider. 1. the cards on the board, since the cards on the board could be used to predict future cards and guess the hand cards of the opponent. 2. the historical actions taken by the players could be also informative. Thus, I guess we need to consider all these situations when calculating the number of states right?

jkterry1 commented 4 years ago

Any more thoughts here?

billh0420 commented 4 years ago

@daochenzha said "since the cards on the board could be used to predict future cards" Yes. Once a discard is buried, it can never be used in a meld or drawn or picked up.

@daochenzha said " and guess the hand cards of the opponent." This raises an interesting problem. We are training our agents not to be adaptive once trained. The normal approach is to train it against an earlier version of itself. Thus, it should be able to guess the cards that a strong opponent is holding (or the probabilities of such). However, if it is playing against a random player, there is no way to guess what that player is doing. This occurs with LeelaChess0: it plays excellent against the strongest chess engines, but doesn't do as well against weaker chess engines as the stronger chess engines do.

@justinkterry asked "Any more thoughts here?"

Ignoring picking up discards, the discard pile can have up to 30 cards in a specific order. The number of such discard piles is:

    39 + 39 * 38 + 39 * 38 * 37 + ... 39!

which is about 2e+46. Multiplying by the number of ways of having 10 or 11 cards, I get:

     (2e+46) * (664441017240 + 60403728840) = (2e+46)*(7e+11) = 1.4e+58

I hope I didn't make some arithmetic error.

Thus, there are around 1.4e+58 states for Gin Rummy.

billh0420 commented 4 years ago

The above seems wrong. When the stock pile has 2 cards and the discard pile has 30 cards is the critical situation. The number of cases for the discard pile is 52 51 ... * 23 = 7e+46. The player can have 10 cards from the remaining 22 cards: 22 chose 10 = 646646. Multiplying these two numbers, I get:

 (7e+46) * (6.5e+5) = 45.5e+51 = 4.6e+52

Thus, there are around 4.6e+52 states for Gin Rummy.

jkterry1 commented 4 years ago

Thanks a ton!

@daochenzha does that seem right to you?

daochenzha commented 4 years ago

@billh0420 @justinkterry Thanks a lot! The calculation looks reasonable to me. I will put it into README later. We have also 2 other dimensions. One is action size. The other one is the average size of each state (in one state, there are many possibilities of other player's hands). This metric somehow measures the "imperfections", or how much information is given.

Do you happen to have any comments on these numbers?

jkterry1 commented 3 years ago

@daochenzha this should be closable now?