bupticybee / TexasSolver

🚀 A very efficient Texas Holdem GTO solver :spades::hearts::clubs::diamonds:
https://bupticybee.github.io/texassolver_page
GNU Affero General Public License v3.0
1.71k stars 304 forks source link

forced frequency choices #28

Closed ghost closed 3 years ago

ghost commented 3 years ago

You might consider the option of the algorithm to only consider certain action frequencies or frequency increments say 50% increment (0/50/100), or 25% increment (0/25/50/75/100); or 10% (0/10/20/...) or 5% (0/5/10/...)

I expect this will have many advantages - 1) faster convergence 2) the frequency increments might be useful as a warm start - so the solver first uses larger increments for an initial solve, then switches to finer frequency partitions 3) for humans easier to implement strategies

bupticybee commented 3 years ago

Interesting thoughts. Here are something we need to discuss before implementation:

  1. The solver itself might not be good enough to solve 3+ number of bets, a 25/50/75/100 bet size would take serious memory on flop (I just done the test, it would take 8.8G of memory to solve a flop game in 25/50/75/100 bet sizes with spr=10), not to mention the solving time.
  2. Warm start might not be so useful as you might think. I implemented a type of warmup algorithm in https://github.com/bupticybee/TexasSolver/blob/master/src/solver/PCfrSolver.cpp#L251 which use MonteCoral algorithm + card abstraction to construct a "blueprint" strategy and then use full discount-cfr to further "fine-tune" the result. And the result shows that it reduces runtime before reach exploitability like 10% but when reaching 1% or 0.5%(which is commonly used in solver), the speed advantage of warmup disappears.

However I'd love to see explortion done based on this project. Do you have the time for the idea you proposed? I would love to see some experiments and some metrics

ghost commented 3 years ago

So realized that I misexplained

So rather than 10 possible strategies on a 10 percentile increment, i meant that the strategy can only be partitioned on those boundaries. So with possible 2 actions - check and bet 1/2 pot, it can choose any of the following frequency pairs 50/50 would be checks 50% of time and bets 1/2 pot 50% of the time. 0/100, 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80.20, 90/10, 100/0

and if we had 3 actions possible - check, bet .5x pot, bet 1.5x pot, then there would be frequency triplets - so 30/30/40 would be Check 30% of the time, Bet 1/2 pot 30% of the time and bet 1.5x pot 40% of the time.

#or for n = 3 
from itertools import product
n = 3
possibles = [0,10,20,30,40,50,60,70,80,90,100]
all_combos = [c for c in product(possibles, repeat=n) if sum(c) == 100]
bupticybee commented 3 years ago

So realized that I misexplained

So rather than 10 possible strategies on a 10 percentile increment, i meant that the strategy can only be partitioned on those boundaries. So with possible 2 actions - check and bet 1/2 pot, it can choose any of the following frequency pairs 50/50 would be checks 50% of time and bets 1/2 pot 50% of the time. 0/100, 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80.20, 90/10, 100/0

and if we had 3 actions possible - check, bet .5x pot, bet 1.5x pot, then there would be frequency triplets - so 30/30/40 would be Check 30% of the time, Bet 1/2 pot 30% of the time and bet 1.5x pot 40% of the time.

#or for n = 3 
from itertools import product
n = 3
possibles = [0,10,20,30,40,50,60,70,80,90,100]
all_combos = [c for c in product(possibles, repeat=n) if sum(c) == 100]

I now see what you mean, it's basically quantize the cfr strategy. Much like what they do in neural net (using fp16,fp8 or even fp2) The problem is that I'm pretty sure it would destroy what CFR-type algorithms are build on. I mean you can do it when solving is done to simplify the result no problem. But how do you propose to do it when the cfr algorithm is running? I didn't read any paper about this, the most similar idea I have read about this topic is Compact CFR but the idea is also somehow still very different from what you proposed.