Closed ghost closed 3 years ago
Interesting thoughts. Here are something we need to discuss before implementation:
However I'd love to see explortion done based on this project. Do you have the time for the idea you proposed? I would love to see some experiments and some metrics
So realized that I misexplained
So rather than 10 possible strategies on a 10 percentile increment, i meant that the strategy can only be partitioned on those boundaries. So with possible 2 actions - check and bet 1/2 pot, it can choose any of the following frequency pairs 50/50 would be checks 50% of time and bets 1/2 pot 50% of the time. 0/100, 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80.20, 90/10, 100/0
and if we had 3 actions possible - check, bet .5x pot, bet 1.5x pot, then there would be frequency triplets - so 30/30/40 would be Check 30% of the time, Bet 1/2 pot 30% of the time and bet 1.5x pot 40% of the time.
#or for n = 3
from itertools import product
n = 3
possibles = [0,10,20,30,40,50,60,70,80,90,100]
all_combos = [c for c in product(possibles, repeat=n) if sum(c) == 100]
So realized that I misexplained
So rather than 10 possible strategies on a 10 percentile increment, i meant that the strategy can only be partitioned on those boundaries. So with possible 2 actions - check and bet 1/2 pot, it can choose any of the following frequency pairs 50/50 would be checks 50% of time and bets 1/2 pot 50% of the time. 0/100, 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80.20, 90/10, 100/0
and if we had 3 actions possible - check, bet .5x pot, bet 1.5x pot, then there would be frequency triplets - so 30/30/40 would be Check 30% of the time, Bet 1/2 pot 30% of the time and bet 1.5x pot 40% of the time.
#or for n = 3 from itertools import product n = 3 possibles = [0,10,20,30,40,50,60,70,80,90,100] all_combos = [c for c in product(possibles, repeat=n) if sum(c) == 100]
I now see what you mean, it's basically quantize the cfr strategy. Much like what they do in neural net (using fp16,fp8 or even fp2) The problem is that I'm pretty sure it would destroy what CFR-type algorithms are build on. I mean you can do it when solving is done to simplify the result no problem. But how do you propose to do it when the cfr algorithm is running? I didn't read any paper about this, the most similar idea I have read about this topic is Compact CFR but the idea is also somehow still very different from what you proposed.
You might consider the option of the algorithm to only consider certain action frequencies or frequency increments say 50% increment (0/50/100), or 25% increment (0/25/50/75/100); or 10% (0/10/20/...) or 5% (0/5/10/...)
I expect this will have many advantages - 1) faster convergence 2) the frequency increments might be useful as a warm start - so the solver first uses larger increments for an initial solve, then switches to finer frequency partitions 3) for humans easier to implement strategies