bupticybee / TexasSolver

🚀 A very efficient Texas Holdem GTO solver :spades::hearts::clubs::diamonds:
https://bupticybee.github.io/texassolver_page
GNU Affero General Public License v3.0

specification of initial strategy #29

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

Being able to specify an initial strategy might provide faster convergence.

Some ideas on how to do this

The simplest is to use equity thresholds versus a subset of hands; the simplest subset might be TPTK (top pair, top kicker) on an unpaired board.

Next is the slightly more complex strategy specification used in software like CardRunners EV, such as "hands > top pair top kicker" on an unpaired board texture:

https://www.cardrunnersev.com/manual/postflopmenu.html

Fairly tedious, though.

Lastly, my preferred method is to use the variables specified in the University of Alberta papers on the Poki bot: hand strength (HS), positive hand potential (Ppot), and negative hand potential (Npot). I'd also add bluff range blocking (BlfBlk) and value range blocking (VBlk).

The value of this is that we can use these properties to understand the strategies created by the solver, and to simplify storage, since strategies will tend to cluster based on these five properties. For instance, medium-HS, high-Npot hands are vulnerable and will tend to want to bet for value and protection; high-HS hands with value blockers are hands like top set, which block the villain's calling range and so will want to check on the flop. If in the future you want to do post-flop bucketing for faster convergence, you can use these properties to create the buckets.
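As a rough sketch of that bucketing idea (purely illustrative: it assumes each hand has already been scored on the five properties in [0, 1] and simply quantile-bins the feature vectors; the bin count is a placeholder):

import numpy as np

def bucket_hands(features, bins_per_feature=3):
    """Assign each hand to a coarse bucket by quantile-binning its
    (HS, Ppot, Npot, BlfBlk, VBlk) feature vector.

    features: (n_hands, 5) array of the five properties, each in [0, 1].
    Hands that land in the same bin for all five features share a bucket.
    """
    n_hands, n_feats = features.shape
    digitized = np.zeros_like(features, dtype=int)
    for j in range(n_feats):
        # Quantile edges so the bins are roughly equally populated.
        edges = np.quantile(features[:, j],
                            np.linspace(0, 1, bins_per_feature + 1)[1:-1])
        digitized[:, j] = np.digitize(features[:, j], edges)
    # Combine the per-feature bins into a single bucket id.
    weights = bins_per_feature ** np.arange(n_feats)
    return digitized @ weights

# Toy usage: 1000 hands with random scores -> at most 3^5 = 243 buckets.
rng = np.random.default_rng(0)
buckets = bucket_hands(rng.random((1000, 5)))
print(len(np.unique(buckets)), "buckets used")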

Here are the formulas for the first three, from

https://poker.cs.ualberta.ca/publications/AAAI98.pdf

HandStrength(ourcards, boardcards)
{
    ahead = tied = behind = 0
    ourrank = Rank(ourcards, boardcards)
    /* Consider all two-card combinations of the remaining cards. */
    for each case(oppcards)
    {
        opprank = Rank(oppcards, boardcards)
        if (ourrank > opprank)       ahead  += 1
        else if (ourrank == opprank) tied   += 1
        else /* < */                 behind += 1
    }
    handstrength = (ahead + tied/2) / (ahead + tied + behind)
    return(handstrength)
}

HandPotential(ourcards, boardcards)
{
    /* Hand potential array; each index represents ahead, tied, and behind. */
    integer array HP[3][3]    /* initialize to 0 */
    integer array HPTotal[3]  /* initialize to 0 */
    ourrank = Rank(ourcards, boardcards)
    /* Consider all two-card combinations of the remaining cards for the opponent. */
    for each case(oppcards)
    {
        opprank = Rank(oppcards, boardcards)
        if (ourrank > opprank)       index = ahead
        else if (ourrank == opprank) index = tied
        else /* < */                 index = behind
        HPTotal[index] += 1
        /* All possible board cards to come. */
        for each case(turn, river)
        {
            /* Final 5-card board */
            board = [boardcards, turn, river]
            ourbest = Rank(ourcards, board)
            oppbest = Rank(oppcards, board)
            if (ourbest > oppbest)       HP[index][ahead]  += 1
            else if (ourbest == oppbest) HP[index][tied]   += 1
            else /* < */                 HP[index][behind] += 1
        }
    }
    /* Ppot: were behind but moved ahead. */
    Ppot = (HP[behind][ahead] + HP[behind][tied]/2 + HP[tied][ahead]/2)
           / (HPTotal[behind] + HPTotal[tied])
    /* Npot: were ahead but fell behind. */
    Npot = (HP[ahead][behind] + HP[tied][behind]/2 + HP[ahead][tied]/2)
           / (HPTotal[ahead] + HPTotal[tied])
    return(Ppot, Npot)
}
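For concreteness, here is a rough Python translation of the above. It assumes a hypothetical evaluator rank(cards) that returns a larger number for a stronger 5-7 card hand (any off-the-shelf evaluator could be plugged in), with cards represented as whatever objects that evaluator accepts. Unlike the pseudocode, the totals are accumulated per (opponent combo, runout) pair so that Ppot/Npot come out as probabilities in [0, 1].

from itertools import combinations

AHEAD, TIED, BEHIND = 0, 1, 2

def _compare(a, b):
    return AHEAD if a > b else TIED if a == b else BEHIND

def hand_strength(our_cards, board, deck, rank):
    counts = [0, 0, 0]
    our_rank = rank(our_cards + board)
    dead = set(our_cards) | set(board)
    for opp in combinations([c for c in deck if c not in dead], 2):
        counts[_compare(our_rank, rank(list(opp) + board))] += 1
    ahead, tied, behind = counts
    return (ahead + tied / 2) / (ahead + tied + behind)

def hand_potential(our_cards, board, deck, rank):
    hp = [[0, 0, 0] for _ in range(3)]   # hp[state now][state after runout]
    hp_total = [0, 0, 0]
    our_rank = rank(our_cards + board)
    dead = set(our_cards) | set(board)
    live = [c for c in deck if c not in dead]
    for opp in combinations(live, 2):
        now = _compare(our_rank, rank(list(opp) + board))
        left = [c for c in live if c not in opp]
        # Turn + river on the flop, river only on the turn.
        for runout in combinations(left, 5 - len(board)):
            full_board = board + list(runout)
            after = _compare(rank(our_cards + full_board),
                             rank(list(opp) + full_board))
            hp[now][after] += 1
            hp_total[now] += 1
    ppot = (hp[BEHIND][AHEAD] + hp[BEHIND][TIED] / 2 + hp[TIED][AHEAD] / 2) \
           / max(hp_total[BEHIND] + hp_total[TIED], 1)
    npot = (hp[AHEAD][BEHIND] + hp[TIED][BEHIND] / 2 + hp[AHEAD][TIED] / 2) \
           / max(hp_total[AHEAD] + hp_total[TIED], 1)
    return ppot, npot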

For the blockers, one formula counts the number of hand combos blocked in the top 30% of the opponent's range (for value blockers) and in the bottom 30% (for bluff blockers).
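As a rough sketch of that count (the 30% cut-offs and the pre-ranked input are illustrative assumptions, e.g. the combos could be sorted by the hand strength above):

def blocker_scores(our_cards, opp_combos_ranked, top_frac=0.30, bottom_frac=0.30):
    """opp_combos_ranked: list of 2-card opponent combos, sorted strongest
    to weakest on this board. Returns (value_blockers, bluff_blockers): the
    fraction of combos in the top 30% / bottom 30% of that ranking that are
    blocked by our hole cards."""
    n = len(opp_combos_ranked)
    top = opp_combos_ranked[: int(n * top_frac)]
    bottom = opp_combos_ranked[int(n * (1 - bottom_frac)):]
    ours = set(our_cards)

    def blocked_fraction(combos):
        if not combos:
            return 0.0
        blocked = sum(1 for combo in combos if ours & set(combo))
        return blocked / len(combos)

    return blocked_fraction(top), blocked_fraction(bottom)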

bupticybee commented 3 years ago

Again, interesting thoughts. This issue contains two separate ideas, though:

  1. Calculation of HS or some other metrics, or even clustering hands based on them.
  2. Use of HS or other metrics to initialize a fairly good initial strategy and then further "fine-tune" the strategy from there.

I will discuss these two ideas separately.

  1. I am aware of HS and other metrics. However, we have far better tools for hand analysis than HS now; check this paper https://www.cs.cmu.edu/~sandholm/potential-aware_imperfect-recall.aaai14.pdf . I have done some early exploration of this subject here https://github.com/bupticybee/texas_holdem/blob/master/jupyters/RulesAndAbstraction.ipynb , where I implemented "Potential-Aware Imperfect-Recall Abstraction" and did some clustering experiments (a small clustering sketch follows after this list). If you like, you can try to add the abstraction part to this solver; I'm totally OK with and open to new ideas.
  2. About a good initial strategy: as mentioned in #28 , I have done some warm-start experiments. The initial strategy might provide a good starting point, but that does not necessarily mean faster convergence. Another issue is how to convert a metric like HS into a strategy; I haven't seen any paper discuss that.
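To make the abstraction point in item 1 concrete, here is a minimal sketch of EMD-based clustering of per-hand equity histograms, in the spirit of the potential-aware abstraction paper; the histogram construction, cluster count, and linkage method are placeholder assumptions, not the paper's exact pipeline:

import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_equity_histograms(histograms, n_buckets=8, bin_centers=None):
    """histograms: (n_hands, n_bins) array, each row a distribution over
    future-equity bins for one hand. Returns a bucket id per hand."""
    n_hands, n_bins = histograms.shape
    if bin_centers is None:
        bin_centers = (np.arange(n_bins) + 0.5) / n_bins  # equity bin midpoints
    # Pairwise earth mover's distance between histogram rows.
    dists = np.zeros((n_hands, n_hands))
    for i in range(n_hands):
        for j in range(i + 1, n_hands):
            d = wasserstein_distance(bin_centers, bin_centers,
                                     histograms[i], histograms[j])
            dists[i, j] = dists[j, i] = d
    # Agglomerative clustering on the EMD matrix.
    z = linkage(squareform(dists), method="average")
    return fcluster(z, t=n_buckets, criterion="maxclust")

# Toy usage: 200 hands, 10 equity bins, random histograms.
rng = np.random.default_rng(0)
h = rng.random((200, 10))
h /= h.sum(axis=1, keepdims=True)
print(np.bincount(cluster_equity_histograms(h)))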

Anyway, happy to see such a high-quality issue. I hope to see more thoughts and/or more experiments on these subjects.

ghost commented 3 years ago

Thanks for the earth mover paper - I'm certain I've read it in the past, but it slipped my mind.

It is disappointing that the warm start doesn't seem to provide much improvement for convergence.

I'd love to contribute code at 'some point in the future' - no idea when I might get to it.

bupticybee commented 3 years ago

> Thanks for the earth mover paper - I'm certain I've read it in the past, but it slipped my mind.
>
> It is disappointing that the warm start doesn't seem to provide much improvement for convergence.
>
> I'd love to contribute code at 'some point in the future' - no idea when I might get to it.

Yeah, actually it does provide a 2~4x speedup over the benchmark when reaching 10% accuracy, but normally when we use a solver we want the accuracy (exploitability) to be below 1%, and in that situation the warmup version basically shows no advantage. It would still look good in a paper, though.

About abstraction, I'm not sure whether to implement it or not. I have gotten opinions from pro players who say they don't use abstraction because it would not reach the accuracy they want. That's also what I observed when doing the warmup experiment: the abstracted version of CFR can't improve once it reaches about 10% accuracy (and that's part of the reason why I only use it for warmup), which is fatal for a solver.

When you are ready I'd love to see more ideas and/or even code.

brewer-b commented 3 years ago

Noam Brown has a paper on warm starting. It isn't enough to simply start with a better initial strategy. You need to try to set the regrets as if it had already done the iterations to reach that point.

I don't think that it is a good use of time to implement. If you are running cfr for 2000 iterations and can skip 10 iterations by warm starting, it's only a 0.5% speedup.
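A toy illustration of that point (not the paper's actual construction): if you only copy a strategy, regret matching forgets it after one iteration, so a warm start has to initialize the cumulative regrets so that regret matching already reproduces the strategy at a scale comparable to the iterations being skipped.

import numpy as np

def warm_start_regrets(target_strategy, pseudo_iterations=100, gap=1.0):
    """Toy regret initialization (not Brown's exact method): make the
    positive cumulative regrets proportional to the target strategy, scaled
    as if `pseudo_iterations` iterations with an average per-iteration
    regret gap of `gap` had already been played."""
    target = np.asarray(target_strategy, dtype=float)
    return pseudo_iterations * gap * target / target.sum()

def regret_matching(cumulative_regrets):
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    n = len(cumulative_regrets)
    return positive / total if total > 0 else np.full(n, 1.0 / n)

# Warm-started regrets reproduce the intended strategy...
r = warm_start_regrets([0.7, 0.2, 0.1])
print(regret_matching(r))   # -> [0.7, 0.2, 0.1]
# ...and, because they are scaled like 100 past iterations, a single new
# iteration's regret update only nudges the strategy instead of erasing it.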

bupticybee commented 3 years ago

> Noam Brown has a paper on warm starting. It isn't enough to simply start with a better initial strategy. You need to try to set the regrets as if it had already done the iterations to reach that point.
>
> I don't think that it is a good use of time to implement. If you are running cfr for 2000 iterations and can skip 10 iterations by warm starting, it's only a 0.5% speedup.

I am aware of that paper, and the code contains the logic where the regrets are set as if the iterations had already been done. I use card abstraction + MCCFR to quickly get the initial strategy with regrets, which pushes the accuracy to 10% very quickly, and then broadcast the regrets to the individual info sets and run full discounted CFR (a sketch of that broadcast step follows below).

And from my experiments, if solving to 0.5% takes 200 iterations, warmup would probably skip something like 50 to 70 of them. When solving to 10% accuracy, warmup is 2x~5x quicker. However, when solving to an accuracy like 0.5%, the benchmark and the warmup take almost the same amount of time; the speed advantage of warmup disappears because it converges more slowly in the later CFR rounds. Warmup just is not good enough for the solver setting, at least from what I have seen.
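For what it's worth, here is a minimal sketch of that broadcast step, assuming bucket-level cumulative regrets from the abstracted MCCFR warmup and a hand-to-bucket mapping; the data layout and names are illustrative, not the solver's actual structures:

import numpy as np

def broadcast_warm_start(bucket_regrets, hand_to_bucket, scale=1.0):
    """Copy bucket-level cumulative regrets (from the abstracted MCCFR
    warmup) down to every concrete hand in one info-set family, to use as
    the starting regrets of the full, unabstracted discounted-CFR run.

    bucket_regrets: dict bucket_id -> (n_actions,) cumulative regret array.
    hand_to_bucket: dict hand_id   -> bucket_id.
    scale: optional damping so later full-game iterations can override the
    warmup more easily.
    """
    return {hand: scale * np.asarray(bucket_regrets[bucket], dtype=float)
            for hand, bucket in hand_to_bucket.items()}

# Toy usage: two buckets, three actions (fold/call/raise), four hands.
warm = broadcast_warm_start(
    {0: np.array([5.0, 20.0, 0.0]), 1: np.array([0.0, 3.0, 12.0])},
    {"AhKh": 1, "AdKd": 1, "7c2s": 0, "8c3d": 0},
    scale=0.5,
)
print(warm["AhKh"])   # [0.0, 1.5, 6.0] -> regret matching mostly raises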

And BTW, don't believe everything they say in the papers.