ghost closed this issue 3 years ago
Again, interesting thoughts. This issue contains two separate ideas, though; I'll discuss them one at a time.
Anyway, happy to see such a high-quality issue. Hope to see more thoughts and/or experiments on these subjects.
Thanks for the earth mover paper - I'm certain I've read it in the past, but it slipped my mind.
It's disappointing that the warm start doesn't seem to provide much improvement for convergence.
I'd love to contribute code at 'some point in the future' - no idea when I might get to it.
Yeah, it does provide a 2~4x speedup over the benchmark when reaching 10% accuracy, but normally when we use a solver we want accuracy (exploitability) below 1%, and at that point the warmup version shows basically no advantage. It would still look good in a paper, though.
About abstraction, I'm not sure whether to implement it or not. I've heard from pro players that they don't use abstraction because it can't reach the accuracy they want. That's also what I observed in the warmup experiment: the abstracted version of CFR stops improving once it reaches about 10% accuracy (which is part of the reason I only use it for warmup), and that's fatal for a solver.
When you're ready, I'd love to see more ideas and/or even code.
Noam Brown has a paper on warm starting. It isn't enough to simply start with a better initial strategy; you need to set the regrets as if the iterations needed to reach that point had already been done.
I don't think it's a good use of time to implement. If you're running CFR for 2000 iterations and warm starting lets you skip 10, that's only a 0.5% speedup.
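To make the regret-setting point concrete, here's a minimal sketch in Python. It is not Brown's exact method; it just exploits the fact that regret matching normalizes positive regrets, so setting regrets proportional to a target strategy reproduces that strategy, with the scale factor playing the role of the "as-if" iteration count:

```python
def regret_matching(regrets):
    # Standard regret matching: play each action with probability
    # proportional to its positive regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    if total <= 0:
        return [1.0 / n] * n  # uniform when nothing has positive regret
    return [p / total for p in pos]

def warm_start_regrets(strategy, scale):
    # Naive warm start (NOT Brown's exact scheme): set regrets
    # proportional to the target strategy so regret matching
    # reproduces it. `scale` acts like the number of "as-if"
    # iterations: the larger it is, the longer subsequent CFR
    # updates take to move the strategy away from the warm start.
    return [scale * p for p in strategy]
```

With too small a scale, one bad iteration wipes out the warm start; with too large a scale, the solver is anchored to an inaccurate strategy - which is consistent with the mixed results reported above.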
I am aware of that paper, and the code already contains the logic that sets the regrets as if the iterations had been done. I use card abstraction + MCCFR to quickly get an initial strategy with regrets, which pushes the accuracy to 10% very fast, then broadcast the regrets to the individual info sets and run full discounted CFR.
And from my experiments, if solving to 0.5% takes 200 iterations, warmup skips maybe 50 to 70 of them. When solving to 10% accuracy, warmup is 2x~5x quicker. But when solving to an accuracy like 0.5%, the benchmark and the warmup version take almost the same amount of time: the warmed-up run converges more slowly in the later CFR rounds, so its speed advantage disappears. Warmup just isn't good enough for a solver environment, at least from what I've seen.
And BTW, don't believe everything a paper says.
Being able to specify an initial strategy might provide faster convergence.
Some ideas on how to do this:
The simplest is to use equity thresholds vs. a subset of hands - the simplest subset might be TPTK (top pair, top kicker) on an unpaired board.
Next is the slightly more complex strategy specification used in software like CardRunners EV, such as "hands better than top pair top kicker" on a no-pair board texture:
https://www.cardrunnersev.com/manual/postflopmenu.html
Fairly tedious, though.
Lastly, my preferred method is to use the variables specified in the University of Alberta papers on Poki: hand strength (HS), positive hand potential (PPot), and negative hand potential (NPot). I'd also add bluff-range blocking (BlfBlk) and value-range blocking (VBlk).
The value of this is that we can use these properties to understand the strategies the solver creates, and to simplify storage, since strategies will tend to cluster on these five properties. For instance, medium-HS, high-NPot hands are vulnerable and will tend to want to bet for value and protection; high-HS hands with value blockers, like top set, block villain's calling range and so will want to check on the flop. If in the future you want to do post-flop bucketing for faster convergence, you can use these properties to create the buckets.
Here are the formulas for the first three, from
https://poker.cs.ualberta.ca/publications/AAAI98.pdf
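As a quick sketch, the HS formula from that paper, HS = (ahead + tied/2) / (ahead + tied + behind), enumerated over all opponent hole-card combos, could look like this in Python. The `rank` evaluator is an assumption here (any 5-7 card evaluator where higher means stronger would do):

```python
from itertools import combinations

def hand_strength(hole, board, deck, rank):
    # HS from the Alberta AAAI-98 paper:
    #   HS = (ahead + tied/2) / (ahead + tied + behind)
    # `deck` is the remaining unseen cards; `rank` is an assumed
    # hand evaluator (higher value = stronger hand).
    ours = rank(hole + board)
    ahead = tied = behind = 0
    for opp in combinations(deck, 2):
        theirs = rank(list(opp) + board)
        if ours > theirs:
            ahead += 1
        elif ours == theirs:
            tied += 1
        else:
            behind += 1
    return (ahead + tied / 2) / (ahead + tied + behind)
```

PPot and NPot follow the same enumeration pattern, additionally rolling out future board cards to see how often a currently-behind hand pulls ahead (and vice versa).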
For the blockers, one formula counts the number of hand combos blocked in the top 30% of villain's range (for value blockers) and in the bottom 30% (for bluff blockers).
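That blocker count could be sketched as follows. The representation is an assumption: villain's range is a list of two-card combos already sorted from strongest to weakest:

```python
def count_blocked(hole, range_combos, fraction=0.3, top=True):
    # Count how many combos in the top `fraction` of villain's range
    # (value blockers, top=True) or the bottom `fraction` (bluff
    # blockers, top=False) share a card with our hole cards.
    # `range_combos` is assumed sorted strongest-first.
    n = max(1, int(len(range_combos) * fraction))
    segment = range_combos[:n] if top else range_combos[-n:]
    return sum(1 for combo in segment if set(combo) & set(hole))
```

Dividing the count by the segment size gives a normalized BlfBlk/VBlk score that can sit alongside HS, PPot, and NPot as a bucketing feature.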