datamllab / rlcard

Reinforcement Learning / AI Bots in Card (Poker) Games - Blackjack, Leduc, Texas, DouDizhu, Mahjong, UNO.
http://www.rlcard.org
MIT License

Could you make a DeepCFR example for NoLimit Holdem? #153

Closed (ilkkatakayama closed 3 years ago)

ilkkatakayama commented 4 years ago

Could you make a DeepCFR example for NoLimit Holdem? Thank you for a great project.

lhenry15 commented 4 years ago

Hi, we have modified the DeepCFR and added an example of NoLimit Holdem. Please be aware that training a DeepCFR agent will take a long time.

jiahui-x commented 4 years ago

Roughly how long do you estimate it would take on a laptop?

lhenry15 commented 4 years ago

For your reference, our training run on a server using only 1 CPU (Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz) had not finished a single iteration after more than 350 hours. We strongly recommend starting with an easier game, e.g. limit-holdem.

jneckar commented 4 years ago

If you are trying to run CFR for full-game NLHE on your laptop, you will be waiting for quite some time... The successful superhuman NLHE agents ran CFR on an abstracted version of the game and still took millions of core-hours to train. From the paper on Libratus:

"The last two betting rounds, which are exponentially larger, are more coarsely abstracted. The 55 million different hand possibilities on the third round are grouped into 2.5 million buckets, and the 2.4 billion different possibilities on the fourth round are grouped into 1.25 million buckets. ... The abstraction algorithm took the game size from 10^161 decision points down to 10^12... In total, Libratus used about 25 million core hours ... The equilibrium finding and self-improvement algorithms used 196 nodes on the Bridges supercomputer at the Pittsburgh Supercomputing Center. Each node has 128 GB of memory and 28 cores."
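To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. It is not from the paper: it only divides the numbers in the quote, it assumes perfect parallel scaling, and it assumes (which the quote does not state) that all 25 million core hours could be spread evenly over the 196 nodes. The 8-core laptop is a hypothetical value chosen for illustration.

```python
# Back-of-the-envelope arithmetic on the figures in the Libratus quote.
# Assumptions (not from the paper): perfect parallel scaling, all core
# hours spread evenly over the listed nodes, and a hypothetical 8-core laptop.

HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365

# Abstraction sizes quoted for the third and fourth betting rounds.
ROUND3_HANDS, ROUND3_BUCKETS = 55_000_000, 2_500_000
ROUND4_HANDS, ROUND4_BUCKETS = 2_400_000_000, 1_250_000

# Hardware and compute budget quoted for Libratus.
NODES, CORES_PER_NODE, MEM_PER_NODE_GB = 196, 28, 128
CORE_HOURS = 25_000_000

LAPTOP_CORES = 8  # hypothetical, for illustration only


def summarize():
    """Return derived quantities computed from the quoted numbers."""
    total_cores = NODES * CORES_PER_NODE
    return {
        # Average number of distinct hands merged into one bucket.
        "round3_hands_per_bucket": ROUND3_HANDS / ROUND3_BUCKETS,
        "round4_hands_per_bucket": ROUND4_HANDS / ROUND4_BUCKETS,
        "total_cores": total_cores,
        "total_memory_gb": NODES * MEM_PER_NODE_GB,
        # Wall-clock time if every core were busy the whole time.
        "cluster_days": CORE_HOURS / total_cores / HOURS_PER_DAY,
        "single_core_years": CORE_HOURS / HOURS_PER_YEAR,
        "laptop_years": CORE_HOURS / LAPTOP_CORES / HOURS_PER_YEAR,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, the same compute budget would keep a single core busy for well over two thousand years, and the fourth-round abstraction merges on the order of a couple of thousand hand possibilities into each bucket. This is an idealized estimate, not a measurement.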