Closed by brupelo 3 years ago
Hi, @brupelo Thanks for your interest! I would suggest starting from Leduc Holdem (which is a simple version of Texas Hold'em).
First of all, currently, reinforcement learning is still not as good as Libratus or Pluribus in Texas Hold'em. However, reinforcement learning is more efficient, since it is a sampling-based method, and it is more general. I believe that with more research effort, the gap can be narrowed.
There are two categories of algorithms. The first is CFR-based. We only implement basic CFR in this repo. Basic CFR is not efficient enough for Texas Hold'em (but it handles Leduc Hold'em well). There are lots of variants, which you can find by simply searching for CFR. The second category is based on reinforcement learning. We have implemented NFSP from this category. It usually does not perform as well as CFR in small games, but it is more efficient in large games. I believe that through careful tuning (hyperparameters, state representation, action design, and reward design), NFSP could achieve good performance in Texas Hold'em. We did not fine-tune these aspects in this repo, since our focus is to provide an easy-to-use environment.
More players would make the game much more difficult. I would suggest starting by increasing the number of players in Leduc Hold'em.
We have some examples in /examples
on how to start training and save models. Training usually achieves good performance on Leduc Hold'em. For other games, it shows some improvement but needs further tuning. Yes, we support multi-processing, but due to a recent update of the interfaces it has some issues (we are working on these). We do not support a database.
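To illustrate the multi-process idea, here is a minimal sketch of parallel data generation with Python's `multiprocessing`. The `play_episode` rollout below is a toy stand-in, not rlcard's interface: it just shows independent workers simulating episodes whose transitions a learner consumes afterwards.

```python
import random
from multiprocessing import Pool

def play_episode(seed):
    # Toy stand-in for one self-play game in the environment:
    # returns a list of (step, action, reward) transitions.
    rng = random.Random(seed)
    return [(t, rng.randrange(2), rng.random()) for t in range(5)]

def generate_data(num_episodes, workers=4):
    # Each worker simulates episodes independently; the learner
    # trains on the pooled transitions afterwards.
    with Pool(workers) as pool:
        episodes = pool.map(play_episode, range(num_episodes))
    return [transition for ep in episodes for transition in ep]

if __name__ == "__main__":
    data = generate_data(8)
    print(len(data))  # 8 episodes x 5 transitions each -> 40
```

Since game simulation is usually the bottleneck, parallelizing the rollouts alone often gives most of the speed-up even if the learner itself stays single-process.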
There are several ways to tell whether the agent is improving. The easiest is to launch tournaments against a random agent, which is the default setting in examples
. This gives a sense of whether the agent is improving, but it is not a formal evaluation metric. A commonly adopted metric in publications is exploitability (we are implementing this; some issues remain). This metric measures the weakness of the agent. It is accurate, but hard to use in large games, since computing it there is too expensive. We also support a third way: we have implemented several rule-based agents in /models
. We can gauge performance by launching tournaments against the rule-based bots. We can also use the human interface to analyze the behaviour.
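As a toy illustration of what exploitability measures (this is not rlcard's implementation), here is the computation for rock-paper-scissors: the exploitability of a strategy is the payoff a best-responding opponent achieves above the game value, which is 0 for this symmetric zero-sum game.

```python
# Row player's payoff matrix for rock-paper-scissors (zero-sum).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def exploitability(strategy):
    # Best payoff an opponent can get with a pure action against
    # `strategy`; the game value is 0, so this is the exploitability.
    br_value = max(-sum(p * RPS[i][a] for i, p in enumerate(strategy))
                   for a in range(3))
    return max(0.0, br_value)  # exploitability is never negative

print(exploitability([1/3, 1/3, 1/3]))  # 0.0 — the mixed equilibrium
print(exploitability([1.0, 0.0, 0.0]))  # 1.0 — "always rock" loses to paper
```

In large games this best-response computation requires traversing the whole game tree, which is exactly why the metric becomes too expensive there.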
Rule agents may mimic human behaviour, since they are designed based on human knowledge.
Yes, the templates live in a separate repo to keep rlcard lightweight. They are here: https://github.com/rlcard/rlcard.github.io
For documentation, I would recommend the files in /doc
in this repo. We are updating the documentation; the version on the website is a little out of date.
Hopefully, I addressed all of your questions.
@daochenzha Thank you very much for all the insightful information, that really helps.
rlcard? I'm curious though... you say there are two algorithms, CFR & NFSP, and I wonder: are they using an existing architecture from https://www.asimovinstitute.org/neural-network-zoo/ or is it a new one? Anyway, I'll definitely give the project a shot... it's really caught my attention, really cool stuff, looks fun... I've wanted to create a Go bot (something like AlphaGo) for ages but never found the time to learn about it. Trying to solve these card games seems a more "feasible" task :)
Thanks for the interest. CFR & NFSP are two basic algorithms that represent two different ideas for solving card games. CFR does not use neural networks and only deals with the tabular case; Deep CFR combines CFR with neural networks. NFSP uses neural networks, and the network architecture could be any of the forms you mentioned in the link. To fully understand the idea of CFR, it is best to refer to the original paper (with lots of maths): https://poker.cs.ualberta.ca/publications/NIPS07-cfr.pdf
To get a sense of reinforcement learning in card games, the NFSP paper is a good starting point: https://arxiv.org/abs/1603.01121
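To make the core of CFR concrete, here is a minimal regret-matching sketch, the building block of CFR (a toy, not rlcard's implementation, and not the full two-player algorithm): cumulative regrets are turned into a strategy, and the time-averaged strategy converges to a best response against a fixed opponent in rock-paper-scissors. The payoff matrix and the opponent strategy are made up for illustration.

```python
RPS = [[0, -1, 1],    # row player's payoffs: rock, paper, scissors
       [1, 0, -1],
       [-1, 1, 0]]

def regret_matching(regrets):
    # Play each action in proportion to its positive cumulative regret;
    # fall back to uniform when no action has positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def train(opponent, iterations=1000):
    # Accumulate regrets against a FIXED opponent strategy and
    # average the played strategies over time.
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = regret_matching(regrets)
        # expected payoff of each pure action against the opponent
        utility = [sum(q * RPS[a][b] for b, q in enumerate(opponent))
                   for a in range(3)]
        ev = sum(p * u for p, u in zip(strategy, utility))
        for a in range(3):
            regrets[a] += utility[a] - ev
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]  # time-averaged strategy

avg = train([0.4, 0.3, 0.3])  # opponent slightly over-plays rock
# avg concentrates on paper, the best response to this opponent
```

Full CFR applies the same regret update at every information set of both players simultaneously, which is where the extensive-form machinery in the paper comes in.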
For neural networks, yes, we can perfectly resume training. However, for off-policy reinforcement learning algorithms there is an issue: in addition to the network weights, algorithms like deep Q-learning maintain a replay buffer that stores a lot of historical data. To fully resume training, we would also need to save the buffer contents, and that data can be very large. Our parallelization will mainly focus on data generation (since simulation in the game engine usually takes the most time). Reinforcement learning algorithms can also run in parallel; there are many papers on parallel reinforcement learning that could be helpful.
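As a sketch of the buffer-saving point, assuming a generic agent whose weights and replay buffer are plain Python objects (this is not rlcard's checkpoint format), one could pickle both together:

```python
import pickle
from collections import deque

def save_checkpoint(path, weights, replay_buffer):
    # Persist the network weights AND the replay buffer together, so
    # off-policy training (e.g. deep Q-learning) can resume with its
    # history intact. For large buffers this file can be huge, which is
    # exactly the cost discussed above.
    with open(path, "wb") as f:
        pickle.dump({"weights": weights, "buffer": list(replay_buffer)}, f)

def load_checkpoint(path, buffer_size):
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["weights"], deque(ckpt["buffer"], maxlen=buffer_size)
```

With a real framework you would serialize the weights with the framework's own mechanism (e.g. a state dict) and pickle only the transitions; the point is simply that both pieces must be saved to resume exactly.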
Uploading more models will definitely be helpful. We have some pre-trained models and rule-based models in /models
. These models can be directly imported and compared. Users could also upload their models here as baselines for comparison in the future. As for your point 5, I think your concern makes total sense: by exposing the bot to more heterogeneous data, it could generalize better. NFSP uses the idea of fictitious self-play, that is, the agent is trained to play against its own average behaviour. See https://arxiv.org/abs/1603.01121
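The averaging half of fictitious play can be sketched in a few lines (a toy, not NFSP itself, which learns the average behaviour with a supervised network): after the n-th best response, the average policy is updated as an incremental running mean.

```python
def update_average(avg_policy, best_response, n):
    # Running mean over the first n policies:
    #   avg <- avg + (best_response - avg) / n
    return [a + (b - a) / n for a, b in zip(avg_policy, best_response)]

avg = [1.0, 0.0]                      # first "best response": always action 0
for n, br in enumerate([[0.0, 1.0]] * 3, start=2):
    avg = update_average(avg, br, n)  # three later best responses pick action 1
# avg is now approximately [0.25, 0.75], the mean of the four policies
```

Best-responding to this slowly-moving average is what stabilizes self-play: the opponent the agent trains against changes gradually rather than jumping to each new best response.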
Thanks again for your interest. Card games are very challenging. I believe there are a lot of things to be explored in card games.
Hi, first of all, I'm asking this as a total amateur in the ML field... other than a small university course on neural networks years ago, my knowledge of modern ML is pretty limited (not my area of expertise).
The thing is, I was researching whether incomplete-information games such as Texas Hold'em have been solved nowadays, and I've seen attempts such as Libratus and Pluribus and got curious... It seems reinforcement learning is the trend for solving incomplete-information games.
Ok, let's say I wanted to create a strong Texas Hold'em bot using the rlcard framework; here are some questions:
Really interesting project... I'll read the docs from start to end, so please assume when you answer that I've already done so :)
Off-topic question: I see the docs have been generated using Sphinx's https://github.com/rtfd/sphinx_rtd_theme, but I don't see any script available to generate the online docs locally... am I missing something? Is there another repo that contains the scripts/templates?