RuBP17 / AlphaDou

A Doudizhu reinforcement learning AI
GNU General Public License v3.0

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

AlphaDou is a reinforcement learning framework for DouDizhu (斗地主), the most popular card game in China.

The Deep Monte Carlo (DMC) framework is adapted from the open-source project DouZero ResNet.

Compared with the framework provided by the open-source DouZero project, the buffer component has been removed.
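For readers new to Deep Monte Carlo, the sketch below illustrates its core idea as described for DouZero: actions are picked epsilon-greedily from the legal moves, and Q(s, a) is regressed onto the Monte Carlo return of the finished game. It is a minimal pure-Python illustration assuming a generic Q function; the names epsilon_greedy, monte_carlo_targets and mse_loss are hypothetical and do not come from the AlphaDou code, which trains a neural network instead.

```python
import random
from typing import Callable, Hashable, List, Sequence

# Hypothetical encodings: any hashable value (e.g. a tuple of card codes)
# can stand for a state or an action in this sketch.
State = Hashable
Action = Hashable


def epsilon_greedy(q: Callable[[State, Action], float], state: State,
                   legal_actions: Sequence[Action], epsilon: float,
                   rng: random.Random) -> Action:
    """Explore with probability epsilon, otherwise take the highest-Q action."""
    if rng.random() < epsilon:
        return rng.choice(list(legal_actions))
    return max(legal_actions, key=lambda action: q(state, action))


def monte_carlo_targets(num_steps: int, final_reward: float,
                        gamma: float = 1.0) -> List[float]:
    """Return at each step t when the only reward arrives after the last step."""
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


def mse_loss(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predicted Q(s, a) values and their targets."""
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


q_table = {("s0", "pass"): 0.1, ("s0", "play_3"): 0.6}
# With epsilon = 0 the greedy action is always chosen.
print(epsilon_greedy(lambda s, a: q_table[(s, a)], "s0",
                     ["pass", "play_3"], 0.0, random.Random(0)))  # play_3
print(monte_carlo_targets(3, 1.0))  # [1.0, 1.0, 1.0]
```

Because Doudizhu only rewards the final outcome, every step of an episode shares the same target when gamma is 1, which is why no per-step reward bookkeeping is needed.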

The framework adds a bidding phase, which allows RL models to be trained in a realistic environment where the landlord is determined by bidding.
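The README does not list the bidding rules. As an illustration only, the following sketch assumes the common variant in which each of the three players bids once in turn, a bid must exceed the current highest bid (1 to 3) or be a pass, a bid of 3 ends bidding at once, and the cards are redealt if everyone passes. The function run_bidding and the policy signature are hypothetical; AlphaDou's environment may use different rules.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical policy signature: (player index, current highest bid) -> bid,
# where a bid is 1, 2 or 3 and 0 means "pass".
BidPolicy = Callable[[int, int], int]


def run_bidding(policies: Sequence[BidPolicy],
                first_player: int = 0) -> Optional[Tuple[int, int]]:
    """Play one bidding round; return (landlord, bid) or None if all pass."""
    highest_bid, landlord = 0, None
    for offset in range(3):
        player = (first_player + offset) % 3
        bid = policies[player](player, highest_bid)
        if bid == 0:
            continue  # the player passes
        if not (highest_bid < bid <= 3):
            raise ValueError(f"illegal bid {bid} from player {player}")
        highest_bid, landlord = bid, player
        if bid == 3:
            break  # 3 is the maximum bid, so bidding ends at once
    if landlord is None:
        return None  # everyone passed: the cards would be redealt
    return landlord, highest_bid


def raise_by_one(player: int, highest_bid: int) -> int:
    """Toy policy: outbid the current highest bid whenever possible."""
    return highest_bid + 1 if highest_bid < 3 else 0


print(run_bidding([raise_by_one] * 3))  # (2, 3)
```

With three policies that always raise by one, the third player to act becomes the landlord with a bid of 3. In a learned setting, each policy would be replaced by a bidding model.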

Against the open-source DouZero (ADP) model, the trained model (Card Model) achieves a win rate of 61.7%, reaching state-of-the-art performance.


Training

To train on a GPU, run

python3 train.py

This trains AlphaDou on a single GPU. To train AlphaDou on multiple GPUs, use the --gpu_devices, --num_actor_devices, --num_actors and --training_device arguments.

For example, if we have 4 GPUs and want to use the first 3 GPUs for simulation, with 15 actors each, and the 4th GPU for training, we can run the following command:

python3 train.py --gpu_devices 0,1,2,3 --num_actor_devices 3 --num_actors 15 --training_device 3
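To make the device flags concrete, the following sketch parses the example command and derives the resulting actor layout. The parser is a hypothetical stand-in, not AlphaDou's train.py: only the flag names come from the command above, while the defaults and the rule that actors use the first num_actor_devices entries of gpu_devices are assumptions.

```python
import argparse
from typing import Dict, List


def parse_device_args(argv: List[str]) -> argparse.Namespace:
    """Parse the device flags from the example command (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="AlphaDou device flags (sketch)")
    parser.add_argument("--gpu_devices", type=str, default="0")
    parser.add_argument("--num_actor_devices", type=int, default=1)
    parser.add_argument("--num_actors", type=int, default=1)
    parser.add_argument("--training_device", type=str, default="0")
    return parser.parse_args(argv)


def actor_layout(args: argparse.Namespace) -> Dict[str, int]:
    """Assign num_actors actor processes to each of the first num_actor_devices GPUs."""
    gpus = args.gpu_devices.split(",")
    if args.num_actor_devices > len(gpus):
        raise ValueError("num_actor_devices exceeds the number of listed GPUs")
    return {gpu: args.num_actors for gpu in gpus[: args.num_actor_devices]}


args = parse_device_args(["--gpu_devices", "0,1,2,3", "--num_actor_devices", "3",
                          "--num_actors", "15", "--training_device", "3"])
layout = actor_layout(args)
print(layout)                # {'0': 15, '1': 15, '2': 15}
print(sum(layout.values()))  # 45 actor processes in total
print(args.training_device)  # 3 -> the learner uses the 4th GPU
```

Separating simulation devices from the training device keeps self-play generation from competing with gradient updates on the same GPU.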

To run training or simulation on the CPU, use the --actor_device_cpu and --training_device cpu arguments.

For example, the following command runs everything on the CPU:

python3 train.py --actor_device_cpu --training_device cpu

The following command runs only the actors on the CPU:

python3 train.py --actor_device_cpu
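The two CPU commands differ only in where the learner runs. A minimal sketch of how such flags could be resolved into device strings follows. It assumes PyTorch-style names such as "cpu" and "cuda:0" and a hypothetical helper resolve_devices that is not part of the project; in particular, the "0" default for the second command is an assumption.

```python
from typing import List, Tuple


def resolve_devices(actor_device_cpu: bool, training_device: str,
                    num_actor_devices: int = 1) -> Tuple[List[str], str]:
    """Return (actor devices, learner device) for the given flag values.

    Assumed semantics: --actor_device_cpu moves all actors to the CPU, and
    --training_device is either "cpu" or a GPU index.
    """
    if actor_device_cpu:
        actor_devices = ["cpu"]
    else:
        actor_devices = [f"cuda:{i}" for i in range(num_actor_devices)]
    if training_device == "cpu":
        learner_device = "cpu"
    else:
        learner_device = f"cuda:{int(training_device)}"
    return actor_devices, learner_device


# python3 train.py --actor_device_cpu --training_device cpu
print(resolve_devices(True, "cpu"))  # (['cpu'], 'cpu')
# python3 train.py --actor_device_cpu  (learner stays on an assumed default GPU "0")
print(resolve_devices(True, "0"))    # (['cpu'], 'cuda:0')
```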

Evaluation

Evaluation can be performed on a GPU or a CPU (a GPU is much faster). Performance is evaluated through self-play. We provide pre-trained models and some heuristic agents as baselines. For the bidding phase, the following agents are provided for testing:

For the card-play phase, the following agents are provided for testing:

Step 1: Generate evaluation data

python3 generate_eval_data.py
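Evaluation deals are generated once so that every agent is tested on the same games. The sketch below shows one way to produce reproducible deals with a fixed seed. It assumes DouZero-style integer card codes, and the seat names, pickle layout and function names are hypothetical rather than the actual output format of generate_eval_data.py.

```python
import pickle
import random
from typing import Dict, List

# Assumed card encoding (DouZero-style integer codes): 3..14 for 3..A,
# 17 for 2, 20 for the black joker and 30 for the red joker.
DECK = [rank for rank in range(3, 15) for _ in range(4)] + [17] * 4 + [20, 30]


def deal(rng: random.Random) -> Dict[str, List[int]]:
    """Deal one game: 17 cards to each of three seats plus 3 hole cards."""
    cards = DECK.copy()
    rng.shuffle(cards)
    return {
        "seat_0": sorted(cards[0:17]),
        "seat_1": sorted(cards[17:34]),
        "seat_2": sorted(cards[34:51]),
        "hole_cards": sorted(cards[51:54]),  # given to the landlord after bidding
    }


def generate_eval_data(num_games: int, seed: int, path: str) -> None:
    """Write num_games reproducible deals to a pickle file (hypothetical format)."""
    rng = random.Random(seed)  # a fixed seed makes the deals reproducible
    deals = [deal(rng) for _ in range(num_games)]
    with open(path, "wb") as f:
        pickle.dump(deals, f)
```

A call such as generate_eval_data(10000, seed=42, path="eval_data.pkl") would write the same deals on every run, so results from different agents stay comparable.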

Some important hyperparameters are as follows.

Step 2: Self-Play

python3 evaluate.py
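Self-play results are commonly summarised by the winning percentage (WP) and the average difference in points (ADP), the two metrics DouZero reports. A minimal sketch of both follows, assuming each game is recorded from the evaluated side's perspective as a win flag and a signed score; the GameResult record is hypothetical and may not match what evaluate.py prints.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GameResult:
    """One evaluation game, seen from the evaluated side (hypothetical record)."""
    won: bool
    points: float  # signed score: positive when won, negative when lost


def winning_percentage(results: Sequence[GameResult]) -> float:
    """WP: fraction of games won."""
    if not results:
        raise ValueError("no games to evaluate")
    return sum(r.won for r in results) / len(results)


def average_difference_in_points(results: Sequence[GameResult]) -> float:
    """ADP: mean signed score per game."""
    if not results:
        raise ValueError("no games to evaluate")
    return sum(r.points for r in results) / len(results)


results = [GameResult(True, 2), GameResult(False, -1),
           GameResult(True, 1), GameResult(True, 4)]
print(winning_percentage(results))            # 0.75
print(average_difference_in_points(results))  # 1.5
```

WP treats every win equally, while ADP also rewards winning big (for example with bombs), which is why a model trained for one metric can rank differently on the other.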

Some important hyperparameters are as follows.

Evaluate while training

auto_test.py can be used to automatically test new models during training.

python3 auto_test.py
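A minimal sketch of the polling idea behind evaluating while training: watch a checkpoint directory, evaluate each new checkpoint once, and remember which files have already been seen. The directory layout, the .ckpt suffix, the polling interval and the evaluate callback (for example, a function that runs self-play and returns a win rate) are all assumptions for illustration; auto_test.py may work differently.

```python
import os
import time
from typing import Callable, Dict, List, Optional, Set


def find_new_checkpoints(checkpoint_dir: str, seen: Set[str],
                         suffix: str = ".ckpt") -> List[str]:
    """List checkpoint files not evaluated yet, oldest first."""
    paths = []
    for name in os.listdir(checkpoint_dir):
        path = os.path.join(checkpoint_dir, name)
        if name.endswith(suffix) and os.path.isfile(path) and path not in seen:
            paths.append(path)
    # Sort by modification time, using the path as a tie-breaker.
    return sorted(paths, key=lambda p: (os.path.getmtime(p), p))


def poll_once(checkpoint_dir: str, seen: Set[str],
              evaluate: Callable[[str], float]) -> Dict[str, float]:
    """Evaluate each new checkpoint exactly once and mark it as seen."""
    scores = {}
    for path in find_new_checkpoints(checkpoint_dir, seen):
        scores[path] = evaluate(path)
        seen.add(path)
    return scores


def watch(checkpoint_dir: str, evaluate: Callable[[str], float],
          interval_s: float = 600.0, max_polls: Optional[int] = None) -> None:
    """Poll every interval_s seconds; run forever if max_polls is None."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path, score in poll_once(checkpoint_dir, seen, evaluate).items():
            print(f"{os.path.basename(path)}: {score:.3f}")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Running the watcher as a separate process keeps evaluation from slowing down the training loop, at the cost of some delay between saving a checkpoint and seeing its score.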