Morgan-Griffiths / PokerAI

1 poker program to rule them all
MIT License

PokerAI

A combination of a poker environment simulator and a bitwise Omaha hand winner evaluator written in Rust.
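The Rust evaluator itself is not reproduced here, but the bitwise style it refers to can be illustrated in a few lines. As a sketch (not the repo's actual algorithm): encode card ranks as bits of an integer (2 as bit 0 up to ace as bit 12), and straight detection becomes a handful of shifts and ANDs.

```python
# Illustration of bitwise hand evaluation (not the repo's Rust algorithm):
# encode ranks as bits (2 -> bit 0, ..., A -> bit 12); five consecutive
# set bits mean a straight, checkable with shifts and ANDs.

WHEEL = 0b1_0000_0000_1111  # A,2,3,4,5 (ace plays low)

def ranks_to_mask(ranks):
    mask = 0
    for r in ranks:
        mask |= 1 << r
    return mask

def has_straight(mask):
    # bit i survives only if bits i..i+4 are all set
    run = mask & (mask >> 1) & (mask >> 2) & (mask >> 3) & (mask >> 4)
    return run != 0 or (mask & WHEEL) == WHEEL

print(has_straight(ranks_to_mask([3, 4, 5, 6, 7])))  # 5-9 straight -> True
print(has_straight(ranks_to_mask([0, 2, 4, 6, 8])))  # no straight -> False
```

The same trick generalizes: one 13-bit mask per suit gives flush and straight-flush checks, which is why bitwise evaluators are fast.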

Bash scripts

Bash scripts are meant to be executed from the /poker folder.

Requirements

There are two requirement files, one for pip and one for conda.

With pip:

pip install -r requirements.txt

With conda:

conda config --add channels conda-forge

conda create --name <env> --file conda_requirements.txt

To build the Omaha Evaluator

To build the Rust code, cd into rusteval and run

cargo build --release

If you don't have rust

Ubuntu

sudo apt install cargo

OSX

brew install rust

MongoDB

MongoDB is used for storing the RL training run data and generating plots.

Ubuntu

https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/

OSX

https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/

Testing

cd src

test the environment

python env_test.py

test the backend server

python -m unittest tests/server_tests.py

Abstract

A series of poker environments that cover each of the individual complexities of poker, allowing one to test networks and learning architectures quickly and easily, starting from the simplest environment all the way up to real-world poker. The goal is a single API between the environments and the agent's training architecture, so that you can scale the complexity as needed while asserting that the learning algorithm learns at every stage.
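The single-API idea can be sketched as a reset/step interface shared by every environment, so the training loop never changes as complexity scales. The class and function names below are illustrative, not the repo's actual ones:

```python
# Hypothetical sketch of the shared environment API (illustrative names,
# not the repo's actual classes): every env exposes reset()/step(), so the
# same loop trains an agent on Kuhn or on full poker.

class TrivialEnv:
    """One-decision toy env: action 1 wins a chip, anything else loses one."""
    def reset(self):
        self.done = False
        return 0  # dummy observation

    def step(self, action):
        self.done = True
        reward = 1.0 if action == 1 else -1.0
        return 0, reward, self.done, {}

def run_episode(env, policy):
    # The loop below is the only thing the training architecture needs;
    # swapping envs of different complexity leaves it untouched.
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

print(run_episode(TrivialEnv(), lambda obs: 1))  # always takes action 1 -> 1.0
```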

Additionally, there is a sub-library in hand_recognition for testing networks on their ability to understand hand-board relationships.

Using the library

cd src

Build the data and all the folders with python setup.py

Build a specific dataset with python build_dataset.py -d <dataset>

Modify poker/models/network_config.py to change which network to train. Add or modify poker/models/networks.py to try different models.

Train a network for 10 epochs (loaded from the network_config) on a dataset with python evaluate_card_models.py -d <dataset> -M train -e 10

Examine a network's output (loaded from the network_config) on a dataset with python evaluate_card_models.py -d <dataset> -M examine

Train an RL agent on an env with python main.py --env <environment> -e <epochs>

Plot the RL training results with python visualize.py

Hand recognition

To build all the datasets run

python setup.py

To train a network on a dataset

python evaluate_card_models.py -d <dataset> -M train -e 10

To examine a trained network

python evaluate_card_models.py -d <dataset> -M examine

Poker Environments

There are a number of environments, each increasing in complexity.

Kuhn

Simple Kuhn

SB Options:

BB Options: facing bet only

Solution:

SB

Baseline performance

Graph

BB

Baseline performance

Graph
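For readers unfamiliar with the game, Simple Kuhn above can be sketched as a self-contained toy: a three-card deck, one ante each, and a single bet/check round. This is illustrative only, not the repo's environment; the policy interface here (a function of card and history) is an assumption.

```python
import random

# Minimal Kuhn poker hand (illustrative toy, not the repo's env).
# Cards: 0=J, 1=Q, 2=K; each player antes 1. Returns SB's net chips.
# Policies map (card, history) -> 'check'/'bet'/'call'/'fold'.

def play_kuhn(sb_policy, bb_policy, cards=None):
    c1, c2 = cards if cards else random.sample([0, 1, 2], 2)
    a1 = sb_policy(c1, '')
    if a1 == 'bet':
        a2 = bb_policy(c2, 'bet')
        if a2 == 'fold':
            return 1                       # SB wins BB's ante
        return 2 if c1 > c2 else -2        # bet called, showdown for 2 each
    # SB checked
    a2 = bb_policy(c2, 'check')
    if a2 == 'check':
        return 1 if c1 > c2 else -1        # showdown for the antes
    a3 = sb_policy(c1, 'check,bet')        # BB bet; SB must respond
    if a3 == 'fold':
        return -1
    return 2 if c1 > c2 else -2

# Example: both players always check/call ("calling stations")
station = lambda card, hist: 'check' if hist in ('', 'check') else 'call'
print(play_kuhn(station, station, cards=(2, 0)))  # K vs J at showdown -> 1
```

Kuhn poker is small enough that the equilibrium is known analytically, which is what makes it a useful first sanity check for a learning agent.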

Complex Kuhn

SB Options:

BB Options:

Solution:

SB

Baseline performance

Graph

BB

Baseline performance

Graph

Limit holdem with full deck

Added betsize

The important part about bet sizing is that actions are broken into categories, and within each category a sizing is chosen. Improper sizing will result in the category not being chosen as often. Conversely, if a critic is used, the critic must be able to take both an action and a bet size; ideally both the bet size and the action are updated against, not just the action category. Additionally, it is important to be able to represent mixed strategies, so a Gaussian or discrete categorical output for the bet size is preferred, such that different sizes can be reinforced independently.
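The two-level action space described above can be sketched as follows. This is a minimal illustration under assumed names (CATEGORIES, BET_SIZES), not the repo's network: first sample an action category, then, only if the category is a bet, sample a discrete bet size. Both heads are categorical, so mixed strategies are representable and each distribution can receive its own learning signal.

```python
import numpy as np

# Sketch of a two-level categorical action space (illustrative names):
# a category head picks fold/check-call/bet; a size head, used only when
# the category is a bet, picks a pot fraction.

rng = np.random.default_rng(0)
CATEGORIES = ['fold', 'check_call', 'bet']
BET_SIZES = [0.5, 1.0]  # fractions of the pot

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def sample_action(category_logits, betsize_logits):
    cat = rng.choice(len(CATEGORIES), p=softmax(category_logits))
    if CATEGORIES[cat] != 'bet':
        return CATEGORIES[cat], None
    size = rng.choice(len(BET_SIZES), p=softmax(betsize_logits))
    return 'bet', BET_SIZES[size]

# A critic, per the text, would score (state, category, size) jointly so
# that both heads are updated, not just the action category.
print(sample_action(np.array([0.0, 0.0, 5.0]), np.array([0.0, 0.0])))
```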

A further extension adds levels to the network that output an analog value, which is a percentage of the pot.

Initially, two sizes will be tested, 0.5 pot and 1 pot, along with check, fold, etc., all as a categorical output over a discretized action space. Then scale up to something like 100 discretized sizes.
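Flattening the discretized space into a single categorical output can be sketched like this (action names and the builder function are illustrative, not the repo's):

```python
# Sketch of the flat, discretized action space described above: fixed
# actions plus one entry per bet fraction, all in one categorical output.

def build_action_space(bet_fractions):
    actions = ['fold', 'check', 'call']
    actions += [f'bet_{f}p' for f in bet_fractions]
    return actions

# The two-size starting point:
print(build_action_space([0.5, 1.0]))
# ['fold', 'check', 'call', 'bet_0.5p', 'bet_1.0p']

# Scaling up is just a denser grid of fractions:
print(len(build_action_space([round(0.01 * i, 2) for i in range(1, 101)])))
# 103
```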

Multiple streets

Dealing with histories: record only actions and game situations, or also include the board and hands?
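The two options can be sketched as one record type with optional card fields (field names are assumptions for illustration, not the repo's schema): an actions-only history simply leaves board and hole cards unset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Sketch of a hand history covering both options discussed above:
# a compact log of (street, player, action, size) records, with the board
# and hole cards optionally attached. Illustrative field names.

@dataclass
class HandHistory:
    actions: List[Tuple[int, int, str, float]] = field(default_factory=list)
    board: Optional[List[int]] = None          # None -> actions-only history
    hole_cards: Optional[List[List[int]]] = None

    def record(self, street, player, action, size=0.0):
        self.actions.append((street, player, action, size))

h = HandHistory()
h.record(0, 0, 'raise', 2.5)
h.record(0, 1, 'call')
print(h.actions)  # [(0, 0, 'raise', 2.5), (0, 1, 'call', 0.0)]
```

Keeping cards optional lets the same structure serve the simpler envs (where actions suffice) and the full game (where the board matters).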

Full game

Possibilities: something MuZero-esque; a dynamics model (samples outcomes); a villain model (predicts the opponent's actions); predicting the next card.