A combination of a poker environment simulator and a bitwise Omaha hand-winner evaluator written in Rust.
All commands are meant to be executed from the /poker folder.
There are two requirement files, one for pip and one for conda.
pip install -r requirements.txt
or if using conda
conda config --add channels conda-forge
conda create --name <env> --file conda_requirements.txt
To build the Rust code, cd into rusteval and run
cargo build --release
If you don't have Rust installed, on Ubuntu run
sudo apt install cargo
or on macOS
brew install rust
MongoDB is used for storing the RL training run data and generating plots. Installation instructions:
https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/
https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/
cd src
To test the environment
python env_test.py
To test the backend server
python -m unittest tests/server_tests.py
A series of poker environments that each cover one of the individual complexities of poker, allowing one to test networks and learning architectures quickly and easily, starting from the simplest env all the way to real-world poker. The goal is a single API that interfaces with the agent's training architecture so that you can scale the complexity as needed, asserting that the learning algorithm learns at every stage.
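To make the "single API" idea concrete, here is a minimal, purely illustrative sketch. The class name, observation layout, and reward scheme below are invented for this example and are not the repo's actual interface; the point is that every env, from the simplest to full poker, would expose the same reset/step loop to the training code.

```python
import random

class MiniPokerEnv:
    """Illustrative toy env: hero is dealt one card rank and either
    folds or calls against a hidden villain card. Not the repo's API."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        # Deal two distinct ranks (1..13); the observation is hero's rank.
        self.hero, self.villain = self.rng.sample(range(1, 14), 2)
        return (self.hero,)

    def step(self, action):
        # action 0 = fold (forfeit the blind), 1 = call (showdown).
        if action == 0:
            return None, -1.0, True
        reward = 1.0 if self.hero > self.villain else -1.0
        return None, reward, True

# The same loop would drive any env in the complexity ladder.
env = MiniPokerEnv(seed=42)
obs = env.reset()
_, reward, done = env.step(1)
```

Because every env shares this interface, swapping in a harder game only changes the observation and reward content, never the training loop.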
Additionally, there is a sub-library in hand_recognition if you want to test networks on their ability and efficacy at understanding hand-board relationships.
cd src
Build the data and all the folders with python setup.py
Build a specific dataset with python build_dataset.py -d <dataset>
Modify poker/models/network_config.py to change which network to train. Add or modify poker/models/networks.py to try different models.
Train a network for 10 epochs (loaded from the network_config) on a dataset with python evaluate_card_models.py -d <dataset> -M train -e 10
Examine a network's output (loaded from the network_config) on a dataset with python evaluate_card_models.py -d <dataset> -M examine
Train an RL agent on an env with python main.py --env <environment> -e <epochs>
Plot the RL training results with python visualize.py
There are a number of environments, each increasing in complexity.
SB Options:
BB Options: facing bet only
SB baseline performance
BB baseline performance
SB Options:
BB Options:
SB baseline performance
BB baseline performance
The important part about betsizing is that if actions are broken into categories, a sizing can then be chosen within each category. Improper sizing will result in that category not being chosen as often. Conversely, if a critic is used, the critic must be able to take both an action and a betsize. Ideally both the betsize and the action are updated, not just the action category. Additionally, it is important to support mixed strategies, so either a Gaussian or a discrete categorical output for betsize is preferred, such that different categories can be reinforced.
Additional levels: a network that outputs an analog value, which is a % of pot.
Will initially test two sizes, 0.5p and 1p, along with check, fold, etc., all as a categorical output with the action space discretized. Then scale up to something like 100 discretized sizes.
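A sketch of what that initial discretization could look like (the names and layout here are assumptions for illustration, not the repo's actual code): the network emits a single categorical index over fold/check/call plus the two pot-fraction bets, and the chosen index is mapped back to a concrete chip amount.

```python
# Discrete actions plus pot-fraction bet sizes flattened into one
# categorical action space, as described above. Illustrative only.
FOLD, CHECK, CALL = "fold", "check", "call"
BET_FRACTIONS = [0.5, 1.0]  # start with 0.5p and 1p; later scale to ~100 bins

ACTIONS = [FOLD, CHECK, CALL] + [("bet", f) for f in BET_FRACTIONS]

def action_to_bet(index, pot):
    """Map a categorical network output index to (action, chip amount)."""
    act = ACTIONS[index]
    if isinstance(act, tuple):
        kind, frac = act
        return kind, frac * pot  # pot-relative sizing in chips
    return act, 0.0

# e.g. with a pot of 100 chips, the last index selects the 1-pot bet
kind, amount = action_to_bet(4, pot=100)
```

Scaling to ~100 sizes then only means lengthening BET_FRACTIONS; the categorical output and the mapping stay the same, which also keeps mixed strategies straightforward to reinforce per category.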
Dealing with histories: record only actions and game situations, or include the board and hands?
Possibilities: MuZero-esque dynamics model (samples outcomes), villain model (predicts the opponent's actions), predicting the next card.