georgedouzas / sports-betting

Collection of sports betting AI tools.
https://georgedouzas.github.io/sports-betting
MIT License

sports-betting

Category        Tools
Development     black, ruff, mypy, docformatter
Package         version, pythonversion, downloads
Documentation   mkdocs
Communication   gitter, discussions

Introduction

Python sports betting toolbox.

The sports-betting package is a collection of tools that makes it easy to create machine learning models for sports betting and evaluate their performance. It is compatible with scikit-learn.

The main components of sports-betting are the dataloader and bettor objects.

Quick start

sports-betting supports all common sports betting needs, i.e. fetching historical and fixtures data, backtesting betting strategies, and predicting value bets. Assume we would like to backtest the following scenario and then use the bettor object to predict value bets:

# Selection of data
from sportsbet.datasets import SoccerDataLoader

leagues = ['Germany', 'Italy', 'France']
divisions = [1, 2]
years = [2021, 2022, 2023, 2024]
odds_type = 'market_maximum'
dataloader = SoccerDataLoader({'league': leagues, 'year': years, 'division': divisions})
X_train, Y_train, O_train = dataloader.extract_train_data(odds_type=odds_type)
X_fix, _, O_fix = dataloader.extract_fixtures_data()

# Configuration of betting strategy
from sklearn.model_selection import TimeSeriesSplit
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.multioutput import MultiOutputClassifier
from sportsbet.evaluation import ClassifierBettor, backtest

tscv = TimeSeriesSplit(5)
init_cash = 10000.0
stake = 50.0
betting_markets = ['home_win__full_time_goals', 'draw__full_time_goals', 'away_win__full_time_goals']
classifier = make_pipeline(
  make_column_transformer(
    (OneHotEncoder(handle_unknown='ignore'), ['league', 'home_team', 'away_team']), remainder='passthrough'
  ),
  SimpleImputer(),
  MultiOutputClassifier(LogisticRegression(solver='liblinear', random_state=7, class_weight='balanced', C=50)),
)
bettor = ClassifierBettor(classifier, betting_markets=betting_markets, stake=stake, init_cash=init_cash)

# Apply backtesting and get results
backtesting_results = backtest(bettor, X_train, Y_train, O_train, cv=tscv)

# Get value bets for upcoming betting events
bettor.fit(X_train, Y_train)
bettor.bet(X_fix, O_fix)

Sports betting in practice

You can think of any sports betting event as a random experiment with unknown probabilities for the various outcomes. Even the most unlikely outcome, for example scoring more than 10 goals in a soccer match, is assigned a small probability. The bookmaker estimates this probability P and offers the corresponding odds O. In theory, if the bookmaker offers the so-called fair odds O = 1 / P, then in the long run neither the bettor nor the bookmaker makes any money.

The bookmaker's strategy is to adjust the odds in their favor using the over-round of probabilities, i.e. the implied probabilities 1 / O of all outcomes of an event sum to more than 1. In practice, the bookmaker offers odds lower than the estimated fair odds. The important point here is that the bookmaker still has to estimate the probabilities of outcomes and provide odds that guarantee a long-term profit.
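To make the over-round concrete, the following sketch converts decimal odds to implied probabilities and sums them. The odds values and the function names are made up for illustration and are not part of the sports-betting API.

```python
def implied_probabilities(odds):
    """Convert decimal odds to implied probabilities (1 / odds)."""
    return [1.0 / o for o in odds]


def overround(odds):
    """Return the sum of the implied probabilities minus 1 (the bookmaker's margin)."""
    return sum(implied_probabilities(odds)) - 1.0


# Hypothetical fair odds for home win, draw and away win,
# corresponding to estimated probabilities 0.5, 0.25 and 0.25.
fair_odds = [2.0, 4.0, 4.0]

# Hypothetical offered odds: each one is 10% lower than the fair odds.
offered_odds = [0.9 * o for o in fair_odds]

print(overround(fair_odds))     # no margin at fair odds
print(overround(offered_odds))  # positive margin, about 0.11
```

With fair odds the implied probabilities sum to exactly 1, so the margin is zero. Lowering every price pushes the sum above 1, and that excess is the bookmaker's edge.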

On the other hand, the bettor can also estimate the probabilities and compare them to the odds the bookmaker offers. If the estimated probability of an outcome is higher than the implied probability from the provided odds, then the bet is called a value bet.
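The comparison described above can be sketched in a few lines. The helper names and the numbers are hypothetical, and the expected profit is only as reliable as the bettor's probability estimate.

```python
def is_value_bet(estimated_prob, odds):
    """A bet has value when the estimated probability exceeds the implied one (1 / odds)."""
    return estimated_prob > 1.0 / odds


def expected_profit(estimated_prob, odds, stake=1.0):
    """Expected profit of a bet at decimal odds, assuming the estimate is correct."""
    expected_win = estimated_prob * (odds - 1.0) * stake
    expected_loss = (1.0 - estimated_prob) * stake
    return expected_win - expected_loss


# The bettor estimates a 50% chance for an outcome.
estimated_prob = 0.5

# Odds of 2.2 imply roughly 45%, so the estimate is higher: a value bet.
print(is_value_bet(estimated_prob, 2.2), expected_profit(estimated_prob, 2.2))

# Odds of 1.8 imply roughly 56%, so the estimate is lower: not a value bet.
print(is_value_bet(estimated_prob, 1.8), expected_profit(estimated_prob, 1.8))
```

A bet is a value bet exactly when its expected profit under the bettor's own estimate is positive, since the expected profit per unit stake equals estimated_prob * odds - 1.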

The only long-term betting strategy that makes sense is to select value bets. However, you have to remember that neither the bettor nor the bookmaker can access the actual probabilities of outcomes. Therefore, identifying a value bet from the side of the bettor is still an estimation. The bettor, the bookmaker, or both of them might be wrong.

Another essential point is that bookmakers can access resources that the typical bettor rarely has. For instance, they have more data, computational power, and teams of experts working on predictive models. You may assume that trying to beat them is pointless, but this is not necessarily correct. The bookmakers have multiple factors to consider when they offer their adjusted odds, which is why the offered odds vary considerably. The bettor should aim to systematically estimate the value bets and backtest their performance, rather than to create arbitrarily accurate predictive models. This is a realistic goal, and sports-betting can help by providing appropriate tools.
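As a rough illustration of what backtesting a value-bet strategy means, the sketch below replays a made-up betting history with a fixed stake. It is not the package's backtest function: the data, the function name, and the flat-stake rule are assumptions chosen to keep the example self-contained. In a real backtest the probability estimates must come from a model fitted only on data that precede each event.

```python
def flat_stake_backtest(bets, stake=50.0, init_cash=10000.0):
    """Place a fixed stake on every value bet and return (final cash, bets placed).

    Each bet is a tuple (estimated_prob, odds, won) with decimal odds.
    """
    cash = init_cash
    n_placed = 0
    for estimated_prob, odds, won in bets:
        if estimated_prob <= 1.0 / odds:
            continue  # not a value bet according to our estimate, skip it
        n_placed += 1
        # A winning bet returns stake * (odds - 1) as profit, a losing bet costs the stake.
        cash += stake * (odds - 1.0) if won else -stake
    return cash, n_placed


# Hypothetical history of (estimated probability, offered odds, outcome).
history = [
    (0.55, 2.0, True),   # value bet (0.55 > 0.50), won
    (0.30, 3.0, False),  # not a value bet (0.30 < 0.33), skipped
    (0.45, 2.5, False),  # value bet (0.45 > 0.40), lost
    (0.25, 5.0, True),   # value bet (0.25 > 0.20), won
]

final_cash, n_placed = flat_stake_backtest(history)
print(final_cash, n_placed)
```

Three of the four bets qualify as value bets here. Two of them win and one loses, so the cash ends above the initial amount. A history this short says nothing about long-term profitability.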

Installation

For user installation, sports-betting is currently available on PyPI, and you can install it via pip:

pip install sports-betting

Development installation requires cloning the repository and then using PDM to install the project as well as the main and development dependencies:

git clone https://github.com/georgedouzas/sports-betting.git
cd sports-betting
pdm install

Usage

You can use the Python API or the CLI to access the full functionality of sports-betting. Nevertheless, it is recommended to be familiar with the Python API, since it is still needed for writing the configuration files of the CLI.

API

The sports-betting package makes it easy to download sports betting data:

from sportsbet.datasets import SoccerDataLoader
dataloader = SoccerDataLoader(param_grid={'league': ['Italy'], 'year': [2020]})
X_train, Y_train, O_train = dataloader.extract_train_data(odds_type='market_maximum')
X_fix, Y_fix, O_fix = dataloader.extract_fixtures_data()

X_train are the historical/training data and X_fix are the test/fixtures data. The historical data can be used to backtest the performance of a bettor model:

from sportsbet.evaluation import ClassifierBettor, backtest
from sklearn.dummy import DummyClassifier
bettor = ClassifierBettor(DummyClassifier())
backtest(bettor, X_train, Y_train, O_train)

We can use the trained bettor model to predict the value bets using the fixtures data:

bettor.fit(X_train, Y_train)
bettor.bet(X_fix, O_fix)

CLI

The command sportsbet provides various sub-commands to download data and predict the value bets. For any sub-command you may add the --help flag to get more information about its usage.

Configuration

In order to use the commands, a configuration file is required. You can find examples of such configuration files in sports-betting/configs/. The configuration file should have a Python file extension and contain a few variables. The variables DATALOADER_CLASS and PARAM_GRID are mandatory while the rest are optional.
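A minimal configuration file might look like the sketch below. It sets only the two mandatory variables and reuses the dataloader class and parameter grid from the API section. The exact value types the CLI expects are an assumption here, so compare with the examples in sports-betting/configs/ before relying on it.

```python
# config.py: a minimal sketch with only the two mandatory variables.
# Assumption: DATALOADER_CLASS takes the dataloader class itself and
# PARAM_GRID takes the same dictionary that is passed to the dataloader
# in the Python API.
from sportsbet.datasets import SoccerDataLoader

DATALOADER_CLASS = SoccerDataLoader
PARAM_GRID = {'league': ['Italy'], 'year': [2020]}
```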

The following variables configure the data extraction:

The following variables configure the betting process:

Commands

Once these variables are provided, we can use the appropriate commands to access any of the functionality of sports-betting.

Dataloader

Show available parameters for dataloaders:

sportsbet dataloader params -c config.py

Show available odds types:

sportsbet dataloader odds-types -c config.py

Extract training data and save them as CSV files:

sportsbet dataloader training -c config.py -d /path/to/directory

Extract fixtures data and save them as CSV files:

sportsbet dataloader fixtures -c config.py -d /path/to/directory

Bettor

Backtest the bettor and save the results as a CSV file:

sportsbet bettor backtest -c config.py -d /path/to/directory

Get the value bets and save them as a CSV file:

sportsbet bettor bet -c config.py -d /path/to/directory