
DiscreteSamplingFramework

Bayesian sampling over distributions of discrete variables.

This software is licensed under Eclipse Public License 2.0. See LICENSE for more details.

This software is property of University of Liverpool and any requests for the use of the software for commercial use or other use outside of the Eclipse Public License should be made to University of Liverpool.

O(logN) Parallel Redistribution (submodule in discretesampling/base/executor/MPI/distributed_fixed_size_redistribution) is covered by a patent - A. Varsi & S. Maskell, Method of Parallel Implementation in Distributed Memory Architectures, University of Liverpool, Patent Request GB2101274.5, 29 Jan 2021 - (filed here).

Copyright (c) 2023, University of Liverpool.

Installation

Requirements

Cloning and installing from GitHub

The latest source code can be cloned with:

git clone https://github.com/DiscreteVariablesTaskForce/DiscreteSamplingFramework.git --recursive
cd DiscreteSamplingFramework

Package requirements can be installed with:

pip install -r requirements.txt

The development version of the package can then be installed with:

pip install -e .

Variables and Distributions

Discrete Variables

Each of these should, at a minimum, implement the following functions:

At some point in the future we may also need to include:
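As an illustration only, a minimal discrete variable type might look like the sketch below. The class name and methods here are hypothetical assumptions for the sake of example, not the framework's actual interface:

```python
# Hypothetical sketch of a discrete variable type. The name and methods
# are illustrative assumptions, not the framework's real interface.
class IntegerVariable:
    """A discrete variable holding a single integer state."""

    def __init__(self, value):
        self.value = value

    # Equality and hashing let a sampler compare and deduplicate states.
    def __eq__(self, other):
        return isinstance(other, IntegerVariable) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

    def __repr__(self):
        return f"IntegerVariable({self.value})"
```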

Discrete Variable Proposal Distributions

Proposal distributions, q(x'|x), for each variable type should be described. Each of these should, at a minimum, implement the following functions:

For more efficient evaluation, optionally implement class methods:
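To make the q(x'|x) notation concrete, here is a hedged sketch of a proposal over integer states with a `sample` method (draw x' given x) and an `eval` method (log-probability of a move). The class and method names are illustrative assumptions, not the framework's API:

```python
import math
import random

class RandomWalkProposal:
    """Hypothetical proposal q(x'|x): move an integer state up or down by 1."""

    def __init__(self, current):
        self.current = current  # x, the state being proposed from

    def sample(self):
        # Draw x' ~ q(.|x): +/-1 with equal probability.
        return self.current + random.choice([-1, 1])

    def eval(self, proposed):
        # Return log q(x'|x); states not reachable in one move have zero probability.
        if abs(proposed - self.current) == 1:
            return math.log(0.5)
        return -math.inf
```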

Discrete Variable Initial Proposal Distributions

Similarly, distributions of initial proposals, q0(x), should be described. Each of these should, at a minimum, implement the following functions:
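An initial proposal differs from a transition proposal in that it is unconditional. A hedged sketch, again with hypothetical names rather than the framework's actual interface, might be:

```python
import math
import random

class UniformInitialProposal:
    """Hypothetical initial proposal q0(x): uniform over a finite set of states."""

    def __init__(self, states):
        self.states = list(states)

    def sample(self):
        # Draw an initial state x ~ q0.
        return random.choice(self.states)

    def eval(self, x):
        # Return log q0(x); uniform over the support, zero elsewhere.
        if x in self.states:
            return -math.log(len(self.states))
        return -math.inf
```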

Discrete Variable Target Distributions

Each of these should, at a minimum, implement the following functions:
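A target distribution only needs to be evaluable up to a normalising constant, since MCMC and SMC work with ratios of densities. A hypothetical sketch (names are illustrative assumptions, not the framework's API):

```python
import math

class GeometricTarget:
    """Hypothetical target: pi(x) proportional to (1 - p)^x * p on x = 0, 1, 2, ..."""

    def __init__(self, p=0.5):
        self.p = p

    def eval(self, x):
        # Return the (unnormalised) log-density; only ratios are needed by samplers.
        if x < 0:
            return -math.inf
        return x * math.log(1 - self.p) + math.log(self.p)
```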

Algorithms

MCMC and SMC samplers for discrete variables are implemented. Working examples for the implemented variable types can be found in the examples directory.
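As background for the sections below, the core Metropolis-Hastings accept/reject loop that an MCMC sampler runs can be sketched generically. This is a self-contained toy (integer states, geometric target, symmetric random-walk proposal), not the framework's implementation:

```python
import math
import random

def log_target(x):
    # Unnormalised geometric target on x >= 0 (toy example).
    return x * math.log(0.5) if x >= 0 else -math.inf

def mh_sample(n_iters, x0=0, seed=1):
    """Generic Metropolis-Hastings loop over discrete states."""
    random.seed(seed)
    x = x0
    samples = []
    for _ in range(n_iters):
        # Symmetric random-walk proposal: q(x'|x) = q(x|x'), so the
        # proposal terms cancel in the acceptance ratio.
        proposed = x + random.choice([-1, 1])
        log_alpha = log_target(proposed) - log_target(x)
        # Accept with probability min(1, exp(log_alpha)).
        if random.random() < math.exp(min(0.0, log_alpha)):
            x = proposed
        samples.append(x)
    return samples
```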

MCMC sampler

A basic MCMC sampler, which can be initialised with a variable type, a target distribution and an initial proposal (initial samples are drawn from the latter). Samples can then be drawn with the .sample method, which takes one argument, N, the number of iterations. An example using the decision tree variable type is shown below:

from discretesampling import decision_tree as dt
from discretesampling.algorithms import DiscreteVariableMCMC
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the wine dataset and hold out 30% for testing
data = datasets.load_wine()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)

# Hyperparameters of the decision tree target distribution
a = 0.01
b = 5
target = dt.TreeTarget(a, b)
initialProposal = dt.TreeInitialProposal(X_train, y_train)

mcmcSampler = DiscreteVariableMCMC(dt.Tree, target, initialProposal)
samples = mcmcSampler.sample(N=5000)  # 5000 MCMC iterations

SMC sampler

Similarly, a basic SMC sampler can be initialised with a variable type, a target distribution and an initial proposal (initial samples are drawn from the latter and weights calculated appropriately). Samples can then be drawn with the sample function, which takes two arguments, N and P: the number of iterations and the number of particles. An example using the decision tree variable type is shown below:

from discretesampling import decision_tree as dt
from discretesampling.algorithms import DiscreteVariableSMC
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the wine dataset and hold out 30% for testing
data = datasets.load_wine()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=5)

# Hyperparameters of the decision tree target distribution
a = 0.01
b = 5
target = dt.TreeTarget(a, b)
initialProposal = dt.TreeInitialProposal(X_train, y_train)

smcSampler = DiscreteVariableSMC(dt.Tree, target, initialProposal)
samples = smcSampler.sample(N=10, P=1000)  # 10 iterations, 1000 particles