An open-source Python package for solving large-scale ridge regression using the sketch-and-project technique.
For details, see the RidgeSketch paper.
RidgeSketch aims to match the Scikit-learn API:
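import numpy as np
from ridge_sketch import RidgeSketch  # assumed module path; adjust to your checkout

n_samples, n_features = 1000, 500
X = np.random.rand(n_samples, n_features)
y = np.random.rand(n_samples, 1)

model = RidgeSketch(
    alpha=1e-1,
    solver="subsample",
    sketch_size=10,
    verbose=1,
)
model.fit(X, y)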
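To give a feel for what a sketch-and-project solver does, here is a minimal NumPy sketch of the iteration on the ridge system (X^T X + alpha I) w = X^T y with a coordinate-subsampling sketch. It is an illustration under stated assumptions, not the package's implementation: the function name `ridge_sketch_and_project` and its arguments are hypothetical, and the full matrix is formed explicitly for readability, which a large-scale solver would avoid.

```python
import numpy as np


def ridge_sketch_and_project(X, y, alpha, sketch_size, n_iters, seed=0):
    """Approximately solve (X^T X + alpha I) w = X^T y.

    Hypothetical helper for illustration only; not part of RidgeSketch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)  # symmetric positive definite
    b = X.T @ y
    w = np.zeros_like(b)
    r = A @ w - b  # residual of the linear system

    for _ in range(n_iters):
        # Subsampling sketch: S selects `sketch_size` random coordinates.
        idx = rng.choice(n_features, size=sketch_size, replace=False)
        SAS = A[np.ix_(idx, idx)]         # S^T A S
        rs = r[idx]                       # S^T r
        lmbda = np.linalg.solve(SAS, rs)  # solve the small sketched system
        w[idx] -= lmbda                   # w <- w - S lmbda
        r -= A[:, idx] @ lmbda            # keep the residual in sync with w
    return w


rng = np.random.default_rng(0)
X = rng.random((200, 20))
y = rng.random((200, 1))
alpha = 1e-1

w_sketch = ridge_sketch_and_project(X, y, alpha, sketch_size=5, n_iters=1000)

# Closed-form solution for comparison (only practical at small scale).
w_exact = np.linalg.solve(X.T @ X + alpha * np.eye(20), X.T @ y)
rel_error = np.linalg.norm(w_sketch - w_exact) / np.linalg.norm(w_exact)
print(f"relative error vs. closed form: {rel_error:.2e}")
```

Each iteration only solves a `sketch_size` by `sketch_size` system, which is the reason the approach can be attractive when the number of features is large.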
First ensure you're in a Python 3 virtual environment (see instructions below).
To install the package and requirements:
pip install -e .
Create a Python 3 virtual environment. For Unix or macOS this can be done by executing:
python3 -m venv ridgesketch-env
Then activate it:
source ridgesketch-env/bin/activate
Tutorial notebooks for running and adding new sketches are in the tutorials subdirectory.
@misc{gazagnadou2021textttridgesketch,
title={$\texttt{RidgeSketch}$: A Fast sketching based solver for large scale ridge regression},
author={Nidham Gazagnadou and Mark Ibrahim and Robert M. Gower},
year={2021},
eprint={2105.05565},
archivePrefix={arXiv},
primaryClass={math.OC}
}
Please visit our documentation for API details.
To run benchmarks, define a configuration in benchmark_configs.py (see small for an example), then run:
python benchmarks.py [options] [name of config]
For example, to run benchmarks with the small configs:
python benchmarks.py small
For a full list of options, see the output of python benchmarks.py --help:
Usage: benchmarks.py [OPTIONS] CONFIG_NAME

Options:
  --folder PATH            folder path where results are saved
  --n-repetitions INTEGER  number of times to rerun benchmarks
  --save / --no-save
  --help                   Show this message and exit.
To add a dataset, open datasets/data_loaders.py and create a new Dataset subclass.

To create your own sketching method, inherit from Sketch and implement sketch() and update_iterate():
from sketching import Sketch


class MySketch(Sketch):
    def __init__(self, A, b, sketch_size):
        super().__init__(A, b, sketch_size)

    def sketch(self, r):
        """Returns a tuple of (SA, SAS, rs)"""
        pass

    def update_iterate(self, w, lmbda, step_size=1.0):
        """Returns updated weights"""
        pass
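As a rough guide to what the two methods might compute, the standalone class below mirrors the method names with a Gaussian sketch. It deliberately does not inherit from the package's Sketch class so that it runs on its own, and it rests on assumptions not confirmed by this README: that (SA, SAS, rs) stands for (S^T A, S^T A S, S^T r), that `lmbda` is the solution of the sketched system SAS lmbda = rs, and that the caller computes the residual r = A w - b. The class name `GaussianSketchDemo` and the `seed` argument are hypothetical.

```python
import numpy as np


class GaussianSketchDemo:
    """Standalone stand-in for a Sketch subclass (hypothetical, for illustration)."""

    def __init__(self, A, b, sketch_size, seed=0):
        self.A, self.b, self.sketch_size = A, b, sketch_size
        self.rng = np.random.default_rng(seed)
        self.S = None  # most recent sketching matrix, shape (m, sketch_size)

    def sketch(self, r):
        """Returns a tuple of (SA, SAS, rs) for a fresh Gaussian sketch."""
        m = self.A.shape[0]
        self.S = self.rng.standard_normal((m, self.sketch_size))
        SA = self.S.T @ self.A  # sketched system matrix, S^T A
        SAS = SA @ self.S       # S^T A S
        rs = self.S.T @ r       # sketched residual, S^T r
        return SA, SAS, rs

    def update_iterate(self, w, lmbda, step_size=1.0):
        """Returns updated weights w - step_size * S @ lmbda."""
        return w - step_size * self.S @ lmbda


# Small symmetric positive definite test system A w = b.
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 10))
A = M.T @ M + 0.1 * np.eye(10)
b = rng.standard_normal((10, 1))

sk = GaussianSketchDemo(A, b, sketch_size=3)
w = np.zeros_like(b)
for _ in range(2000):
    r = A @ w - b                     # residual, assumed to be computed by the caller
    SA, SAS, rs = sk.sketch(r)
    lmbda = np.linalg.solve(SAS, rs)  # assumed meaning of `lmbda`
    w = sk.update_iterate(w, lmbda)

residual_norm = np.linalg.norm(A @ w - b)
print(f"final residual norm: {residual_norm:.2e}")
```

The tutorial notebooks remain the reference for the actual interface; this only shows one way the pieces could fit together.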
See the CONTRIBUTING file for how to help out.
See LICENSE file.