Benchmark and compare several large-scale Gaussian process regression methods in 1D, 2D, and 3D, including our implementation of the equispaced Fourier method (EFGP) described in https://arxiv.org/abs/2210.10210 and https://arxiv.org/abs/2305.11065 . We focus on posterior mean prediction (kernel ridge regression). We also generate figures and tables for the first of the above papers.
Authors: Philip R Greengard, Alex H Barnett, Manas Rachh.
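For orientation, the posterior mean that all the compared methods approximate can be written down directly for small N. The following is a minimal dense sketch, not the repository's code: the squared-exponential kernel, the synthetic data, and all variable names and parameter values are assumptions chosen for illustration. It solves the N-by-N linear system at O(N^3) cost, which is exactly what the large-scale methods avoid.

```matlab
% Dense GP posterior mean (kernel ridge regression) in 1D.
% Illustrative only: names, kernel, and parameters are not taken from the repo.
rng(0);                                  % reproducible noise
N = 200;                                 % number of training points
x = sort(rand(N,1));                     % training locations in [0,1]
sigma = 0.1;                             % noise standard deviation
y = sin(2*pi*x) + sigma*randn(N,1);      % noisy observations of a smooth function
l = 0.1;                                 % kernel length scale
k = @(d) exp(-d.^2/(2*l^2));             % squared-exponential kernel of the displacement d
K = k(x - x');                           % N-by-N kernel matrix (implicit expansion)
alpha = (K + sigma^2*eye(N)) \ y;        % solve (K + sigma^2 I) alpha = y
xt = linspace(0,1,50)';                  % test locations
mu = k(xt - x') * alpha;                 % posterior mean at the test locations
```

The vector mu holds the posterior mean at the 50 test points; plotting it against sin(2*pi*xt) gives a quick visual check.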
git clone --recurse-submodules https://github.com/flatironinstitute/gp-shootout.git
This will also download and install some submodule packages (currently: RLCM, FLAM). In addition, your system must have the following dependencies:
Dependencies specific to methods:
startup.m).
To test the basic installation, start MATLAB from the top-level gp-shootout directory (which will execute startup), then within MATLAB type test_all.
Advanced: to build then test all wrapped non-MATLAB methods:
1) make sure you can call Python from MATLAB, e.g. via py.sys.version, then from MATLAB run test_all_nonmatlab.
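A quick way to perform the Python check above is sketched here. It uses only MATLAB's built-in pyenv and py. interface (available in recent MATLAB releases); nothing in it is specific to this repository.

```matlab
% Check that MATLAB can see and load a Python interpreter.
pe = pyenv;                      % Python environment MATLAB is configured to use
disp(pe.Version)                 % typically empty if no Python installation was found
disp(pe.Executable)              % path of the interpreter MATLAB will launch
ver = char(py.sys.version);      % raises an error if Python cannot be loaded
disp(ver)
```

If the last call errors, pointing pyenv at a specific interpreter before first use is the usual fix.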
If you did not start MATLAB from the top-level directory, then run startup to add required paths and apply useful settings. You may need to addpath FINUFFT by hand if you forgot to do so in startup.m.
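Adding FINUFFT to the path by hand could look like the sketch below. The directory is a hypothetical placeholder that you must edit, and the check assumes that FINUFFT's MATLAB interface provides the function finufft1d1.

```matlab
% Add FINUFFT's MATLAB interface to the path if it is not already there.
finufft_dir = '/path/to/finufft/matlab';   % hypothetical location; edit for your system
if exist('finufft1d1', 'file') == 0        % 0 means the function is not on the path
    if exist(finufft_dir, 'dir') == 7      % 7 means the folder exists
        addpath(finufft_dir);
    else
        warning('FINUFFT not found; edit finufft_dir.');
    end
end
```

Putting the same addpath line into startup.m makes the setting persist across sessions.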
Look in drivers for example scripts. You may try to run expt for a demo.
All are run in MATLAB R2021a or R2021b unless stated otherwise. In order of appearance in the paper:
equispaced_fourier_gps: run discr_figs.jl in Julia. A couple of seconds runtime.
drivers/fig_materr.m: takes around 1 minute.
results/philip/cond_number/cond_number.m: uses 1D data from efgp_tables. Takes ~5 seconds without the exact 1e4 condition number calculation.
paper_results/efgp_tables: data is subsampled from MAT files of size N=1e7 for each N. Time: about 1.5 hours for the full table, mostly due to reference solutions. Excluding the Matern-3D examples, the runtime is considerably shorter (~10 minutes).
paper_results/time_v_accuracy: input data is of size N=1e5 and is the same as in the efgp tables of that size. Reference solutions can take a long time; for example, the Matern 3D reference takes ~3 hours.
paper_results/c02: performs GP regression on the CO2 data set (described at https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1419136). Time: ~15 seconds for l=5, 50, including reference solutions.
paper_results/big_example: to generate the data, run gen_data.m, which requires approximately 64 GB of memory. Set the data directory where you wish to store the data. To generate individual rows of the table, run big_example.m after setting appropriate values for iNvals, isig, itol, and alg. For table 3, iNvals = [1,2,3,4,5] and isig = 1; for table 4, isig = iNvals. Options for alg are EFGP and FLAM. Set the directory to read the data from and the directory for storing the results. The memory required to store all results is approximately 56 GB. After this, run big_example_postproc.m, setting values for iNvals and isig, and the directories where the data is stored and where the results are stored. Finally, the tex table for these results can be generated by running nicelatextable.m. For convenience, we have included the results generated for the paper in the results folder inside the big_example directory.
getL() into kernels
naive_gp ?
data/*
algs/*
(Python via system calls from matlab)