cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

FEA: Parallel Partial Emulation (aka Multitask GPs with a shared kernel) #2470

Open PieterjanRobbe opened 5 months ago

PieterjanRobbe commented 5 months ago

This pull request adds parallel partial emulation (see Gu and Berger, 2016). The idea of this method is that, in a multi-output (or multitask) setting, each task has its own mean and variance, but all tasks share common Gaussian process correlation parameters, which are estimated from the joint likelihood.
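
As I read the Gu and Berger formulation, the covariance structure boils down to a shared correlation function with task-specific variances (a sketch for intuition, not code from this PR):

$$
\mathrm{Cov}\big(y_i(\mathbf{x}),\, y_j(\mathbf{x}')\big) = \delta_{ij}\,\sigma_i^2\, c(\mathbf{x}, \mathbf{x}'),
$$

where $c(\cdot,\cdot)$ is the single correlation function shared by all tasks, $\sigma_i^2$ is the variance of task $i$, and $\delta_{ij}$ is the Kronecker delta. Compared to the full multitask kernel $K_{\text{data}} \otimes B$, the learned inter-task covariance $B$ is effectively replaced by a diagonal matrix of task variances.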

This is useful when constructing a Gaussian process surrogate for a computer model with a very large number of outputs. Think, for example, of a finite element model for an engineering or science problem, where the inputs are model parameters and the outputs are the model predictions at a large number of space and/or time coordinates.

I think this is a setting that is not yet covered by gpytorch. This PR should cover that gap. Feedback and suggestions are welcome!

Code changes

I've added a new ParallelPartialKernel, which acts as a drop-in replacement for MultitaskKernel and implements the parallel partial emulation strategy from Gu and Berger (2016).
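
A minimal usage sketch (assuming the ParallelPartialKernel constructor mirrors MultitaskKernel, i.e. it takes a data kernel and num_tasks; see the linked notebook for the authoritative example):

```python
import gpytorch


class ParallelPartialGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_tasks):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )
        # Drop-in replacement for gpytorch.kernels.MultitaskKernel: all tasks
        # share the RBF kernel's correlation (lengthscale) parameters.
        self.covar_module = gpytorch.kernels.ParallelPartialKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=num_tasks
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)


# likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks)
# model = ParallelPartialGPModel(train_x, train_y, likelihood, num_tasks)
```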

Tests

See test/examples/test_parallel_partial_gp_regression.py.

Documentation

Has been updated accordingly.

Examples

See this notebook for an example. I've also added this file to the documentation.

Comparison to existing methods

Ignoring the inter-task correlations leads to a (much) faster method. This notebook compares the cost of evaluating the posterior with both multitask GP and parallel partial GP regression as a function of the number of tasks (i.e., the number of outputs). As the figure below illustrates, the multitask GP method becomes infeasible in applications where the number of outputs is large (say, more than a few hundred or a few thousand).

[Figure: cost of evaluating the posterior as a function of the number of tasks, for multitask GP vs. parallel partial GP regression]

The method is also faster than a batch-independent GP construction (see this notebook) and has the additional benefit that only one set of kernel parameters needs to be trained (instead of num_tasks sets of parameters).
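
For comparison, here is a rough sketch of the two constructions side by side (the batch-independent version follows GPyTorch's standard batch-independent multi-output pattern; the ParallelPartialKernel call again assumes the constructor from this PR):

```python
import torch
import gpytorch

num_tasks = 500

# Batch-independent construction: an independent RBF kernel per task, so
# num_tasks lengthscales and num_tasks outputscales have to be trained.
batch_independent_kernel = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.RBFKernel(batch_shape=torch.Size([num_tasks])),
    batch_shape=torch.Size([num_tasks]),
)

# Parallel partial construction (this PR): a single RBF kernel whose
# correlation parameters are shared across all tasks, so only one
# lengthscale has to be trained.
parallel_partial_kernel = gpytorch.kernels.ParallelPartialKernel(
    gpytorch.kernels.RBFKernel(), num_tasks=num_tasks
)
```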