Currently, we only take advantage of structural savings when computing the kernel Gram matrix and when performing the Gram solve in GP predictions.
We neglect:
- Efficient matrix solves and log-determinants when evaluating the Gaussian pdf or drawing samples, both of which currently go through a dense lower-Cholesky decomposition and so ignore the matrix structure. At best we can supply a pre-computed Cholesky factor to a distrax multivariate normal (see the first sketch after this list).
- Structure when computing KL divergences between two multivariate Gaussians, which is important for efficiency in SVGP (see the second sketch after this list).
- Memory savings, since we currently call `.to_dense()` on linear operators throughout the codebase.
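To illustrate the first point, the best workaround available today is to factorise the Gram matrix once ourselves and hand the factor to distrax. A minimal sketch, assuming distrax's `MultivariateNormalTri` accepts a pre-computed lower-Cholesky factor as its scale (the matrix `K` here is a toy stand-in for a kernel Gram matrix):

```python
import distrax
import jax.numpy as jnp
import jax.random as jr

key = jr.PRNGKey(42)
mean = jnp.zeros(4)
K = jnp.eye(4) + 0.1  # toy dense Gram matrix; positive definite

# Factorise once, then reuse the factor for both log-pdf and sampling.
L = jnp.linalg.cholesky(K)
dist = distrax.MultivariateNormalTri(loc=mean, scale_tri=L)

x = dist.sample(seed=key)
lp = dist.log_prob(x)
```

Note that even this workaround densifies any structure (e.g. a diagonal operator) before factorising, so the cubic cost of the Cholesky is not avoided.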
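For the second point, the closed-form KL between two multivariate Gaussians only requires solves and log-determinants against the covariances, which is exactly where structured operators would pay off. A minimal dense-Cholesky sketch (the helper name `kl_mvn` is illustrative, not an existing function in the codebase):

```python
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular


def kl_mvn(mu0, L0, mu1, L1):
    """KL(N(mu0, L0 L0^T) || N(mu1, L1 L1^T)) from lower-Cholesky factors.

    With structured factors (diagonal, sparse-triangular, ...) the two
    triangular solves below would drop from O(k^3) to the operator's own cost.
    """
    k = mu0.shape[-1]
    M = solve_triangular(L1, L0, lower=True)         # L1^{-1} L0
    m = solve_triangular(L1, mu1 - mu0, lower=True)  # whitened mean difference
    # log det(Sigma) = 2 * sum(log(diag(L))) for a lower-Cholesky factor L.
    log_det_ratio = 2.0 * (
        jnp.sum(jnp.log(jnp.diag(L1))) - jnp.sum(jnp.log(jnp.diag(L0)))
    )
    # trace term = ||L1^{-1} L0||_F^2; Mahalanobis term = ||m||^2.
    return 0.5 * (jnp.sum(M**2) + m @ m - k + log_det_ratio)
```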
Our suggestion is to create an abstract Gaussian distribution class whose covariance is defined as a linear operator. We could then implement efficient schemes for computing the pdf, sampling, and calculating KL divergences. A minimal sketch of what this could look like follows, with a toy diagonal operator standing in for a real linear-operator interface (all names are illustrative, not an existing API):

```python
from dataclasses import dataclass

import jax.numpy as jnp
import jax.random as jr


@dataclass
class DiagonalLinearOperator:
    """Toy structured covariance: every operation is O(k), not O(k^3)."""
    diag: jnp.ndarray

    def solve(self, rhs):
        return rhs / self.diag

    def log_det(self):
        return jnp.sum(jnp.log(self.diag))

    def root(self):
        # Structured square root, playing the role of a Cholesky factor.
        return DiagonalLinearOperator(jnp.sqrt(self.diag))

    def matvec(self, v):
        return self.diag * v


@dataclass
class GaussianDistribution:
    """Multivariate normal parameterised by a structured covariance operator."""
    loc: jnp.ndarray
    covariance: DiagonalLinearOperator

    def log_prob(self, x):
        d = x - self.loc
        k = self.loc.shape[-1]
        # Solve and log-determinant are delegated to the operator, so each
        # covariance structure pays only its own cost.
        return -0.5 * (
            k * jnp.log(2.0 * jnp.pi)
            + self.covariance.log_det()
            + d @ self.covariance.solve(d)
        )

    def sample(self, key):
        eps = jr.normal(key, self.loc.shape)
        return self.loc + self.covariance.root().matvec(eps)


# Usage: every operation above runs in O(k) for a diagonal covariance.
key = jr.PRNGKey(0)
dist = GaussianDistribution(
    loc=jnp.zeros(3),
    covariance=DiagonalLinearOperator(jnp.array([1.0, 2.0, 0.5])),
)
x = dist.sample(key)
print(dist.log_prob(x))
```

Routing the same diagonal covariance through a dense Cholesky would instead cost O(k^3) time and O(k^2) memory, which is precisely the waste described above.
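A KL method on such a class could then dispatch on the operator types of the two covariances, falling back to the dense closed form sketched earlier only when no structure is available.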