Currently, we only take advantage of structural savings when computing the kernel Gram matrix and when performing the Gram solve in GP predictions.
We neglect:
- Efficient matrix solves and log-determinants when evaluating the Gaussian pdf or drawing samples, both of which currently go through a dense lower-Cholesky decomposition and so ignore the matrix structure. At best we can supply a pre-computed Cholesky factor to a distrax multivariate normal (see the first sketch after this list).
- Structure when computing KL divergences between two multivariate Gaussians, which is important for efficiency in SVGP (see the second sketch after this list).
- Memory savings, since we currently call `.to_dense()` on linear operators throughout the codebase.
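To illustrate the first point, the best workaround available today is to factorise the Gram matrix once ourselves and hand the factor to distrax. A minimal sketch, assuming distrax's `MultivariateNormalTri` accepts a pre-computed lower-Cholesky factor as its scale (the matrix `K` here is a toy stand-in for a kernel Gram matrix):

```python
import distrax
import jax.numpy as jnp
import jax.random as jr

key = jr.PRNGKey(42)
mean = jnp.zeros(4)
K = jnp.eye(4) + 0.1  # toy dense Gram matrix; positive definite

# Factorise once, then reuse the factor for both log-pdf and sampling.
L = jnp.linalg.cholesky(K)
dist = distrax.MultivariateNormalTri(loc=mean, scale_tri=L)

x = dist.sample(seed=key)
lp = dist.log_prob(x)
```

Note that even this workaround densifies any structure (e.g. a diagonal operator) before factorising, so the cubic cost of the Cholesky is not avoided.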
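For the second point, the closed-form KL between two multivariate Gaussians only requires solves and log-determinants against the covariances, which is exactly where structured operators would pay off. A minimal dense-Cholesky sketch (the helper name `kl_mvn` is illustrative, not an existing function in the codebase):

```python
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular


def kl_mvn(mu0, L0, mu1, L1):
    """KL(N(mu0, L0 L0^T) || N(mu1, L1 L1^T)) from lower-Cholesky factors.

    With structured factors (diagonal, sparse-triangular, ...) the two
    triangular solves below would drop from O(k^3) to the operator's own cost.
    """
    k = mu0.shape[-1]
    M = solve_triangular(L1, L0, lower=True)         # L1^{-1} L0
    m = solve_triangular(L1, mu1 - mu0, lower=True)  # whitened mean difference
    # log det(Sigma) = 2 * sum(log(diag(L))) for a lower-Cholesky factor L.
    log_det_ratio = 2.0 * (
        jnp.sum(jnp.log(jnp.diag(L1))) - jnp.sum(jnp.log(jnp.diag(L0)))
    )
    # trace term = ||L1^{-1} L0||_F^2; Mahalanobis term = ||m||^2.
    return 0.5 * (jnp.sum(M**2) + m @ m - k + log_det_ratio)
```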
Our suggestion is to create an abstract Gaussian distribution class whose covariance is defined as a linear operator. We could then implement efficient schemes for computing the pdf, sampling, and calculating KL divergences. A minimal sketch of what this could look like follows, with a toy diagonal operator standing in for a real linear-operator interface (all names are illustrative, not an existing API):

```python
from dataclasses import dataclass

import jax.numpy as jnp
import jax.random as jr


@dataclass
class DiagonalLinearOperator:
    """Toy structured covariance: every operation is O(k), not O(k^3)."""
    diag: jnp.ndarray

    def solve(self, rhs):
        return rhs / self.diag

    def log_det(self):
        return jnp.sum(jnp.log(self.diag))

    def root(self):
        # Structured square root, playing the role of a Cholesky factor.
        return DiagonalLinearOperator(jnp.sqrt(self.diag))

    def matvec(self, v):
        return self.diag * v


@dataclass
class GaussianDistribution:
    """Multivariate normal parameterised by a structured covariance operator."""
    loc: jnp.ndarray
    covariance: DiagonalLinearOperator

    def log_prob(self, x):
        d = x - self.loc
        k = self.loc.shape[-1]
        # Solve and log-determinant are delegated to the operator, so each
        # covariance structure pays only its own cost.
        return -0.5 * (
            k * jnp.log(2.0 * jnp.pi)
            + self.covariance.log_det()
            + d @ self.covariance.solve(d)
        )

    def sample(self, key):
        eps = jr.normal(key, self.loc.shape)
        return self.loc + self.covariance.root().matvec(eps)


# Usage: every operation above runs in O(k) for a diagonal covariance.
key = jr.PRNGKey(0)
dist = GaussianDistribution(
    loc=jnp.zeros(3),
    covariance=DiagonalLinearOperator(jnp.array([1.0, 2.0, 0.5])),
)
x = dist.sample(key)
print(dist.log_prob(x))
```

Routing the same diagonal covariance through a dense Cholesky would instead cost O(k^3) time and O(k^2) memory, which is precisely the waste described above.
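A KL method on such a class could then dispatch on the operator types of the two covariances, falling back to the dense closed form sketched earlier only when no structure is available.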