Closed by maximerischard 5 years ago
Thanks @maximerischard for making a start on this. @fairbrot and I are meeting for lunch tomorrow so we'll both take a look and give some feedback tomorrow.
Thank you for having a look Chris. I think yes, something like FITC would look similar. I'll implement it too once we've settled on the interface.
Regarding autodiff, so far what I've looked into is using autodiff for elements of the covariance matrix. My assumption has been that going from covariance to likelihood, we would continue to use analytical derivatives. Are you thinking autodiff could be useful there too?
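For concreteness, here is the kind of thing I mean, sketched with ForwardDiff.jl on a single squared-exponential covariance element (the kernel and its parameterization here are just for illustration, not the package's internals):

```julia
using ForwardDiff

# Squared-exponential covariance between two scalar inputs,
# parameterized by log length-scale θ[1] and log signal std θ[2].
se_kernel(x, y, θ) = exp(2θ[2]) * exp(-(x - y)^2 / (2 * exp(2θ[1])))

θ = [0.0, 0.0]   # log ℓ = 0, log σ = 0

# Gradient of one covariance element with respect to the hyperparameters.
g = ForwardDiff.gradient(t -> se_kernel(1.0, 2.0, t), θ)
```

Going from these element-wise derivatives to the likelihood gradient would then still use the existing analytical chain.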
I agree this is the right time to think about function names. I also agree that a proliferation of GPxxx methods would end up leading to confusion. Also keep in mind that I'm hoping these approximations will be useful for GPMC as well, so this isn't just a replacement for GPE.
I've thought a bit more about the awkwardness of having important parameters (the inducing points) stored in the `AbstractPDMat` object. An alternative would be to define an `AbstractCovarianceStrategy` type, and then subtypes like `FullCovariance` and `SubsetOfRegressors`. These strategies would be the first parameter of methods like `update_cK` and `predict`, and would also be stored in the `GP` objects. `FullCovariance` would be an empty shell, or maybe hold a few settings that dictate how `KernelData` is obtained. `SubsetOfRegressors` would contain the vector of inducing points. This would also allow different strategies to use the same subtype of `AbstractPDMat`. What do you think?
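A minimal sketch of the hierarchy I have in mind (all type, field, and method names here are only suggestions, not existing package code):

```julia
abstract type AbstractCovarianceStrategy end

# Exact GP: an empty shell (or eventually a few settings that
# dictate how KernelData is obtained).
struct FullCovariance <: AbstractCovarianceStrategy end

# Sparse approximation: carries the inducing-point locations.
struct SubsetOfRegressors{M<:AbstractMatrix} <: AbstractCovarianceStrategy
    inducing::M   # d × m matrix of inducing-point locations
end

# Methods like update_cK and predict would dispatch on the strategy
# as their first argument (bodies are placeholders here):
update_cK(s::FullCovariance, gp) = nothing       # exact covariance path
update_cK(s::SubsetOfRegressors, gp) = nothing   # low-rank covariance path
```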
I like the idea of different subtypes of `AbstractPDMat`. @fairbrot, what do you think? Regarding the naming convention and the interface, I wonder if the best option might be to create separate functions for FITC, etc., in the same way as you have `SubsetOfRegressors`; that is, don't worry about the naming convention for the moment. I recall that you already have most of the code for these other approximations. Once the code is implemented and working, it might be easier for us to see i) ways of simplifying the code by removing repetition among the sparse GP functions, and ii) what similarities there are with `GPE`. We should also probably look at other packages to see if we can get some inspiration for how to handle these functions and what a sensible naming convention might be.
This is getting closer to being ready to merge. I have now implemented the subset of regressors (SoR), deterministic training conditional (DTC), and fully independent training conditional (FITC) approximations. I've implemented gradients of the marginal likelihood for all of them, though not with respect to the inducing points. The changes I've made to GPMC are also quite hacky, and the approximations are sadly not yet compatible with MCMC.
Excellent! Looking forward to seeing these additions included in the package.
I've just finished implementing a very basic version of the Subset of Regressors (SoR) sparse approximation to GPs. But before I go any further with this, I want to discuss whether this is the right way to implement sparse approximations.
The idea is that sparse approximations will be represented as a special type of positive definite matrix. This makes some sense, as operations involving the covariance matrix are really where the sparsity comes in. One would also hope that this would play nicely with the MCMC approach.
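For reference, SoR replaces the exact n×n covariance K with the low-rank matrix Q = K_nm K_mm⁻¹ K_mn built from m inducing points, and that Q is exactly the "special" positive (semi-)definite matrix in question. A small self-contained sketch (illustrative names, squared-exponential kernel, not the package's code):

```julia
using LinearAlgebra

# Squared-exponential kernel between columns of X (d×n) and Y (d×m).
k(x, y) = exp(-sum(abs2, x .- y) / 2)
kernmat(X, Y) = [k(X[:, i], Y[:, j]) for i in 1:size(X, 2), j in 1:size(Y, 2)]

X  = reshape(collect(range(-3, 3, length=50)), 1, 50)  # training inputs
Xu = [-2.0 -1.0 0.0 1.0 2.0]                           # m = 5 inducing points

Knm = kernmat(X, Xu)              # 50 × 5 cross-covariance
Kmm = kernmat(Xu, Xu) + 1e-8I     # 5 × 5, with jitter for stability

# SoR approximates the full 50×50 covariance by this rank-5 matrix.
Qnn = Knm * (Kmm \ Knm')
```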
Ideally, one would be able to just define the required methods for `AbstractPDMat`, like left-division and `whiten!`, just like @jbrea did for his `ElasticPDMat`, and then everything would just work. In fact, for SoR I found that implementing `\` and `logdet` was enough to compute the marginal loglikelihood. But for predictions there are three problems: one involves `whiten!`, which is required for predictions, and another involves `_predict`. I have not yet looked into computing derivatives of the marginal loglikelihood, which will also require some thought.
So the `AbstractPDMat` abstraction only took me so far. The other thing that bothers me is that the inducing points are stored inside the PDMat object, along with some precomputed matrices. To me this feels frail, as it means the sparse covariance matrix cannot be obtained from scratch, and state is distributed amongst various objects in a way that might be confusing to future developers.

In short, this does work, but I want to get some ideas and thoughts from you before I dive in any further.
An example:
(Obviously in this case it's a pretty bad approximation, but that's beside the point!)