Open RodrigoAVargasHdz opened 4 years ago
@RodrigoAVargasHdz This all looks fine to me for the most part. Note that for variational GPs, you'll also need to save the likelihood's state separately, as the likelihood doesn't exist on the model. Also, you don't really need to store the inducing points separately or anything -- those are parameters on the model as well, so exist in the model's state dict.
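For concreteness, here is a minimal sketch (not from the original thread) of what saving and restoring a variational model and its likelihood could look like; the file name and the `model`/`likelihood` variables are placeholders:

```python
import torch

# `model` is an ApproximateGP and `likelihood` a GaussianLikelihood built as usual.
# The inducing points live inside model.state_dict() (under the variational strategy),
# so they are saved automatically; the likelihood has to be saved explicitly.
torch.save(
    {"model": model.state_dict(), "likelihood": likelihood.state_dict()},
    "vargp_checkpoint.pth",  # placeholder file name
)

# Later: rebuild model/likelihood with the same constructor arguments, then restore.
checkpoint = torch.load("vargp_checkpoint.pth")
model.load_state_dict(checkpoint["model"])
likelihood.load_state_dict(checkpoint["likelihood"])
```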
If this doesn't answer your questions, feel free to reopen!
Sorry, I see you were also having performance issues. sklearn does not use variational inference -- are you having issues with exact GP performance matching as well?
Hi Jake, thanks for getting back to me.
I know sklearn does not use VI for GPs; however, I compared the vanilla (exact) GPs in GPyTorch and sklearn and still get a better result with sklearn. I uploaded my code for the various GP models I am using, e.g., the sklearn GP and the GPyTorch vanilla, sparse, and variational GPs.
https://github.com/RodrigoAVargasHdz/GPytorch_PES.git
Thanks for all your help :)
When I combine two Matern kernels or two RBF kernels, the kernels' parameters after training are exactly the same.
I defined my kernel as:

```python
ScaleKernel(MaternKernel(nu=2.5, ard_num_dims=d)) + ScaleKernel(MaternKernel(nu=2.5, ard_num_dims=d))
```
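As a side note (not part of the original message), one way to check whether the two summands really learned identical hyperparameters is to inspect them directly; a minimal sketch, assuming the additive kernel above is the model's `covar_module`:

```python
# `+` on two kernels produces a gpytorch.kernels.AdditiveKernel;
# its summands are stored in the `kernels` ModuleList.
for i, k in enumerate(covar_module.kernels):
    print(f"component {i}:")
    print("  outputscale :", k.outputscale.item())
    print("  lengthscale :", k.base_kernel.lengthscale.detach())
```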
Also, as I mentioned before, on a test set I cannot match the accuracy of a GP model trained in sklearn. I tried increasing the model's precision to double and running longer optimizations, but the difference between the two models is still quite large.
@RodrigoAVargasHdz So, there was a lot of code in your repo and rather than combing through it to find the bug I just coded up a working version on your dataset myself:
Notebook_for_Rodrigo.ipynb.txt
The difference in error I get in the notebook is pretty small -- 0.0040 vs 0.0060 -- and the test NLLs are extremely close. The small difference is most likely explained by one of two sources:
1. Your sklearn model as coded doesn't learn a likelihood noise (and uses a constant 1e-10 jitter, equivalent to a likelihood noise of 1e-10), while GPyTorch by default has a `noise_constraint=GreaterThan(1e-4)` in `GaussianLikelihood`, which we default to primarily for numerical stability reasons. Note that you can actually see the numerical stability issues in sklearn that result from using such a tiny added diagonal: I had to clamp the predictive standard deviations at a positive value because some were <= 0.
2. sklearn takes something like 30x longer to train, in part because it's using BFGS, which is far less approximate than Adam, so you might get \epsilon better hyperparameters if you use PyTorch's L-BFGS for training (see the sketch below).
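Neither change is spelled out in the thread, but both are easy to try. The sketch below is only an illustration: it assumes an exact-GP `GPModel` class like the one later in this thread, plus `train_x`/`train_y` tensors already on hand, and shows loosening the noise constraint and training with PyTorch's L-BFGS instead of Adam.

```python
import torch
import gpytorch

# Loosen the default GreaterThan(1e-4) noise constraint (trades stability for flexibility).
likelihood = gpytorch.likelihoods.GaussianLikelihood(
    noise_constraint=gpytorch.constraints.GreaterThan(1e-8)
)
model = GPModel(train_x, train_y, likelihood)  # exact GP class from the snippet below

model.train()
likelihood.train()
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# torch.optim.LBFGS needs a closure that re-evaluates the loss.
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=50)

def closure():
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    return loss

for _ in range(10):  # a handful of outer L-BFGS steps
    loss = optimizer.step(closure)
```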
📚 Documentation/Examples
Hi,
I've been trying to use a variational GP (varGP) and a sparse GP (spGP) for a semi-large data set of approximately 20,000 training points. I've been having some problems, since a GP trained in sklearn with only 3,000 points gives a better result than a varGP with more than 1,000 inducing points; all methods are tested with the same RBF kernel. The only difference is that I normalize the data set for the GPyTorch models but not for the sklearn one.
I was wondering if you could guide me on my implementation of these three methods in GPyTorch. My code is 100% based on the online documentation; see the code examples below.
---------------------------------------------------------------------------------------
Vanilla GP
```python
class GPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=51)
        )

    def forward(self, x):
        # forward() was not included in the original post; this is the standard ExactGP pattern
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```
The rest of the script follows these steps (a runnable sketch of them is given after the list):

- initialize the GP model
- set up the optimizer
- set up the loss function
- keep the best model during training (`gp_file` is the checkpoint file name, `loss0` is the lowest `-mll` value seen so far)
- load the best model
- make predictions
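None of the code below is from the original post; it is a minimal sketch of those steps, assuming `train_x`, `train_y`, and `test_x` are tensors already prepared and that `gp_file` is just a placeholder checkpoint path.

```python
import torch
import gpytorch

# assumed to exist: train_x (n x 51), train_y (n,), test_x (m x 51), all torch tensors
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GPModel(train_x, train_y, likelihood)   # class defined above

model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

gp_file = "exact_gp_best.pth"   # placeholder file name
loss0 = float("inf")            # lowest -mll seen so far

for i in range(500):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
    if loss.item() < loss0:     # keep the best model seen during training
        loss0 = loss.item()
        torch.save(model.state_dict(), gp_file)

# load the best model and predict
model.load_state_dict(torch.load(gp_file))
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred = likelihood(model(test_x))
    mean, std = pred.mean, pred.stddev
```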
---------------------------------------------------------------------------------------
Variational GP
```python
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy

class varGPModel(ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super(varGPModel, self).__init__(variational_strategy)  # was super(GPModel, ...) in the original
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=51)
        )

    def forward(self, x):
        # forward() was not included in the original post; this is the standard pattern
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```
The rest of the script follows these steps (sketched below):

- initialize the model (`n_inducing_points` is the number of inducing points)
- set up the optimizer
- set up the loss function
- save the best model during training (`gp_file` is the checkpoint file name, `loss0` is the lowest `-mll` value seen so far)
- load the best model
- make predictions
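Again, none of this is from the original post; it is a minimal sketch of those steps, assuming `train_x`, `train_y`, and `test_x` tensors and a placeholder `n_inducing_points`. It saves the likelihood's state alongside the model's, as noted earlier in the thread, because an `ApproximateGP` does not hold the likelihood.

```python
import torch
import gpytorch
from torch.utils.data import TensorDataset, DataLoader

# assumed to exist: train_x (n x 51), train_y (n,), test_x (m x 51)
n_inducing_points = 1000                          # placeholder
inducing_points = train_x[:n_inducing_points, :]

model = varGPModel(inducing_points)               # class defined above
likelihood = gpytorch.likelihoods.GaussianLikelihood()

model.train()
likelihood.train()
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(likelihood.parameters()), lr=0.01
)
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

loader = DataLoader(TensorDataset(train_x, train_y), batch_size=1024, shuffle=True)
gp_file = "var_gp_best.pth"     # placeholder file name
loss0 = float("inf")            # lowest loss seen so far

for epoch in range(50):
    for x_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = -mll(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        if loss.item() < loss0:  # per-batch loss; keep the best model and likelihood seen so far
            loss0 = loss.item()
            torch.save(
                {"model": model.state_dict(), "likelihood": likelihood.state_dict()},
                gp_file,
            )

# load the best model and predict
state = torch.load(gp_file)
model.load_state_dict(state["model"])
likelihood.load_state_dict(state["likelihood"])
model.eval()
likelihood.eval()
with torch.no_grad():
    pred = likelihood(model(test_x))
    mean, std = pred.mean, pred.stddev
```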
---------------------------------------------------------------------------------------
Sparse GP
```python
class sparseGPModel(gpytorch.models.ExactGP):
    def __init__(self, x_inducing_points, train_x, train_y, likelihood):
        super(sparseGPModel, self).__init__(train_x, train_y, likelihood)  # was super(GPModel, ...) in the original
        self.mean_module = gpytorch.means.ConstantMean()
        self.base_covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=51)
        )
        self.covar_module = gpytorch.kernels.InducingPointKernel(
            self.base_covar_module,
            inducing_points=x_inducing_points,
            likelihood=likelihood,
        )

    def forward(self, x):
        # forward() was not included in the original post; this is the standard ExactGP pattern
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```
The rest of the script follows these steps (sketched below):

- initialize the model (`n_inducing_points` is the number of inducing points)
- set up the optimizer
- set up the loss function: `mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)`
- save the best model during training (`gp_file` is the checkpoint file name, `loss0` is the lowest `-mll` value seen so far)
- load the best model
- make predictions
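As before, the following is a minimal sketch rather than code from the post, assuming `train_x`, `train_y`, and `test_x` tensors and a placeholder `n_inducing_points`; since the model is an `ExactGP`, the likelihood's parameters are already part of the model's state dict.

```python
import torch
import gpytorch

# assumed to exist: train_x (n x 51), train_y (n,), test_x (m x 51)
n_inducing_points = 1000                                      # placeholder
x_inducing_points = train_x[:n_inducing_points, :].clone()

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = sparseGPModel(x_inducing_points, train_x, train_y, likelihood)  # class defined above

model.train()
likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

gp_file = "sparse_gp_best.pth"  # placeholder file name
loss0 = float("inf")            # lowest -mll seen so far

for i in range(500):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
    if loss.item() < loss0:     # keep the best model seen during training
        loss0 = loss.item()
        torch.save(model.state_dict(), gp_file)

# load the best model and predict
model.load_state_dict(torch.load(gp_file))
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred = likelihood(model(test_x))
    mean, std = pred.mean, pred.stddev
```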