🐛 Bug

I implemented an ApproximateGPModel as described in https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.html .
The inputs to my model (its inducing points) are 2-D, in contrast to the example snippet linked above. When the forward pass is called, the number of samples does not match the true number of inducing points: instead of an [nsamples x 2] tensor, I am told the input is a [2*nsamples x 2] tensor. Defining and training an equivalent ExactGPModel does not produce this behavior (the size stays [nsamples x 2]).
Note that this does not produce an error message, but I am unsure where the size mismatch originates.
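For reference, the exact model I used for the comparison looks roughly like this (a minimal sketch; the mean and kernels mirror the approximate model in the snippet below, and the printed size stays [100, 2] there):

import torch
import gpytorch as gpy

# Minimal sketch of the exact counterpart, using the same mean and kernel setup
# as the approximate model below. In training mode, model(train_x) hands train_x
# to forward() directly, so the printed size stays [100, 2].
class ExactGPModel(gpy.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpy.means.ConstantMean()
        self.covar_module = gpy.kernels.ScaleKernel(
            gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([0]))
        )
        for d in range(1, train_x.size(-1)):
            self.covar_module += gpy.kernels.ScaleKernel(
                gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([d]))
            )

    def forward(self, x):
        print('calling forward pass, x size: ', x.size())
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpy.distributions.MultivariateNormal(mean_x, covar_x)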
To reproduce
Code snippet to reproduce
import numpy as np
import torch
import gpytorch as gpy
################ data
n_dim = 2
x = np.tile(np.linspace(-np.pi, np.pi, 100), n_dim).reshape(n_dim,-1)
x[1] = np.concatenate([np.roll(x[1], 40)[::2], np.roll(x[1], 40)[::2]])
train_x = torch.tensor(x.T)
train_y = torch.sin(train_x[:,0]) + torch.sin(2+train_x[:,1]) + torch.randn(100)/10
################ define approximate model
class ApproximateGPModel(gpy.models.ApproximateGP):
    def __init__(self, inducing_points):
        variational_distribution = gpy.variational.CholeskyVariationalDistribution(inducing_points.size(0))
        variational_strategy = gpy.variational.VariationalStrategy(self, inducing_points, variational_distribution, learn_inducing_locations=True)
        super().__init__(variational_strategy)
        self.mean_module = gpy.means.ConstantMean()
        self.covar_module = gpy.kernels.ScaleKernel(
            gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([0]))
        )
        for d in range(1, inducing_points.size(-1)):
            self.covar_module += gpy.kernels.ScaleKernel(
                gpy.kernels.PeriodicKernel(eps=1, active_dims=torch.tensor([d]))
            )

    def forward(self, x):
        print('calling forward pass, x size: ', x.size())
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpy.distributions.MultivariateNormal(mean_x, covar_x)
def train_approximate_gp(train_x, train_y):
    model = ApproximateGPModel(train_x)
    likelihood = gpy.likelihoods.GaussianLikelihood()
    mll = gpy.mlls.VariationalELBO(likelihood, model, num_data=train_y.numel())
    # fix period length to 2*pi
    for k in model.covar_module.kernels:
        k.base_kernel.raw_period_length = torch.nn.Parameter(torch.tensor([[2*np.pi]]), requires_grad=False)
    optimizer = torch.optim.Adam(list(model.parameters()) + list(likelihood.parameters()), lr=0.1)
    model.train()
    likelihood.train()
    for _ in range(100):
        output = model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    model.eval()
    likelihood.eval()
    return model, likelihood
model_approx, likelihood_approx = train_approximate_gp(train_x, train_y)
Stack trace/error message
calling forward pass, x size: torch.Size([200, 2])
Expected Behavior
calling forward pass, x size: torch.Size([100, 2])
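For what it's worth, the inducing points stored on the trained model keep their original shape, so as far as I can tell the doubling only appears in the tensor that is handed to forward. A minimal check against the model trained above:

# The learned inducing point tensor is still [100, 2] after training, even though
# forward() reported a [200, 2] input.
print(model_approx.variational_strategy.inducing_points.shape)  # torch.Size([100, 2])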
System information
GPyTorch Version 1.9.1
PyTorch Version 1.13.1
Mac OS 13.5.2 (22G91)
Additional context
I am also unsure about the dimension that needs to be passed when setting up the variational distribution: different documentation pages use either dim 0 or dim -1:
https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/SVGP_Regression_CUDA.html uses
variational_distribution = CholeskyVariationalDistribution(inducing_points.size(0))
but
https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/Approximate_GP_Objective_Functions.html uses
variational_distribution = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(-1))
and I don't think inducing_points are transposed anywhere in the two snippets. Documentation for high-D input would be super useful here.

I've seen that my confusion might be due to some misunderstanding of the concept of inducing points, so closing this issue until I've done the reading.
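For reference, here is the small shape check behind my current reading (which may well be wrong): the constructor argument seems to be the number of inducing points, so for an [M x D] tensor that is size(0) (equivalently size(-2)), and size(-1) only coincides with it when the inducing points are stored as a flat 1-D tensor.

import torch
import gpytorch

# Hypothetical [M x D] inducing points, shaped like the ones in my snippet above.
M, D = 100, 2
inducing_points = torch.randn(M, D)

# Passing the number of inducing points (size(-2) == size(0) == M) gives a
# variational mean with one entry per inducing point ...
dist_m = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(-2))
print(dist_m.variational_mean.shape)        # torch.Size([100])
print(dist_m.chol_variational_covar.shape)  # torch.Size([100, 100])

# ... while size(-1) would pass the input dimensionality D = 2 instead.
dist_d = gpytorch.variational.CholeskyVariationalDistribution(inducing_points.size(-1))
print(dist_d.variational_mean.shape)        # torch.Size([2])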