cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

kernel design #606

Open Sabina321 opened 5 years ago

Sabina321 commented 5 years ago

Hi,

This is related to another issue so it might make more sense to join it with the other thread.

I'm trying to make this linear kernel: x1^T Sigma x2, where Sigma is a d x d matrix some of whose entries I want to learn. In particular, Sigma has the form [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]. Is there a way to create a linear kernel where, instead of v * x1^T x2, the variance is a matrix whose parameters are constrained in certain ways? Or, is there a way to select different active dimensions for each input? Right now it seems that the active dimensions will be the same for both x1 and x2?

Thanks so much for all of the help!

gpleiss commented 5 years ago

Hi @Sabina321,

There are a few ways you could accomplish this. The easiest way would probably be to implement your own linear kernel. All you need to do is to define a forward method that performs the operation you're describing. In the initializer, you would define the parameters you want to learn.

See the docs here on kernels: https://gpytorch.readthedocs.io/en/latest/kernels.html#kernel
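As a rough, untested sketch (class and parameter names made up here), the pattern would look something like this:

import torch
import gpytorch


class SigmaLinearKernel(gpytorch.kernels.Kernel):
    """Linear kernel x1^T Sigma x2 with Sigma = [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]]."""

    def __init__(self, **kwargs):
        super(SigmaLinearKernel, self).__init__(**kwargs)
        # learnable entries of Sigma (rho would still need to be constrained)
        self.register_parameter(name="s1", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="s2", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="rho", parameter=torch.nn.Parameter(torch.zeros(1)))

    def forward(self, x1, x2, **params):
        # assemble Sigma from the parameters
        off_diag = self.rho * self.s1 * self.s2
        sigma = torch.stack([
            torch.cat([self.s1 ** 2, off_diag]),
            torch.cat([off_diag, self.s2 ** 2]),
        ])
        # x1: n x 2, x2: m x 2  ->  n x m covariance matrix
        return x1 @ sigma @ x2.transpose(-1, -2)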

Hope this helps!

Sabina321 commented 5 years ago

Hi @gpleiss that sounds doable. And it looks like I don't need to define the gradients? Thanks for the help, Sabina

gpleiss commented 5 years ago

Yeah - no need to define the gradients! The forward method does everything.

Sabina321 commented 5 years ago

Hi, thank you. I have implemented the custom kernel; however, the values I'm obtaining seem to be a bit different from those I get from other code.

I have a question about the noise term in the likelihood. I'm using a GaussianLikelihood. What is the difference between the raw_noise and the noise? If I want to extract the learned noise from the likelihood which should I trust?

If I want to set one of these terms and not update its value in training should I set: likelihood.noise_covar.raw_noise.requires_grad=False or likelihood.noise_covar.noise.requires_grad=False or both? Thanks!

gpleiss commented 5 years ago

Sorry for the slow reply! noise is constrained to be a positive value (you can't have negative variances). To accomplish this, we introduce the unconstrained raw_noise parameter which is transformed into the noise parameter (typically using the softplus function: e.g. noise = softplus(raw_noise)).

Long story short: likelihood.noise_covar.raw_noise.requires_grad=False is probably what you want :)
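Concretely, a quick sketch (untested):

import torch
import gpytorch

likelihood = gpytorch.likelihoods.GaussianLikelihood()

# raw_noise is the unconstrained parameter the optimizer updates;
# noise is the positive value the model actually uses (roughly softplus(raw_noise))
print(likelihood.noise_covar.raw_noise)  # any real number
print(likelihood.noise_covar.noise)      # always > 0

# to keep the noise fixed during training, freeze the raw parameter
likelihood.noise_covar.raw_noise.requires_grad_(False)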

With regards to the kernel function, could you share your implementation?

Sabina321 commented 5 years ago

Thanks for the reply!

Ok, so if I need the noise term it looks like I can use the final likelihood.noise_covar.noise instead of likelihood.noise_covar.raw_noise? It seems to give me a more realistic term if I keep requires_grad=True.

Here is the implementation of the kernel function. Right now I would like to put it in a GridInterpolationKernel to make it more efficient. To do so, it seems like the forward method has to return a tensor of size input_dimension x num_data_points x num_data_points? Here I'm working with 100 data points, so it seems like it needs to return a 4 x 100 x 100 tensor rather than 100 x 100? I tried stacking the dimension-specific terms (final = torch.stack((tone, ttwo, tone, ttwo), dim=0)) and I'm getting an error. Any help here would be much appreciated!

import torch
import gpytorch
from gpytorch.lazy import MatmulLazyTensor


class MyKernel(gpytorch.kernels.Kernel):
    def __init__(self, num_dimensions, user_mat, first_mat, variance_prior=None, offset_prior=None, active_dims=None):
        super(MyKernel, self).__init__(active_dims=active_dims)
        self.register_parameter(name="u1", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="u2", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="rho", parameter=torch.nn.Parameter(torch.ones(1)))
        self.user_mat = user_mat
        self.first_mat = first_mat
        self.noise_mat = torch.eye(100)

        self.register_prior("u1_prior", gpytorch.priors.SmoothedBoxPrior(a=0, b=2, sigma=1), "u1")
        self.register_prior("u2_prior", gpytorch.priors.SmoothedBoxPrior(a=0, b=2, sigma=1), "u2")
        self.register_prior("rho_prior", gpytorch.priors.SmoothedBoxPrior(a=0, b=2, sigma=.5), "rho")

    def forward(self, x1, x2, batch_dims=None, **params):
        # per-dimension linear terms
        tone = MatmulLazyTensor(x1[:, :, 0].transpose(1, 0), x2[:, :, 0]) * self.u1
        ttwo = MatmulLazyTensor(x1[:, :, 1].transpose(1, 0), x2[:, :, 1]) * self.u2

        # cross-dimension terms
        diagone = MatmulLazyTensor(x1[:, :, 0].transpose(1, 0), x2[:, :, 1])
        diagtwo = MatmulLazyTensor(x1[:, :, 1].transpose(1, 0), x2[:, :, 0])
        tthree = (diagone + diagtwo) * ((self.rho - 1) * self.u1 ** .5 * self.u2 ** .5)

        effects = tone + ttwo + tthree
        effects = effects.mul(self.user_mat)
        final = effects + self.first_mat

        return final.expand(1, 100, 100)

class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = MyKernel(4, user_mat=user_mat, first_mat=user_mat)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


likelihood = gpytorch.likelihoods.GaussianLikelihood()
likelihood.noise_covar.initialize(noise=torch.ones(1))

gpleiss commented 5 years ago

See my response to #600. I don't think you're going to be able to use GridInterpolationKernel with your custom linear kernel, since it violates the assumptions of GridInterpolationKernel.

However, a linear kernel (especially the way you have started writing it, with MatmulLazyTensor) will be EXTREMELY efficient. I would try writing it without GridInterpolationKernel, so forward should return a num_data x num_data LazyTensor object.

Two notes on your implementation: it's better to use negative indices for the transpose (this makes it safer with batches), and I would initialize the rho parameter as a scalar tensor.

Here's how I would implement it (see if this is what you're trying to do):

import torch
import gpytorch
from gpytorch.lazy import MatmulLazyTensor, DiagLazyTensor


class MyKernel(gpytorch.kernels.Kernel):
    def __init__(self, num_dimensions, user_mat, first_mat, variance_prior=None, offset_prior=None, active_dims=None):
        super(MyKernel, self).__init__(active_dims=active_dims)
        self.register_parameter(name="u1", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="u2", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="rho", parameter=torch.nn.Parameter(torch.tensor(1.)))
        self.user_mat = user_mat
        self.first_mat = first_mat
        self.noise_mat = torch.eye(100)

    def forward(self, x1, x2, batch_dims=None, **params):
        us = torch.cat([self.u1, self.u2], 0)  # us is a vector of size 2
        x1 = x1 * us
        x2 = x2 * us

        # This gives us the variance matrix [u1^2, u1u2; u1u2, u2^2]
        prod = MatmulLazyTensor(x1.transpose(-1, -2), x2)

        # This gives us the variance matrix [u1^2, rho * u1u2; rho * u1u2, u2^2]
        effects = prod * self.rho + DiagLazyTensor(prod.diag() * (1 - self.rho))

        effects = effects.mul(self.user_mat)
        final = effects + self.first_mat

        return final

(I just wrote this in the github text editor, so no guarantees that it will run as is).

Sabina321 commented 5 years ago

Thank you for the help! That looks awesome and I will try it out.

In another post it was mentioned that you are working on hard constraints for parameter values. Is there a way for me to be notified when that happens? The prior self.register_prior("u2_prior", gpytorch.priors.SmoothedBoxPrior(a=0, b=2, sigma=1), "u2") is meant to ensure that u2 is positive, but rho also has to be constrained to lie within a range.

jacobrgardner commented 5 years ago

@Sabina321 hard constraint support is now on master! For any parameter on a gpytorch.Module, you can now use module.register_constraint("same_name_as_in_register_parameter", SomeConstraint) where SomeConstraint is a constraint object in gpytorch.constraints.

For the moment, be sure to initialize the parameter values to something sensible given the constraint. I'm hoping to today have a PR up that adds initialization routines to the constraints that can do some of this automatically.
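In code, a rough (untested) sketch of that pattern for the kernel parameters in this thread:

import torch
import gpytorch
from gpytorch.constraints import Positive, Interval


class ConstrainedKernel(gpytorch.kernels.Kernel):
    def __init__(self, **kwargs):
        super(ConstrainedKernel, self).__init__(**kwargs)
        self.register_parameter(name="raw_u2", parameter=torch.nn.Parameter(torch.ones(1)))
        self.register_parameter(name="raw_rho", parameter=torch.nn.Parameter(torch.tensor(1.0)))
        # same names as passed to register_parameter above
        self.register_constraint("raw_u2", Positive())           # u2 > 0
        self.register_constraint("raw_rho", Interval(0.0, 2.0))  # 0 < rho < 2

The constrained values are then read off through the constraints' transform functions rather than from the raw parameters directly (more on this further down the thread).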

Sabina321 commented 5 years ago

Thank you!

I registered the constraints. However, now I'm getting a new error which wasn't happening before:

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _mul_matrix(self, other)
    485
    486         self = self.evaluate_kernel()
--> 487         other = other.evaluate_kernel()
    488         if isinstance(self, NonLazyTensor) or isinstance(other, NonLazyTensor):
    489             return NonLazyTensor(self.evaluate() * other.evaluate())

AttributeError: 'Tensor' object has no attribute 'evaluate_kernel'

jacobrgardner commented 5 years ago

Can you post the full stack trace?

Sabina321 commented 5 years ago

Yeah it just happened after updating. That line was working before.

AttributeError Traceback (most recent call last)

in train(num_iter)
     19     optimizer.zero_grad()
     20     output = model(newx)
---> 21     loss = -mll(output, newy)
     22     loss.backward()
     23     #print('Iter %d/%d - Loss: %.3f' % (i + 1, num_iter, loss.item()))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
---> 22         outputs = self.forward(*inputs, **kwargs)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py in forward(self, output, target, *params)
---> 28         res = output.log_prob(target)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/distributions/multivariate_normal.py in log_prob(self, value)
--> 129         inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
--> 992         cholesky = CholLazyTensor(self.cholesky())

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in cholesky(self, upper)
--> 718         res = self._cholesky()

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
---> 34         add_to_cache(self, cache_name, method(self, *args, **kwargs))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _cholesky(self)
--> 396         evaluated_mat = self.evaluate()

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
---> 34         add_to_cache(self, cache_name, method(self, *args, **kwargs))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/sum_lazy_tensor.py in evaluate(self)
---> 62         return sum(lazy_tensor.evaluate() for lazy_tensor in self.lazy_tensors)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/sum_lazy_tensor.py in (.0)
---> 62         return sum(lazy_tensor.evaluate() for lazy_tensor in self.lazy_tensors)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
---> 34         add_to_cache(self, cache_name, method(self, *args, **kwargs))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py in evaluate(self)
--> 226         return self.evaluate_kernel().evaluate()

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
---> 34         add_to_cache(self, cache_name, method(self, *args, **kwargs))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py in evaluate_kernel(self)
--> 218             x1, x2, diag=False, batch_dims=self.batch_dims, **self.params

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/kernels/kernel.py in __call__(self, x1, x2, diag, batch_dims, **params)
--> 500             res = super(Kernel, self).__call__(x1_, x2_, batch_dims=batch_dims, **params)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
---> 22         outputs = self.forward(*inputs, **kwargs)

in forward(self, x1, x2, batch_dims, **params)
     69
     70     #print(effects.size())
---> 71     effects = effects.mul(self.user_mat)
     72
     73     final = effects + self.first_mat

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in mul(self, other)
-> 1131         return self._mul_matrix(other)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _mul_matrix(self, other)
    486         self = self.evaluate_kernel()
--> 487         other = other.evaluate_kernel()

AttributeError: 'Tensor' object has no attribute 'evaluate_kernel'
Sabina321 commented 5 years ago

Hi again,

I've also tried using MatmulLazyTensor() instead of *, and then I get an error about the Cholesky decomposition. However, this same data and multiplication worked in the earlier version. Thanks for any feedback!

gpleiss commented 5 years ago

Hi @Sabina321 - I think I know what the issue is with *. Can you post the Cholesky decomposition error?

Also, which version is the "earlier version"? Are you using gpytorch of the master branch on Github, or the stable release?

Sabina321 commented 5 years ago

HI @gpleiss. Thanks for the response!

The last version I had was 2.1.0, but then I updated when the hard constraints were added. After the update this problem appeared. The error says there is a problem with the matrix, but this same matrix was fine before in version 2.1.0, pre-constraint update. I'm using the master branch on GitHub.

in train(num_iter)
     19     optimizer.zero_grad()
     20     output = model(newx)
---> 21     loss = -mll(output, newy)
     22     loss.backward()
     23     #print('Iter %d/%d - Loss: %.3f' % (i + 1, num_iter, loss.item()))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
---> 22         outputs = self.forward(*inputs, **kwargs)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py in forward(self, output, target, *params)
---> 28         res = output.log_prob(target)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/distributions/multivariate_normal.py in log_prob(self, value)
--> 129         inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
--> 992         cholesky = CholLazyTensor(self.cholesky())

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in cholesky(self, upper)
--> 718         res = self._cholesky()

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
---> 34         add_to_cache(self, cache_name, method(self, *args, **kwargs))

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _cholesky(self)
    401         evaluated_mat.register_hook(_ensure_symmetric_grad)
    402
--> 403         cholesky = psd_safe_cholesky(evaluated_mat.double()).to(self.dtype)
    404         return NonLazyTensor(cholesky)

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     44             continue
     45
---> 46     raise e

~/anaconda3/envs/py36/lib/python3.6/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter)
     19     """
     20     try:
---> 21         L = torch.cholesky(A, upper=upper, out=out)
     22         # TODO: Remove once fixed in pytorch (#16780)
     23         if A.dim() > 2 and A.is_cuda:

RuntimeError: Lapack Error in potrf : the leading minor of order 22 is not positive definite at /Users/soumith/mc3build/conda-bld/pytorch_1549593514549/work/aten/src/TH/generic/THTensorLapack.cpp:658
gpleiss commented 5 years ago

If you update the version of GPyTorch (I just pushed a fix), you should be able to use MatmulLazyTensor.

Sabina321 commented 5 years ago

Hhmmm, not quite yet. I'm getting the same error. To reinstall I'm doing:

pip install git+https://github.com/cornellius-gp/gpytorch.git --upgrade

Sabina321 commented 5 years ago

Does it check that both matrices are positive definite? When it calls MatmulLazyTensor(a, b), do both a and b have to be positive definite, or is it just the product? One of the matrices won't necessarily be positive definite. But that wasn't a problem before, because what the kernel returns will ultimately be PSD. And I've run this example in other packages and it works.

Sabina321 commented 5 years ago

Hi, sorry to bother you again, but it seems like the mistake might have come from my not understanding what MatmulLazyTensor does. Could you clarify what exactly this function does?

jacobrgardner commented 5 years ago

MatmulLazyTensor represents the matrix product of two LazyTensors, for example AB^T. In general this will usually only be positive definite when it is of the form AA^T.
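For example (a quick sketch, not tested):

import torch
from gpytorch.lazy import MatmulLazyTensor

A = torch.randn(100, 2)
B = torch.randn(100, 2)

K_psd = MatmulLazyTensor(A, A.transpose(-1, -2))      # A A^T: symmetric and PSD
K_general = MatmulLazyTensor(A, B.transpose(-1, -2))  # A B^T: generally not PSD

print(K_psd.evaluate().shape)  # torch.Size([100, 100])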

jacobrgardner commented 5 years ago

I think if you are using it instead of *, that's going to be your issue, because * usually means an elementwise product, not a matrix product.

Sabina321 commented 5 years ago

Thank you. One issue I'm encountering now is with the constraints. It doesn't seem like they are being followed; sometimes they are violated. Is this just something to expect? Thanks!

jacobrgardner commented 5 years ago

What leads you to believe they are being violated? As Geoff mentioned above, raw parameters are distinct from the actual parameter values, and only the actual parameter values will follow the constraint (e.g., likelihood.noise_covar.raw_noise will not satisfy the constraint, but likelihood.noise will)

Sabina321 commented 5 years ago

In some cases I have a positive constraint (e.g. self.register_constraint("u2", constraint=constraints.Positive())) and the variable u2 becomes negative. But it seems like in all of the cases where the data is right and the parameter values are initialized sensibly, it stays positive.

The hard constraints are working much better than just setting priors and gpytorch is now finding the same values for the parameters as the other packages I've tried. Thank you for all of your help on this long thread!

jacobrgardner commented 5 years ago

Keep in mind that constraints are registered to raw parameters. To get the positive value, use the constraint's transform function. Check out RBFKernel and see how we handle lengthscale / raw_lengthscale. You're going to want a u2 / raw_u2 pair.
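A rough sketch of that pattern for u2 (untested, mirroring how RBFKernel handles raw_lengthscale):

import torch
import gpytorch
from gpytorch.constraints import Positive


class MyKernel(gpytorch.kernels.Kernel):
    def __init__(self, **kwargs):
        super(MyKernel, self).__init__(**kwargs)
        # register only the raw (unconstrained) parameter, plus its constraint
        self.register_parameter(name="raw_u2", parameter=torch.nn.Parameter(torch.zeros(1)))
        self.register_constraint("raw_u2", Positive())

    @property
    def u2(self):
        # the positive value to use inside forward -- never use self.raw_u2 there
        return self.raw_u2_constraint.transform(self.raw_u2)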

Sabina321 commented 5 years ago

Ok, this is what I have done now:

    self.register_parameter(name="u1", parameter=torch.nn.Parameter(1.0*torch.tensor(1.0)))
    self.register_parameter(name="raw_u1", parameter=torch.nn.Parameter(1.0*torch.tensor(1.0)))
    self.register_parameter(name="u2", parameter=torch.nn.Parameter(1.0*torch.tensor(1.0)))
    self.register_parameter(name="raw_u2", parameter=torch.nn.Parameter(1.0*torch.tensor(1.0)))
    self.register_parameter(name="rho", parameter=torch.nn.Parameter(1.0*torch.tensor(1.0)))
    self.register_parameter(name="raw_rho", parameter=torch.nn.Parameter(1.0*torch.tensor(1.0)))
    self.register_constraint("raw_u1",constraint= constraints.Positive())
    self.register_constraint("raw_u2",constraint= constraints.Positive())
    self.register_constraint("raw_rho",constraint= constraints.Interval(0,2))

I think now I am only getting negative values if the optimization has an error. Thanks!