cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License
3.46k stars 547 forks

Exact GP outputs show poor correlation with targets (even on training data) #788

Closed Akella17 closed 4 years ago

Akella17 commented 4 years ago

I have implemented a custom BiLinear kernel along the same lines as gpytorch.kernels.LinearKernel(). Using this kernel, I fit a GP on the training data and then perform exact inference on the same training inputs. The outputs of the GP regression show a correlation of only 0.3648 with the targets; for my purposes, this number needs to be much higher.

1. I tried reducing the likelihood noise, but soon ran into numerical issues with exploding residuals and warnings to increase the CG iterations.
2. While I observed that reducing the amount of training data improves the correlation, I want to know if there is any other way to deal with this problem robustly.

The ipython notebook can be found in the following link. Here is a link to the entire folder, i.e. data + notebook file.

Under the hood, I am trying to solve a reinforcement learning problem: the training inputs are the gradients of a parameterized policy network, the training targets are the values of state-action pairs (the actions taken by the policy network in a particular state), and the kernel matrix is the inverse of the empirical Fisher information matrix. Effectively, I am trying to model the Fisher kernel as a BiLinear kernel.

Akella17 commented 4 years ago

@jacobrgardner Using torch.set_default_tensor_type(torch.DoubleTensor) also does not lead to any significant improvements.

jacobrgardner commented 4 years ago

Using double tensors would only help if your problem was specifically a numerical issue, when it could just as well be a modelling or training issue in this case. @Akella17 Do you have evidence to believe this isn't just a modelling issue? E.g., are you certain that the kernel you've coded should work on this data?

Akella17 commented 4 years ago

@jacobrgardner I too suspected that it could be a modeling issue, but shouldn't evaluating on the training data give a high correlation with the corresponding training targets? Since the correlation is on average not greater than 0.4, I suspect some kind of numerical issue under the hood.

Using float previously led to exploding residuals and warning messages to increase the CG iterations. Switching to double produced results, although the correlation is still pretty low (<0.4).

Akella17 commented 4 years ago

@jacobrgardner Removing the input scaling (i.e. gpytorch.utils.grid.scale_to_bounds(x, -1, 1)) led to an approximate 0.1 jump in correlation. However, the correlation is still too weak for my purpose. I tried increasing many of the relevant settings, but none of them led to any significant improvement:

with gpytorch.settings.max_cg_iterations(2000), \
     gpytorch.settings.max_lanczos_quadrature_iterations(32), \
     gpytorch.settings.fast_computations(covar_root_decomposition=False, log_prob=False, solves=True), \
     gpytorch.settings.max_preconditioner_size(20), \
     gpytorch.settings.num_trace_samples(128):
    ...  # GP fit / evaluation code goes here

I am pretty sure that this is a numerical issue, as I am computing the correlation on the training data itself: I query a GP fitted on the training data with the training inputs, and compute the correlation of the outputs with the training targets. In theory, if likelihood.noise is 0, the outputs of the GP will analytically equal the training outputs for any choice of covariance function, giving a correlation of 1. However, the correlation in this case does not exceed 0.55.
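For reference, the zero-noise interpolation argument does hold when the kernel matrix is genuinely full rank. A minimal NumPy sketch, using an RBF kernel purely for illustration (not the BiLinear kernel from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 3))
y = rng.standard_normal(n)

# RBF Gram matrix: full rank for distinct inputs (illustrative choice)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)

# Zero-noise posterior mean at the training inputs: K @ K^{-1} @ y
mean = K @ np.linalg.solve(K, y)
corr = np.corrcoef(mean, y)[0, 1]
print(corr)  # essentially 1.0 when K is invertible
```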

Akella17 commented 4 years ago

@jacobrgardner Hey! The following settings have allowed me to reach an average correlation of 0.75.

model.initialize(**{'likelihood.noise': 0.01})
with gpytorch.settings.max_preconditioner_size(50), \
     gpytorch.settings.fast_computations(covar_root_decomposition=False, log_prob=False, solves=True):
    ...  # GP fit / evaluation code goes here

However, I still keep getting the warning to increase max_cg_iterations on almost every GP evaluation call. I am guessing that we can attain >0.9 correlation by increasing the computational budget and allowing the CG iterations to converge. How can I get this done?

KeAWang commented 4 years ago

with gpytorch.settings.max_cg_iterations(2000):

By default, gpytorch uses at most 1000 CG iterations.

Akella17 commented 4 years ago

@KeAWang Agreed. But I have tried several settings of max_cg_iterations in the range [1000, 10000], and I did not observe any consistent pattern of improvement or otherwise.
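For intuition on why raising max_cg_iterations may show no consistent improvement: CG converges quickly on well-conditioned systems, but on a nearly singular kernel matrix the achievable residual is limited by conditioning, not by the iteration budget. A minimal sketch of textbook CG in NumPy (not GPyTorch's implementation):

```python
import numpy as np

def cg(A, b, max_iters, tol=1e-10):
    """Textbook conjugate gradients for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_next = r @ r
        if np.sqrt(rs_next) < tol:
            break
        p = r + (rs_next / rs) * p
        rs = rs_next
    return x, float(np.sqrt(rs_next))

rng = np.random.default_rng(0)
n = 300
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
b = rng.standard_normal(n)

# Well-conditioned system (condition number 10): converges long before the cap
A_good = (Q * np.linspace(1.0, 10.0, n)) @ Q.T
_, res_good = cg(A_good, b, 1000)

# Nearly singular system (condition number ~1e12): residual stalls regardless
A_bad = (Q * np.logspace(-12, 0, n)) @ Q.T
_, res_bad = cg(A_bad, b, 1000)
print(res_good, res_bad)
```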

jacobrgardner commented 4 years ago

@Akella17 A few things. First, I'm not getting CG warnings when I run your code, and I am getting a correlation of ~0.75.

Most importantly, however, it is not at all true that any kernel will give you a correlation of one (i.e., a perfect fit) with zero noise. An obvious requirement for this is that your kernel matrix must be full rank (i.e., truly positive definite rather than positive semidefinite), since you need an exact solve K^{-1}y.

Linear and polynomial kernels, for example, can give rise to low-rank kernel matrices, in which case the expression KK^{-1}y that you expect to equal y with 0 noise doesn't actually make sense, because K^{-1}y is undefined without the added noise (although you could imagine solving a least squares problem, which would again be approximate).

Your kernel is essentially an extension of the linear kernel. Looking at the eigenvalues of your kernel matrix, it is clearly not full rank enough to give you exact solves:

>>> (covar_x.evaluate().symeig()[0] < 1e-5).sum()
tensor(310, device='cuda:0')
>>> (covar_x.evaluate().symeig()[0] < 1e-7).sum()
tensor(46, device='cuda:0')

In fact, with a matrix like this Cholesky will give you garbage solves even if it technically runs:

>>> solve = torch.cholesky_solve(train_y, torch.cholesky(covar_x.evaluate()))
>>> torch.norm(covar_x.matmul(solve) - train_y)
tensor(23038.8296, device='cuda:0')

Given residuals this large, CG will of course fail from time to time, since it is not actually possible to achieve a lower residual norm. Unless you have a specific reason to believe something else is wrong, I would say things are working fine, and you are simply getting an inexact fit as a modelling issue.
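To make this failure mode concrete, here is a small NumPy sketch (shapes hypothetical) of what happens with a rank-deficient linear-style kernel: the exact solve is undefined, the least-squares residual is far from zero, and adding likelihood noise makes the system well posed at the cost of an inexact fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 20
G = rng.standard_normal((n, d))
K = G @ G.T                        # linear-style kernel matrix: rank <= d << n
y = rng.standard_normal(n)

rank_K = np.linalg.matrix_rank(K)
print(rank_K)                      # 20: K is singular, so K^{-1} y is undefined

# Best achievable least-squares fit: the residual is far from zero, because
# y generally has components outside the d-dimensional column space of K
sol, *_ = np.linalg.lstsq(K, y, rcond=None)
resid = np.linalg.norm(K @ sol - y)
print(resid)

# Adding likelihood noise makes the solve well posed, but the fit is inexact,
# which shows up as an imperfect correlation with the targets
noise = 1e-2
mean = K @ np.linalg.solve(K + noise * np.eye(n), y)
corr = np.corrcoef(mean, y)[0, 1]
print(corr)
```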

Akella17 commented 4 years ago

@jacobrgardner Oh okay, got it! I had completely overlooked the possibility of low-rank kernel matrices and, as a result, was under the wrong impression. Thanks for going into the details to clarify this for me.

However, I am still not sure how you are getting ~0.75 correlation while the same code gives 0.42833 on my system as well as on Google Colab. Did you change any gpytorch.settings or likelihood.noise to get 0.75?