cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Bug] Negative variances obtained for reasons I do not understand #1840

Open ishank-juneja opened 2 years ago

ishank-juneja commented 2 years ago

🐛 Bug

From time to time I observe negative variances from my trained gpytorch models when I run

```python
prediction_distribution = likelihood(model(test_x))
```

I don't think sharing a specific code snippet for this behavior would help anyone: as I mentioned in a previous comment on another negative-variance issue (#864), the behavior is not reproducible even with identical python, gpytorch, and torch versions, so a snippet I post here might well give reasonable non-negative variances when someone else runs it.

What I am having trouble understanding is the following: the covariance matrix of the posterior distribution output by a GP has the standard form at test time (from Page 3 of the class notes I was following),

$$\Sigma_* = K(X_*, X_*) - K(X_*, X)\,\bigl[K(X, X) + \sigma^2 I\bigr]^{-1} K(X, X_*),$$

so how can the variances I see when running

```python
with torch.no_grad():
    # Inference on an IndependentModelList-type model with two sub-models
    prediction_dist = likelihood(*model(ux, ux))

print('vars: ', prediction_dist[0].covariance_matrix.detach().numpy(), prediction_dist[1].covariance_matrix.detach().numpy())
```

ever become negative?
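A quick way to surface the offending entries, assuming the same `model`, `likelihood`, and `ux` as in the snippet above, is to look at the diagonals (`.variance`) directly:

```python
# Sketch: flag negative entries on the diagonal (the variances) directly.
# `model`, `likelihood`, and `ux` are the same objects as in the snippet above.
with torch.no_grad():
    prediction_dist = likelihood(*model(ux, ux))
    for i, dist in enumerate(prediction_dist):
        neg = dist.variance < 0
        if neg.any():
            print(f"sub-model {i}: {neg.sum().item()} negative variances, "
                  f"min = {dist.variance.min().item():.3e}")
```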

Expected Behavior

I expect never to see negative variances, no matter how I train and test my GP model. In fact, I would have expected the diagonals of the covariance matrices of gpytorch.distributions objects to not be allowed any negative entries at all.

ishank-juneja commented 2 years ago

I believe this has to do with numerical precision issues when the noise in the data is small (i.e. the learned likelihood noise is small).

Specifically, the result of this line seems to be negative at times, for what look like numerical-precision reasons.
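For what it's worth, the effect is easy to reproduce outside of gpytorch: in single precision, the exact posterior covariance $K_{**} - K_{*}\,[K + \sigma^2 I]^{-1} K_{*}^{\top}$ can pick up slightly negative diagonal entries through round-off once the noise term is tiny. A standalone sketch (plain torch, not gpytorch's actual code path, so only illustrative; the float32 minimum can dip below zero while float64 typically stays positive):

```python
import torch

def rbf(x1, x2, lengthscale=0.05):
    # Simple RBF kernel; a short lengthscale with densely spaced inputs
    # makes K(X, X) ill-conditioned, as in the issue.
    d = (x1.unsqueeze(-1) - x2.unsqueeze(-2)) / lengthscale
    return torch.exp(-0.5 * d ** 2)

train_x = torch.linspace(0, 1, 200)
test_x = torch.linspace(0, 1, 101)
noise = 1e-6  # tiny learned likelihood noise, as described above

for dtype in (torch.float32, torch.float64):
    tx, sx = train_x.to(dtype), test_x.to(dtype)
    K = rbf(tx, tx) + noise * torch.eye(len(tx), dtype=dtype)
    K_star = rbf(sx, tx)   # K(X_*, X)
    K_ss = rbf(sx, sx)     # K(X_*, X_*)
    # Exact posterior covariance: K_** - K_* (K + sigma^2 I)^{-1} K_*^T
    post_cov = K_ss - K_star @ torch.linalg.solve(K, K_star.T)
    print(dtype, "min posterior variance:", post_cov.diagonal().min().item())
```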

I will wait for a maintainer to confirm that this does not count as a bug before closing the issue.

wjmaddox commented 2 years ago

Ugh, sorry we missed this. Is there a reproducible example for this? I think it might have been in your other issues?

ishank-juneja commented 2 years ago

The code where I saw this is part of a larger project and uses a custom dataset (whose generation involves randomness).

Although the randomness can be seeded, the code is spread across multiple .py files, so it is hard for me to compress it all into a standalone .py file.

The reason I think this line is responsible is that I stepped through my code and gpytorch's code for a case where I was seeing a negative variance, and at that line I confirmed that the magnitude of `test_test_covar` was smaller than the magnitude of `MatmulLazyTensor(test_train_covar, covar_correction_rhs.mul(-1))`, meaning the returned value was going to be negative.

If you would really like to see a reproducible example I can spend some time coming up with one, please let me know.
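In the meantime, a rough sketch of the same comparison through the public API (assuming an IndependentModelList with a `.models` attribute and the same `ux` as above; `forward` on an exact GP returns the prior at the given inputs):

```python
# Hypothetical diagnostic: compare prior (test-test) variances against the
# posterior variances. A negative posterior variance means the subtracted
# data-dependent correction numerically exceeded the prior variance.
with torch.no_grad():
    posteriors = model(ux, ux)                        # posterior MVNs, one per sub-model
    priors = [m.forward(ux) for m in model.models]    # prior MVNs at the test inputs
    for i, (prior, post) in enumerate(zip(priors, posteriors)):
        bad = post.variance < 0
        if bad.any():
            print(f"sub-model {i}: prior vars {prior.variance[bad].tolist()} "
                  f"vs posterior vars {post.variance[bad].tolist()}")
```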

gpleiss commented 2 years ago

@ishank-juneja can you try re-running your example in double precision? Call `model.double()`, `likelihood.double()`, `train_x = train_x.double()`, `train_y = train_y.double()`, `test_x = test_x.double()`, etc.
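For concreteness, something along these lines should do it (a sketch for a single exact GP; for the model-list case the same calls would be applied to each sub-model and its data):

```python
# Move the model, likelihood, and data to double precision, then re-predict.
model = model.double()
likelihood = likelihood.double()
train_x, train_y = train_x.double(), train_y.double()
test_x = test_x.double()

# Re-attach the double-precision training data to the exact GP.
model.set_train_data(train_x, train_y, strict=False)

model.eval()
likelihood.eval()
with torch.no_grad():
    pred = likelihood(model(test_x))
    print("min predictive variance:", pred.variance.min().item())
```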