cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

CG Tolerance warnings with KISS-GP and SpectralMixtureKernel #2077

Open pscicluna opened 2 years ago

pscicluna commented 2 years ago

I'm encountering CG tolerance warnings when using KISS-GP with a SpectralMixtureKernel to model (quasi-)periodic time variability in astronomical data. The warning only appears for some values of the mixture parameters. I've read both https://github.com/cornellius-gp/gpytorch/issues/1129 and https://github.com/cornellius-gp/gpytorch/issues/2045, but nothing obviously related jumps out at me. As a result, it's not clear whether SKI is actually gaining me anything in terms of performance.
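For reference, a minimal sketch of the kind of model I mean (the class name, mixture count and grid size are my own illustrative choices, not values from my actual setup):

```python
import torch
import gpytorch

class KISSGPSpectralModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, num_mixtures=4, grid_size=400):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Spectral mixture kernel as the base kernel, interpolated onto a 1D grid (KISS-GP / SKI)
        base_kernel = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=num_mixtures)
        base_kernel.initialize_from_data(train_x, train_y)  # rough frequency initialisation
        self.covar_module = gpytorch.kernels.GridInterpolationKernel(
            base_kernel, grid_size=grid_size, num_dims=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```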

My data tend to have relatively long periods/low frequencies, and the errors only occur in that regime. If I don't initialise the hyperparameters, the computation is very fast, but of course the results are useless because the model is then in completely the wrong ballpark.

I attempted to follow the suggestion of increasing the number of CG iterations, but this of course slows computations down dramatically, and even 10^4 iterations are not always sufficient. I can change the units on the timestamps so the expected frequencies are >1, but that requires knowing the expected values ahead of time and doesn't really solve the problem (I still get warnings sometimes).
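What I mean by increasing the iterations is, roughly, wrapping the training loop in the CG-iteration setting (a sketch; the iteration cap is illustrative and `model`, `mll`, `optimizer`, `train_x`, `train_y` are assumed to be defined as usual):

```python
# Raise the CG iteration cap around the training loop; 10_000 is just an example value.
with gpytorch.settings.max_cg_iterations(10_000):
    for i in range(training_iterations):
        optimizer.zero_grad()
        output = model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        optimizer.step()
```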

The only workaround I have found is to scale the timestamps to the [0,1) range, shifting everything to high frequencies. However, setting the initial values of the hyperparameters to the right ballpark is then slightly irritating, and the optimisation seems to depend heavily on this. Any other suggestions you could provide would be most appreciated!

gpleiss commented 2 years ago

> My data tend to have relatively long periods/low frequencies, and the errors only occur in that regime. If I don't initialise the hyperparameters, the computation is very fast, but of course the results are useless because the model is then in completely the wrong ballpark.

Long periods/low frequencies is the regime where CG-based inference (what's used for KISS-GP) is most challenged. This is because the resulting kernel matrices are ill-conditioned, and CG needs more iterations to reach a given tolerance as the condition number grows.

> I attempted to follow the suggestion of increasing the number of CG iterations, but this of course slows computations down dramatically, and even 10^4 iterations are not always sufficient. I can change the units on the timestamps so the expected frequencies are >1, but that requires knowing the expected values ahead of time and doesn't really solve the problem (I still get warnings sometimes).

Try playing around with the preconditioner as well, via `with gpytorch.settings.max_preconditioner_size(k):`. Larger values of `k` should in theory be better, but in some settings our default preconditioner may actually be hurting convergence, so try both larger and smaller values.
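Something like the following sketch (the particular values of `k` are only examples, and `model`, `mll`, `train_x`, `train_y` are assumed to be set up and in training mode as above):

```python
# Sweep the preconditioner size and compare how the loss evaluation behaves.
for k in (0, 5, 15, 50, 100):
    with gpytorch.settings.max_preconditioner_size(k):
        output = model(train_x)
        loss = -mll(output, train_y)
        print(f"max_preconditioner_size={k}: loss={loss.item():.4f}")
```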

> The only workaround I have found is to scale the timestamps to the [0,1) range, shifting everything to high frequencies. However, setting the initial values of the hyperparameters to the right ballpark is then slightly irritating, and the optimisation seems to depend heavily on this. Any other suggestions you could provide would be most appreciated!

In general, we highly recommend z-scoring your inputs and outputs before training a model - this generally makes everything numerically nicer. Yes, it is annoying to z-score and then un-z-score, but this is common to most GP (and ML) models.
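A minimal sketch of what z-scoring looks like in practice (standard practice, nothing specific to this issue; `train_x`, `train_y`, `test_x`, `model`, `likelihood` are assumed):

```python
# Standardise inputs and targets before training, then undo the target transform on predictions.
x_mean, x_std = train_x.mean(), train_x.std()
y_mean, y_std = train_y.mean(), train_y.std()

train_x_z = (train_x - x_mean) / x_std
train_y_z = (train_y - y_mean) / y_std

# ... train the model on (train_x_z, train_y_z) ...

model.eval(); likelihood.eval()
with torch.no_grad():
    test_x_z = (test_x - x_mean) / x_std
    pred = likelihood(model(test_x_z))
    pred_mean = pred.mean * y_std + y_mean   # back to original units
    pred_std = pred.stddev * y_std
```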

Finally, as a future note, it would be nice for the development team if you would open questions like this as a Discussion rather than an issue.