cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

[Docs] How to troubleshoot inaccurate GP fit #1334

Open ru111 opened 3 years ago

ru111 commented 3 years ago

Hi, sorry if I'm posting in the wrong category. I was wondering how to diagnose my model when the GP gets the fit wildly wrong (identifying where it goes wrong and trying the most likely fixes), beyond visualising the datapoints against the mean fit and confidence interval in different dimensions. For example, here I'm using SGPR with 2 input features (ARD, with each lengthscale constrained to be above a certain threshold) and training until the loss approximately plateaus, but when I visualise the model's output on the training data, the mean function seems to completely miss the data in some dimensions (plotting a few different "slices" along the second input dimension).

[Screenshot 2020-11-08 at 14:32:53: mean fit and confidence intervals missing the training data in several slices]

In some runs the fit looks decent, so it seems to happen only sometimes. But I don't have intuition for what might be causing it: is it the lengthscale constraints, a learning rate/iteration-count problem, or is my data too noisy? Should I set a prior somehow, and would that help?

Besides looking for an answer to this specific problem, I thought a general troubleshooting guide might be useful (although I'm not sure how much demand there is, as I assume most users already know what to look for, or it depends heavily on the model specification). I realised it would be helpful to plot residuals and compute the percentage of datapoints that fall inside the confidence region as a sanity check. Beyond that, though, I'm not sure what to look for (e.g. which model outputs are helpful for diagnosing what went wrong?) or which fixes to try first. Thanks!
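The sanity check mentioned above (what fraction of points fall inside the confidence region) can be sketched in a few lines. This is a generic numpy sketch, not gpytorch API; it assumes you have already extracted the predictive mean and standard deviation as arrays, and all names here are illustrative:

```python
import numpy as np

def coverage_fraction(y, mean, stddev, z=1.96):
    """Fraction of observations inside the central confidence band.

    z=1.96 gives a 95% interval under a Gaussian predictive
    distribution; for a well-calibrated GP roughly 95% of points
    should land inside. Much lower suggests overconfidence (or a
    badly missed fit); much higher suggests underconfidence.
    """
    lower = mean - z * stddev
    upper = mean + z * stddev
    inside = (y >= lower) & (y <= upper)
    return inside.mean()

# Illustrative data: a perfectly calibrated predictive distribution.
rng = np.random.default_rng(0)
mean = np.zeros(10_000)
stddev = np.ones(10_000)
y = rng.normal(mean, stddev)
print(f"coverage: {coverage_fraction(y, mean, stddev):.3f}")  # close to 0.95
```

Plotting the standardized residuals `(y - mean) / stddev` against each input dimension is a natural companion check: a systematic trend in one dimension points at the slices where the mean function misses.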

jacobrgardner commented 3 years ago

@ru111 Do you actually get better test error / NLL when you see good fits in all dimensions?

I'm a little confused by the plots above. Are these from the same GP and you are plotting as a function of a single input feature? How were these generated?

ru111 commented 3 years ago

@jacobrgardner Apologies for the confusion: the red and blue points are separate datasets, and I train a separate model for each (both have the same features and model specs). The lines are the respective mean functions from the trained models. The x axis is the first input dimension; I picked 3 values of the second input dimension and plotted each slice separately.

With regards to the loss, it always seems to start at around 0.9 and plateau at around 0.5 after roughly 15 iterations at lr=0.1. I should have mentioned this earlier, but I was getting warnings like:

/miniconda3/envs/gpytorch/lib/python3.6/site-packages/gpytorch/utils/cholesky.py:46: NumericalWarning: A not p.d., added jitter of 1.0e-05 to the diagonal
  warnings.warn(f"A not p.d., added jitter of {jitter_new:.1e} to the diagonal", NumericalWarning)
/miniconda3/envs/gpytorch/lib/python3.6/site-packages/gpytorch/utils/linear_cg.py:317: NumericalWarning: CG terminated in 2000 iterations with average residual norm 1.7044315338134766 which is larger than the tolerance of 1 specified by gpytorch.settings.cg_tolerance. If performance is affected, consider raising the maximum number of CG iterations by running code in a gpytorch.settings.max_cg_iterations(value) context.
  NumericalWarning,
/miniconda3/envs/gpytorch/lib/python3.6/site-packages/gpytorch/utils/linear_cg.py:317: NumericalWarning: CG terminated in 2000 iterations with average residual norm 1.487385630607605 which is larger than the tolerance of 1 specified by gpytorch.settings.cg_tolerance. If performance is affected, consider raising the maximum number of CG iterations by running code in a gpytorch.settings.max_cg_iterations(value) context.
  NumericalWarning,
/miniconda3/envs/gpytorch/lib/python3.6/site-packages/gpytorch/utils/linear_cg.py:317: NumericalWarning: CG terminated in 2000 iterations with average residual norm 1.4647430181503296 which is larger than the tolerance of 1 specified by gpytorch.settings.cg_tolerance. If performance is affected, consider raising the maximum number of CG iterations by running code in a gpytorch.settings.max_cg_iterations(value) context.
  NumericalWarning,
/miniconda3/envs/gpytorch/lib/python3.6/site-packages/gpytorch/utils/linear_cg.py:317: NumericalWarning: CG terminated in 2000 iterations with average residual norm 4.047982692718506 which is larger than the tolerance of 0.01 specified by gpytorch.settings.cg_tolerance. If performance is affected, consider raising the maximum number of CG iterations by running code in a gpytorch.settings.max_cg_iterations(value) context.
  NumericalWarning,
/miniconda3/envs/gpytorch/lib/python3.6/site-packages/ipykernel_launcher.py:112: RuntimeWarning: divide by zero encountered in double_scalars

I used to get these warnings but sometimes ended up with a decent fit anyway (though that might have been on a very old version of gpytorch, which I recently upgraded). I also ran lr=0.01 with n_iter=100, which was likewise unstable (and still produced the CG warning). I did double max_cg_iterations as the warning suggests, although I wasn't sure what to make of the documentation's conflicting advice: "A higher value rarely results in more accurate solves – instead, lower the CG tolerance." I have no idea what these warnings imply, so I was just trying a bunch of things. I saw #1129, but my data is all z-normalised. A curious observation: the training data that the GP wildly misses seems to lie around large negative values in one of the input dimensions.

gpleiss commented 3 years ago

@ru111 - it is often helpful to normalize your x values as well - i.e. scale them to between -1 and 1. This might help with some of the numerical instabilities you are seeing.
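A minimal sketch of that scaling, assuming your inputs are numpy arrays (the function name is illustrative). The key detail is that test inputs must be transformed with the *training* min/max so both live on the same scale:

```python
import numpy as np

def scale_to_unit_interval(train_x, test_x=None):
    """Min-max scale each input feature to [-1, 1] using training statistics."""
    x_min = train_x.min(axis=0)
    x_max = train_x.max(axis=0)

    def scale(x):
        # Map [x_min, x_max] -> [-1, 1], per feature.
        return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

    if test_x is None:
        return scale(train_x)
    return scale(train_x), scale(test_x)

train_x = np.array([[0.0, 10.0],
                    [5.0, 20.0],
                    [10.0, 30.0]])
scaled = scale_to_unit_interval(train_x)
print(scaled)  # each column now spans exactly [-1, 1]
```

(`torch` tensors work the same way with `.min(dim=0).values` / `.max(dim=0).values`; scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` is an off-the-shelf alternative.)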

Would you be able to post an example to reproduce these results?

ru111 commented 3 years ago

Sorry for the late reply. The problem seemed to go away when I used a smaller learning rate and tuned n_iter: the model started diverging from the data quite badly when training kept going after it found the optimum. What I didn't realise was that n_iter and the learning rate need to be tuned with a validation set, much like in neural network optimisation (somehow I thought this wouldn't matter much with GPs). I also switched to a GPU to speed up my experiments, although I'm not sure whether that had anything to do with it.
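The fix described above (stop training once a held-out loss stops improving, instead of fixing n_iter by hand) can be sketched as a generic early-stopping loop. This is framework-agnostic pseudostructure rather than gpytorch API: `step_fn` and `val_loss_fn` are hypothetical callbacks standing in for one optimizer step on the training MLL and an evaluation on a validation set:

```python
def train_with_early_stopping(step_fn, val_loss_fn, max_iters=200, patience=10):
    """Run step_fn() until val_loss_fn() stops improving for `patience` steps.

    Returns (iterations run, best validation loss seen). Guards against
    the over-training divergence described above.
    """
    best, since_best = float("inf"), 0
    for i in range(max_iters):
        step_fn()
        val = val_loss_fn()
        if val < best - 1e-6:          # meaningful improvement
            best, since_best = val, 0
        else:
            since_best += 1
        if since_best >= patience:     # plateaued: stop early
            break
    return i + 1, best

# Illustrative stand-in: a "loss" that improves for 30 steps, then worsens.
losses = [1.0 / (t + 1) if t < 30 else 1.0 / 30 + 0.01 * (t - 30)
          for t in range(200)]
state = {"t": -1}
step_fn = lambda: state.update(t=state["t"] + 1)
val_loss_fn = lambda: losses[state["t"]]

iters, best = train_with_early_stopping(step_fn, val_loss_fn)
print(iters, round(best, 4))  # stops about `patience` steps past the optimum
```

In practice, restoring the hyperparameters from the best-validation iteration (e.g. via `copy.deepcopy(model.state_dict())`) is usually paired with the stop itself.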