cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Bug] Different test results during training and testing #2394

Closed pengzhi1998 closed 1 year ago

pengzhi1998 commented 1 year ago

🐛 Bug

I am using GPyTorch 1.4.2 and PyTorch 1.8.0 on Ubuntu 20.04.

I defined an approximate GP model as:

import gpytorch


class GPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points, use_ard=True):
        # Variational distribution over the inducing values (natural parameterization).
        variational_distribution = gpytorch.variational.NaturalVariationalDistribution(inducing_points.size(0))
        # variational_strategy = gpytorch.variational.CiqVariationalStrategy(
        #     self, inducing_points, variational_distribution, learn_inducing_locations=True
        # )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super(GPModel, self).__init__(variational_strategy)
        # With ARD, the RBF kernel gets one lengthscale per input dimension.
        ard_num_dims = None
        if use_ard:
            ard_num_dims = inducing_points.shape[-1]
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=ard_num_dims))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
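For context, since the model uses a NaturalVariationalDistribution, the variational parameters are meant to be trained with GPyTorch's natural-gradient optimizer (gpytorch.optim.NGD). A simplified sketch of the kind of training loop I am using (inducing_points, train_x, train_y, num_epochs, and the GaussianLikelihood are placeholders, not my exact code):

import torch
import gpytorch

# Simplified training sketch; data tensors and epoch count are placeholders.
likelihood = gpytorch.likelihoods.GaussianLikelihood().double().cuda()
model = GPModel(inducing_points=inducing_points, use_ard=True).double().cuda()
model.train()
likelihood.train()

# The natural variational distribution is optimized with natural gradients (NGD),
# while kernel/likelihood hyperparameters use a standard optimizer such as Adam.
variational_optimizer = gpytorch.optim.NGD(model.variational_parameters(), num_data=train_y.size(0), lr=0.1)
hyperparameter_optimizer = torch.optim.Adam([
    {'params': model.hyperparameters()},
    {'params': likelihood.parameters()},
], lr=0.01)
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

for _ in range(num_epochs):
    variational_optimizer.zero_grad()
    hyperparameter_optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    variational_optimizer.step()
    hyperparameter_optimizer.step()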

And I used the following code to save and load the model:

torch.save({
        'model_state_dict': model.state_dict(),
        'likelihood_state_dict': likelihood.state_dict()
    }, save_path)

and

checkpoint = torch.load(save_path)
# Rebuild the model from the saved inducing points, then restore all parameters.
model = GPModel(
    inducing_points=checkpoint['model_state_dict']['variational_strategy.inducing_points'].double().cuda(),
    use_ard=True,
).double().cuda()
model.load_state_dict(checkpoint['model_state_dict'])
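The likelihood is restored and both modules are switched to evaluation mode before prediction; roughly like this (the GaussianLikelihood here is an assumption about my setup, and test_x is a placeholder for the test inputs):

likelihood = gpytorch.likelihoods.GaussianLikelihood().double().cuda()
likelihood.load_state_dict(checkpoint['likelihood_state_dict'])

# Evaluate the restored model and likelihood on held-out inputs.
model.eval()
likelihood.eval()
with torch.no_grad():
    predictions = likelihood(model(test_x))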

Something strange happens: the reloaded model performs much worse in the tests run at test time than in the same tests run during training (in both cases the model is being evaluated on test data). After a long time of debugging, I found that inside the forward method the variables x, mean_x, and covar_x take exactly the same values in both settings, and I also verified that the model's parameters (for example covar_module.raw_outputscale, covar_module.base_kernel.raw_lengthscale_constraint.upper_bound, variational_strategy.variational_params_initialized, etc.) are identical.

However, the returned gpytorch.distributions.MultivariateNormal(mean_x, covar_x) gives different results (different multivariate distributions), which is quite confusing. Specifically, I checked the returned distribution's mean and covariance, and they differ between the reloaded model at test time and the in-memory model during training. What might be causing this?

Besides, during training I observed a warning: PATH/lib/python3.8/site-packages/gpytorch/distributions/multivariate_normal.py:259: NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-10. Could this explain why gpytorch.distributions.MultivariateNormal(mean_x, covar_x) returns different results at test time (with the saved model) than during training? (Update: I checked this; even when no negative-variance warning appears during training, the distribution's values still differ between the two settings. This is a very weird problem.)
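To make the comparison concrete, the check I have in mind is roughly the following sketch (test_x is a fixed batch of test inputs; the names are placeholders):

# Predict with the in-memory model right after training, then with a reloaded
# copy, on the same fixed inputs; the two predictive distributions should match.
model.eval()
with torch.no_grad():
    dist_after_training = model(test_x)

checkpoint = torch.load(save_path)
reloaded = GPModel(
    inducing_points=checkpoint['model_state_dict']['variational_strategy.inducing_points'].double().cuda(),
    use_ard=True,
).double().cuda()
reloaded.load_state_dict(checkpoint['model_state_dict'])
reloaded.eval()
with torch.no_grad():
    dist_after_loading = reloaded(test_x)

# In my case these print False, even though parameters and inputs are identical.
print(torch.allclose(dist_after_training.mean, dist_after_loading.mean))
print(torch.allclose(dist_after_training.covariance_matrix, dist_after_loading.covariance_matrix))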

Look forward to your help! Thank you!

Best regards, Pengzhi

pengzhi1998 commented 1 year ago

Sorry for the oversight; here is the solution to the issue: https://github.com/cornellius-gp/gpytorch/issues/1308.