cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

[Bug] Different test results during training and testing #2394

Closed pengzhi1998 closed 1 year ago

pengzhi1998 commented 1 year ago

🐛 Bug

I am using GPyTorch 1.4.2 and PyTorch 1.8.0 on Ubuntu 20.04.

I defined an approximate GP model as:

import gpytorch


class GPModel(gpytorch.models.ApproximateGP):
    def __init__(self, inducing_points, use_ard=True):
        # Variational distribution over the inducing values (natural parameterization).
        variational_distribution = gpytorch.variational.NaturalVariationalDistribution(inducing_points.size(0))
        # variational_strategy = gpytorch.variational.CiqVariationalStrategy(
        #     self, inducing_points, variational_distribution, learn_inducing_locations=True
        # )
        variational_strategy = gpytorch.variational.VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=True
        )
        super(GPModel, self).__init__(variational_strategy)
        # With ARD, the RBF kernel gets one lengthscale per input dimension.
        ard_num_dims = None
        if use_ard:
            ard_num_dims = inducing_points.shape[-1]
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel(ard_num_dims=ard_num_dims))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
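For context, since the model uses a NaturalVariationalDistribution, the variational parameters are meant to be trained with GPyTorch's natural-gradient optimizer (gpytorch.optim.NGD). A simplified sketch of the kind of training loop I am using (inducing_points, train_x, train_y, num_epochs, and the GaussianLikelihood are placeholders, not my exact code):

import torch
import gpytorch

# Simplified training sketch; data tensors and epoch count are placeholders.
likelihood = gpytorch.likelihoods.GaussianLikelihood().double().cuda()
model = GPModel(inducing_points=inducing_points, use_ard=True).double().cuda()
model.train()
likelihood.train()

# The natural variational distribution is optimized with natural gradients (NGD),
# while kernel/likelihood hyperparameters use a standard optimizer such as Adam.
variational_optimizer = gpytorch.optim.NGD(model.variational_parameters(), num_data=train_y.size(0), lr=0.1)
hyperparameter_optimizer = torch.optim.Adam([
    {'params': model.hyperparameters()},
    {'params': likelihood.parameters()},
], lr=0.01)
mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))

for _ in range(num_epochs):
    variational_optimizer.zero_grad()
    hyperparameter_optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    variational_optimizer.step()
    hyperparameter_optimizer.step()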

And I used the following code to save and load the model:

torch.save({
        'model_state_dict': model.state_dict(),
        'likelihood_state_dict': likelihood.state_dict()
    }, save_path)

and

checkpoint = torch.load(save_path)
# Rebuild the model from the saved inducing points, then restore all parameters.
model = GPModel(
    inducing_points=checkpoint['model_state_dict']['variational_strategy.inducing_points'].double().cuda(),
    use_ard=True,
).double().cuda()
model.load_state_dict(checkpoint['model_state_dict'])
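The likelihood is restored and both modules are switched to evaluation mode before prediction; roughly like this (the GaussianLikelihood here is an assumption about my setup, and test_x is a placeholder for the test inputs):

likelihood = gpytorch.likelihoods.GaussianLikelihood().double().cuda()
likelihood.load_state_dict(checkpoint['likelihood_state_dict'])

# Evaluate the restored model and likelihood on held-out inputs.
model.eval()
likelihood.eval()
with torch.no_grad():
    predictions = likelihood(model(test_x))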

Something strange happens: the reloaded model performs much worse in the tests run at test time than in the same tests run during training (in both cases the model is being evaluated on test data). After a long time of debugging, I found that inside the forward method the variables x, mean_x, and covar_x take exactly the same values in both settings, and I also verified that the model's parameters (for example covar_module.raw_outputscale, covar_module.base_kernel.raw_lengthscale_constraint.upper_bound, variational_strategy.variational_params_initialized, etc.) are identical.

However, the returned gpytorch.distributions.MultivariateNormal(mean_x, covar_x) gives different results (different multivariate distributions), which is quite confusing. Specifically, I checked the returned distribution's mean and covariance, and they differ between the reloaded model at test time and the in-memory model during training. What might be causing this?

Besides, during training I observed a warning: PATH/lib/python3.8/site-packages/gpytorch/distributions/multivariate_normal.py:259: NumericalWarning: Negative variance values detected. This is likely due to numerical instabilities. Rounding negative variances up to 1e-10. Could this explain why gpytorch.distributions.MultivariateNormal(mean_x, covar_x) returns different results at test time (with the saved model) than during training? (Update: I checked this; even when no negative-variance warning appears during training, the distribution's values still differ between the two settings. This is a very weird problem.)
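To make the comparison concrete, the check I have in mind is roughly the following sketch (test_x is a fixed batch of test inputs; the names are placeholders):

# Predict with the in-memory model right after training, then with a reloaded
# copy, on the same fixed inputs; the two predictive distributions should match.
model.eval()
with torch.no_grad():
    dist_after_training = model(test_x)

checkpoint = torch.load(save_path)
reloaded = GPModel(
    inducing_points=checkpoint['model_state_dict']['variational_strategy.inducing_points'].double().cuda(),
    use_ard=True,
).double().cuda()
reloaded.load_state_dict(checkpoint['model_state_dict'])
reloaded.eval()
with torch.no_grad():
    dist_after_loading = reloaded(test_x)

# In my case these print False, even though parameters and inputs are identical.
print(torch.allclose(dist_after_training.mean, dist_after_loading.mean))
print(torch.allclose(dist_after_training.covariance_matrix, dist_after_loading.covariance_matrix))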

Look forward to your help! Thank you!

Best regards, Pengzhi

pengzhi1998 commented 1 year ago

Sorry for the oversight; here is the solution to the issue: https://github.com/cornellius-gp/gpytorch/issues/1308.