cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

[Bug] Possible inconsistency in exact multitask modeling results between GPyTorch versions 1.4.1 and 1.5.0 #1689

Closed lucheroni closed 3 years ago

lucheroni commented 3 years ago

πŸ› Bug

After upgrading from GPyTorch 1.4.1 to 1.5.0, I noticed that the same program gives two very different results under the two versions. The program uses the exact multitask setup, and the loss values it produces can differ by up to roughly a factor of 100. I believe the problem is confined to the exact multitask classes: some quick tests with the single-task case showed no discrepancy.

Preparing a code snippet that reproduces the problem is not easy, but I'll try. The difficulty is that some parameters are initialized randomly each time the model is instantiated, and while v. 1.5.0 allows initializing parameters in the exact multitask case, v. 1.4.1 does not (it raises an error, which was eliminated in v. 1.5.0).

Note that before upgrading I took a safety copy of my conda environment, so I have both my original environment with v. 1.4.1 and a clone of it whose only difference is v. 1.5.0 (I checked during the update that 'conda install -c conda-forge gpytorch' changed nothing except gpytorch itself).

The code below should be run as follows (I have a test under Spyder in mind). Prepare the two environments, identical except for the gpytorch version.

First, choose the v. 1.4.1 Python interpreter from Tools, start a console, and run the code in it. Note down the values of cf and rv (these will be inserted 'by hand' when running the other version), and note down the loss value. Second, switch to the v. 1.5.0 interpreter (by loading the corresponding kernel and starting another console), run the code, and note down the loss value.

I tried this a few times; the two versions always gave different values, though of the same order of magnitude.

The example proposed here is very small: just two tasks, one latent GP, and one basic one-parameter kernel type. cf and rv are the only parameters that get random initialization. In my own program there are 24 tasks and 24 latents, the kernels are complicated, and computation time is large, which is probably why I see such huge differences there. I also noticed that the larger the number of time points, the larger the error, which is why I chose an example with at least six time points.

To reproduce

Code snippet to reproduce

# test for the difference in behavior 
# for the Exact MULTI-TASK case
# between gpytorch ver. 1.4.1 and 1.5.0

import torch
import gpytorch

class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)

        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=num_tasks
        )

        # one base kernel per latent GP (a fresh instance for each latent, so
        # lengthscales are not shared; with num_latents = 1 this is a single
        # RBF kernel)
        self.covar_module = gpytorch.kernels.LCMKernel(
            [gpytorch.kernels.RBFKernel() for _ in range(num_latents)],
            num_tasks=num_tasks,
            rank=rank,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)

        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)

rank = 1
num_tasks = 2
num_latents = 1

# 6 time points
train_x = torch.linspace(0, 1, 6)

# 2 tasks; stack along the last dim so train_y has shape (6, 2) = (n, num_tasks)
train_y = torch.stack([
    torch.tensor([1., 2., 1., 2., 1., 2.]),
    torch.tensor([2., 2., 2., 2., 2., 2.]),
], -1)

torch.set_printoptions(threshold=10_000)

likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks)
model = MultitaskGPModel(train_x, train_y, likelihood)

# every parameter is initialized to zero except covar_factor and raw_var;
# uncomment to check:
# for name, param in model.named_parameters():
#     if param.requires_grad:
#         print(name, param)
# for name, param in likelihood.named_parameters():
#     if param.requires_grad:
#         print(name, param)

version = gpytorch.__version__

if version == '1.4.1':
    cf_141 = model.covar_module.covar_module_list[0].task_covar_module.covar_factor.detach()
    rv_141 = model.covar_module.covar_module_list[0].task_covar_module.raw_var.detach()
    print('cf = ', cf_141.flatten())  # e.g. tensor([-0.0007,  0.0964])
    print('rv = ', rv_141.flatten())  # e.g. tensor([2.5536, 2.3405])
    print('note down these values')

if version == '1.5.0':
    # values noted down from the 1.4.1 run, inserted by hand
    hypers = {
        'covar_module.covar_module_list.0.task_covar_module.covar_factor': torch.tensor([-1.3655,  2.8571]),
        'covar_module.covar_module_list.0.task_covar_module.raw_var': torch.tensor([ 0.0093, -1.8867]),
    }
    # initializing these parameters works only with 1.5.0
    model.initialize(**hypers)
    print('parameters reinitialized')

# check
#for name, param in model.named_parameters():
#    if param.requires_grad:
#        print(name,param)  

mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# in case, check all values
if False:
    # likelihood noise parameters: all zero
    print(mll.likelihood.raw_task_noises)
    print(mll.likelihood.raw_noise)
    print(mll.model.likelihood.raw_task_noises)
    print(mll.model.likelihood.raw_noise)
    # task means: all zero
    print(mll.model.mean_module.base_means[0].constant)
    print(mll.model.mean_module.base_means[1].constant)
    # these are not zero
    print(mll.model.covar_module.covar_module_list[0].task_covar_module.covar_factor)
    print(mll.model.covar_module.covar_module_list[0].task_covar_module.raw_var)
    # this is zero
    print(mll.model.covar_module.covar_module_list[0].data_covar_module.raw_lengthscale)

output = model(train_x)
loss = -mll(output, train_y)

print(gpytorch.__version__)  # '1.4.1' or '1.5.0'
print(loss.item())  # e.g. 3.029841184616089 under 1.4.1, 1.742202639579773 under 1.5.0

if version == '1.4.1': 
    print('note down this value')

# Procedure: run under 1.4.1 first and note down cf_141, rv_141, and the loss;
# insert those values into the hypers dict above; then run under 1.5.0,
# note down the loss, and compare.
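A possibly simpler way to get reproducible initial values, instead of copying cf and rv by hand, would be to seed the RNG before constructing the model. This is a sketch under the (untested) assumption that both versions draw the same random numbers in the same order:

# hedged alternative: seed PyTorch's RNG so that covar_factor and raw_var come
# out identical on every run (assumes both gpytorch versions consume the random
# stream in the same order)
torch.manual_seed(0)
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=num_tasks)
model = MultitaskGPModel(train_x, train_y, likelihood)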

Expected Behavior

No difference between the two versions - identical output.


Balandat commented 3 years ago

I don't think this is a bug; rather, this was a bug in 1.4.1 that was fixed in 1.5.0 by #1647.

lucheroni commented 3 years ago

So you think that all the exact-multitask results I got using 1.4.1 are wrong?

Carlo


Balandat commented 3 years ago

They're not wrong; it's just that the loss was improperly normalized. The scale of the loss is different now, but for the same parameterization the location of the optima is the same. The proper scaling may help resolve numerical issues or difficulties the optimizer may have.
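A toy sketch of the scaling argument (plain PyTorch, not gpytorch code; the quadratic and all names are purely illustrative): if the new loss is a positive constant multiple of the old one, gradient-based optimization converges to the same minimizer.

import torch

# minimizing f and c*f (c > 0) yields the same minimizer;
# only the gradient magnitudes change
def argmin_of(scale):
    x = torch.tensor([5.0], requires_grad=True)
    opt = torch.optim.Adam([x], lr=0.1)
    for _ in range(2000):
        opt.zero_grad()
        loss = scale * (x - 3.0) ** 2  # quadratic with its minimum at x = 3
        loss.backward()
        opt.step()
    return x.item()

print(argmin_of(1.0), argmin_of(100.0))  # both approach 3.0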

lucheroni commented 3 years ago

Ok, thanks. I did notice that restarts now pop up with a different pattern.

A question, though I don't know whether it's worth going through GitHub for it.

Some time ago I wanted to check the quality of the multitask variational approximation against the exact approach. I wrote the code, then discovered that the variational LCMKernel doesn't behave the same as the exact one. I suspect that there are more parameters in the exact case (twice as many) than in the variational case, because of how the coregionalization matrices are defined: the exact case includes the diag matrix, the variational case doesn't. Am I wrong?
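For concreteness, here is a small check of what I mean (a sketch; I am assuming the exact multitask kernel uses gpytorch.kernels.IndexKernel underneath, so the task covariance is covar_factor @ covar_factor.T + diag(var)):

import gpytorch

# with num_tasks=2 and rank=1 the exact task covariance carries a low-rank
# factor plus a per-task diagonal: 2 + 2 = 4 parameters
task_covar = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)
for name, p in task_covar.named_parameters():
    print(name, tuple(p.shape))
# expected: covar_factor (2, 1) and raw_var (2,)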

I tried to kill the diag via the model.initialize() feature of 1.5.0, but it didn't work; I ran into instability problems.
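My attempt looked roughly like this (a sketch, with names following the snippet above; it assumes raw_var sits under the default Positive/softplus constraint, so a strongly negative raw value drives the diagonal toward zero):

# softplus(-20) is about 2e-9, i.e. the diagonal is effectively removed
model.initialize(**{
    'covar_module.covar_module_list.0.task_covar_module.raw_var':
        torch.full((num_tasks,), -20.0),
})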

My question is: do you plan to give users direct access to the shape of the coregionalization matrices (letting them include or exclude the diag) at the MultitaskGPModel class level?

Such a feature, together with model.initialize(), would also allow checking the independent-tasks limit directly in the exact case. One could zero the diag and set the Kronecker part to unity, obtaining an independent multitask model directly inside the MultitaskGPModel class, and turn it on and off.

Regards,

Carlo



lucheroni commented 3 years ago

I have checked my previous exact-multitask results obtained with 1.4.1 against what I'm getting with 1.5.0.

I fear that https://github.com/cornellius-gp/gpytorch/pull/1647 doesn't explain the difference; there is more to it. The results are far too different, and the GP minima don't match at all.

I thought a possible way to pin the problem down could be to use the exact multitask GP setup with one task only and compare it against the exact (univariate) GP setup. I tried, and the two results don't match.
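For reference, the univariate side of that comparison would look something like this (a sketch; the hyperparameters would have to be matched by hand against the one-task multitask model before comparing losses):

import torch
import gpytorch

class SingleTaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

# same inputs as above, first task only
train_x = torch.linspace(0, 1, 6)
train_y = torch.tensor([1., 2., 1., 2., 1., 2.])

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SingleTaskGPModel(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
loss = -mll(model(train_x), train_y)
print(loss.item())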

Since this could be an issue specific to version 1.5.0, I'm going to open another GitHub issue. I hope the discussion will help other users, and take myself out of this one.

gpleiss commented 3 years ago

I have checked my previous exact-multitask results obtained with 1.4.1 against what I'm getting with 1.5.0. I fear that #1647 doesn't explain the difference; there is more to it. The results are far too different, and the GP minima don't match at all.

Please post a reproducible code example.

gpleiss commented 3 years ago

I wrote the code, then discovered that the variational LCMKernel doesn't behave the same as the exact one.

This is true. There's probably a larger discussion to have about reconciling variational and exact multitask models.