cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

[Bug] no learning of covar_factor in multitask GP #924

Open fonnesbeck opened 5 years ago

fonnesbeck commented 5 years ago

I'm using a multitask GP in GPyTorch (and have used one successfully in the past) to model sets of 2-D data. However, in many instances I notice that the covariance factor array never moves away from its initial value of one:

{'noise_covar.raw_noise': array([0.01422784], dtype=float32),
 'raw_lengthscale': array([[5.9826803, 2.3036797]], dtype=float32),
 'covar_factor': array([[1.],
        [1.],
        [1.],
        [1.],
        [1.]], dtype=float32),
 'raw_var': array([0.01249569, 0.07380817, 0.02100662, 0.01718231, 0.02019699],
       dtype=float32)}
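(For reference, a dump like the one above can be produced along these lines; this is just a sketch, and note that named_parameters() actually yields fully qualified names such as 'likelihood.noise_covar.raw_noise', trimmed here for readability:)

# Sketch: collect the model's hyperparameters as numpy arrays,
# assuming `model` is the fitted MultitaskGPModel instance below.
param_dump = {name: param.detach().cpu().numpy()
              for name, param in model.named_parameters()}
print(param_dump)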

The model itself is pretty straightforward, as I used one of your examples as a template:

class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, n_batters, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = gpytorch.kernels.MaternKernel(
            nu=1.5, ard_num_dims=2,
            lengthscale_prior=gpytorch.priors.GammaPrior(3, 3),
        )

        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=n_batters, rank=1)

    def forward(self, x, i):
        mean_x = self.mean_module(x)

        # Get input-input covariance
        covar_x = self.covar_module(x)
        # Get task-task covariance
        covar_i = self.task_covar_module(i)

        # Multiply the two together to get the covariance we want
        covar = covar_x.mul(covar_i)

        return gpytorch.distributions.MultivariateNormal(mean_x, covar)

likelihood = gpytorch.likelihoods.GaussianLikelihood().to(device)

I'm running this on a GPU with the current release installed from conda.
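For completeness, the training loop follows the standard exact-GP pattern from the examples, roughly along these lines (a sketch; n_batters, device, and the data tensors train_x, train_i, train_y come from the surrounding code):

import torch
import gpytorch

# Hadamard-style multitask setup: train inputs are a (data, task-index) tuple
model = MultitaskGPModel((train_x, train_i), train_y, n_batters, likelihood).to(device)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

model.train()
likelihood.train()
for _ in range(200):
    optimizer.zero_grad()
    output = model(train_x, train_i)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()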

Balandat commented 5 years ago

Do you have a minimal reproducible example with simple data? When I run the Hadamard MTGP example notebook, I don't observe this behavior.

Does this also happen when running on the CPU?
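In the meantime, one quick check: after a single backward pass, see whether covar_factor receives a nonzero gradient at all. Something like this (a sketch, reusing the names from your training loop above):

model.train()
likelihood.train()
optimizer.zero_grad()
loss = -mll(model(train_x, train_i), train_y)
loss.backward()
# None or an all-zero gradient would mean covar_factor is disconnected
# from the loss; nonzero values would point at the optimizer/loop instead.
print(model.task_covar_module.covar_factor.grad)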

Balandat commented 5 years ago

Aside: I introduced an improved parameterization of correlation matrices in #912; porting this over to the index kernel probably makes sense.

fonnesbeck commented 5 years ago

Attached is an example, as requested, that replicates the behavior for me:

{'noise_covar.raw_noise': array([0.01730466], dtype=float32),
 'raw_lengthscale': array([[10.005662,  6.14979 ]], dtype=float32),
 'covar_factor': array([[1.],
        [1.],
        [1.],
        [1.],
        [1.]], dtype=float32),
 'raw_var': array([0.00579692, 0.07211152, 0.03042104, 0.00402563, 0.0030037 ],
       dtype=float32)}

I have not tried on the CPU, but will do so now.

[EDIT] Happens on the CPU as well.

Archive.zip