cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Latent function (f) #1277

Open irum opened 4 years ago

irum commented 4 years ago

I am new to GPs. I am looking for the function that returns the mean and variance of the latent function (f) at new points Xnew.

It should be analogous to predict_y of the GPflow library (link).

I am using this [example](https://docs.gpytorch.ai/en/v1.2.0/examples/06_PyTorch_NN_Integration_DKL/Deep_Kernel_Learning_DenseNet_CIFAR_Tutorial.html).

The shape of the latent function should be (1,10) if there are 10 classes.

Balandat commented 4 years ago

You get that by just calling the model on the test data in eval mode:

model.eval()
model(test_inputs)

The output is a MultivariateNormal distribution that gives you the mean and (co)variance.
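
For instance, a minimal sketch (assuming model and test_inputs are defined as in the tutorial) of reading the latent mean and variance off that distribution:

    import torch

    model.eval()
    with torch.no_grad():
        f_dist = model(test_inputs)
        f_mean = f_dist.mean                  # mean of the latent f at test_inputs
        f_var = f_dist.variance               # marginal variances of the latent f
        f_cov = f_dist.covariance_matrix      # full covariance matrix, if needed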

irum commented 4 years ago

@Balandat sorry, it's my mistake; it should be analogous to predict_f, not predict_y. Can you please tell me which function gives the same output as predict_f of the GPflow library?

KeAWang commented 4 years ago
model.eval()
model(test_inputs)

will be the predictive distribution for the latent f.

If you want to predict y, you would do

model.eval()
likelihood(model(test_inputs))
irum commented 4 years ago

Thank you for the reply, but the shape of model(test_inputs) is (1, 4096), where 4096 is the number of features coming from the model, and I want shape (1, 10) since I have 10 classes to predict. How do I get the mean and variance for the 10 classes?

KeAWang commented 4 years ago

Oh oops, I didn't see you were trying to do classification.

If you want the predictive uncertainty for classification, then you want to predict y. Following the tutorial you linked, you just do

with torch.no_grad(), gpytorch.settings.num_likelihood_samples(16):
    model.eval()
    y_samples = likelihood(model(test_inputs))

This will give you a Categorical distribution whose probs tensor is 16 x N x 10, corresponding to 16 samples drawn from the predictive distribution when you pass in N test_input points with 10 total classes. To get the average softmax probability for each input and each class, you just average over the first dimension

y_samples.probs.mean(0)

which will give you an N x 10 tensor.
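
For example, a minimal follow-up (assuming y_samples from the snippet above) that turns the averaged probabilities into hard class predictions:

    probs = y_samples.probs.mean(0)     # N x 10 averaged softmax probabilities
    pred_class = probs.argmax(dim=-1)   # N predicted class indices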

irum commented 4 years ago

I am doing classification, but I want to use a function like the one in the link. Can you please check predict_f of the GPflow library? I would like to use a similar function that returns the mean and variance of the latent function (f) at the points Xnew for 10 classes.

KeAWang commented 4 years ago

The link you attached is for doing regression.

The latent function for classification does not in general have the same dimensionality as the number of classes. That is why

model(test_inputs)

is (1,4096) since those features are your latent f for classification.

irum commented 4 years ago

Actually, I need the 10 mean values of the latent function f, rather than the single highest predictive probability that we get as the output of y_samples.probs.mean(0). The link I mentioned gives (1, 10) values for the latent function.

irum commented 4 years ago

@KeAWang The following is the function the GPflow library uses for predict_f. Can you please tell me what the exact equivalent function in GPyTorch is? I am thankful for the responses, but I am still very confused.

    The posterior variance of F is given by

        q(f) = N(f | K alpha + mean, [K^-1 + diag(lambda**2)]^-1)

    Here we project this to F*, the values of the GP at Xnew, which is given by

        q(F*) = N(F* | K_{*F} alpha + mean, K_{**} - K_{*f} [K_{ff} + diag(lambda**-2)]^-1 K_{f*})

    def predict_f(
        self, Xnew: InputData, full_cov: bool = False, full_output_cov: bool = False
    ) -> MeanAndVariance:
        """Note: This model currently does not allow full output covariances."""
        if full_output_cov:
            raise NotImplementedError

        X_data, _ = self.data
        # compute kernel things
        Kx = self.kernel(X_data, Xnew)
        K = self.kernel(X_data)

        # predictive mean
        f_mean = tf.linalg.matmul(Kx, self.q_alpha, transpose_a=True) + self.mean_function(Xnew)

        # predictive var
        A = K + tf.linalg.diag(tf.transpose(1.0 / tf.square(self.q_lambda)))
        L = tf.linalg.cholesky(A)
        Kx_tiled = tf.tile(Kx[None, ...], [self.num_latent_gps, 1, 1])
        LiKx = tf.linalg.triangular_solve(L, Kx_tiled)
        if full_cov:
            f_var = self.kernel(Xnew) - tf.linalg.matmul(LiKx, LiKx, transpose_a=True)
        else:
            f_var = self.kernel(Xnew, full_cov=False) - tf.reduce_sum(tf.square(LiKx), axis=1)
        return f_mean, tf.transpose(f_var)

jacobrgardner commented 4 years ago
model.eval()
model(test_inputs)

will be the predictive distribution for the latent f.

If you want to predict y, you would do

model.eval()
likelihood(model(test_inputs))

@irum Please see @KeAWang's response above. Calling model(test_inputs) is equivalent to what you want, and is not returning a 1 x 4096 tensor but rather a 1 x 4096 MultivariateNormal that represents a distribution over 4096 latent functions.

This does not have the same number of outputs as you have classes unless you change your model for this to be the case -- the likelihood mixes an arbitrary set of latent f function values with a weight matrix W that is num_classes x num_latents -- e.g., 10 x 4096. There is no particular reason the number of latent GPs should be constrained to be equal to the number of classes, so it isn't.

If you want to draw samples of the mixed latent features, you'll need to multiply samples from this distribution by likelihood.mixing_weights -- e.g., something like preds.rsample() @ likelihood.mixing_weights.t() (which will be num_classes x num_data), where preds = model(test_inputs). See https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/likelihoods/softmax_likelihood.py for where these mixing weights are defined.
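
A rough sketch of that, assuming the likelihood exposes mixing_weights as in the softmax_likelihood.py file linked above (exact sample shapes will depend on your model):

    import torch

    model.eval()
    with torch.no_grad():
        preds = model(test_inputs)                           # distribution over the 4096 latent functions
        f_samples = preds.rsample(torch.Size([16]))          # 16 Monte Carlo samples of the latent f
        mixed = f_samples @ likelihood.mixing_weights.t()    # mix down to num_classes outputs
        mixed_mean = mixed.mean(0)                           # per-class Monte Carlo mean
        mixed_var = mixed.var(0)                             # per-class Monte Carlo variance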

irum commented 4 years ago

Thank you for the reply. I have changed the above-mentioned file and am using mixed_fs for this, but its values are very large, while I expected them to be between 0 and 1. It gives me big values during inference.

    if self.mixing_weights is not None:
        mixed_fs = function_samples @ self.mixing_weights.t()  # (num_classes x num_data)
    else:
        mixed_fs = function_samples

    res = base_distributions.Categorical(logits=mixed_fs)
    res.mixed_fs = mixed_fs  # I have added this line and am using this value
    return res

jacobrgardner commented 4 years ago

Why should the values of the logits be between 0 and 1 before they are passed through a softmax?
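
As a small illustration of the point (with made-up numbers): logits of any magnitude are mapped into [0, 1] by the softmax, so large mixed_fs values are not a problem by themselves.

    import torch

    logits = torch.tensor([[120.0, -35.0, 7.5]])   # arbitrary illustrative logits
    probs = torch.softmax(logits, dim=-1)          # each entry in [0, 1], rows sum to 1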

irum commented 4 years ago

@jacobrgardner I am getting small values in the range 0 to 1 for mixed_fs during the training phase, but when I do inference I am getting large values. I don't understand why.