irum opened this issue 4 years ago
You get that by just calling the model on the test data in eval mode:
```python
model.eval()
model(test_inputs)
```
The outcome is a `MultivariateNormal` distribution that gives you means and (co-)variance.
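For example, a minimal sketch of pulling the mean and (co-)variance out of that distribution (assuming `test_inputs` is already a tensor on the right device; the variable names are just placeholders):
```python
import torch

model.eval()
with torch.no_grad():
    latent_pred = model(test_inputs)      # MultivariateNormal over the latent f
    f_mean = latent_pred.mean             # posterior mean of f at test_inputs
    f_var = latent_pred.variance          # marginal posterior variance of f
    # the full covariance is also available via latent_pred.covariance_matrix
```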
@Balandat sorry, it's my mistake: it should be analogous to `predict_f`, not `predict_y`. Please can you tell me which call gives me the same output as `predict_f` of the GPflow library?
```python
model.eval()
model(test_inputs)
```
will be the predictive distribution for the latent f.
If you want to predict y, you would do
```python
model.eval()
likelihood(model(test_inputs))
```
Thank you for the reply, but the shape of `model(test_inputs)` is (1, 4096), where 4096 are the features coming from the model, and I want shape (1, 10) since I have 10 classes to predict. How do I get the mean and variance for the 10 classes?
Oh oops, I didn't see you were trying to do classification.
If you want the predictive uncertainty for classification, then you want to predict y. Following the tutorial you linked, you just do
```python
with torch.no_grad(), gpytorch.settings.num_likelihood_samples(16):
    model.eval()
    y_samples = likelihood(model(test_inputs))
```
This will give you a `Categorical` distribution whose class probabilities have shape 16 x N x 10, corresponding to 16 samples drawn from the predictive distribution when you pass in N `test_inputs` points with 10 total classes. To get the average softmax probability for each input and each class, you just average over the first dimension,
```python
y_samples.probs.mean(0)
```
which will give you an N x 10 tensor.
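Putting those pieces together, a minimal end-to-end sketch for getting per-class probabilities and hard predictions (the 16-sample setting and variable names are just illustrative):
```python
import torch
import gpytorch

model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.num_likelihood_samples(16):
    y_dist = likelihood(model(test_inputs))   # Categorical over the 10 classes
    probs = y_dist.probs.mean(0)              # average over the 16 likelihood samples -> N x 10
    pred_labels = probs.argmax(dim=-1)        # hard class predictions, shape (N,)
```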
I am doing classification, but I want to use a function like the one used in the link. Please can you check `predict_f` of the GPflow library? I would like to use a similar function which returns the mean and variance of the latent function (f) at the points Xnew for the 10 classes.
The link you attached is for doing regression.
The latent function for classification does not in general have the same dimensionality as the number of classes. That is why `model(test_inputs)` is (1, 4096): those features are your latent f for classification.
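As a quick shape check (a sketch assuming the DKL tutorial setup with a `SoftmaxLikelihood` and 10 classes; the exact sizes depend on how your model is constructed):
```python
model.eval()
latent_pred = model(test_inputs)          # distribution over the latent GP features, not over classes
print(latent_pred.mean.shape)             # e.g. (1, 4096): number of latent features
print(likelihood.mixing_weights.shape)    # e.g. (10, 4096): maps latent features to the 10 class logits
```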
Actually, I need the 10 mean values of the latent function f, rather than the predictive probabilities we get as the output of `y_samples.probs.mean(0)`. The link I mentioned gives (1, 10) values for the latent function.
@KeAWang The following is the function used in the GPflow library for `predict_f`. Please can you tell me what the exact equivalent function in GPyTorch is? I am thankful for the responses, but I am still very confused.
```python
def predict_f(
    self, Xnew: InputData, full_cov: bool = False, full_output_cov: bool = False
) -> MeanAndVariance:
    r"""
    The posterior variance of F is given by

        q(f) = N(f | K alpha + mean, [K^-1 + diag(lambda**2)]^-1)

    Here we project this to F*, the values of the GP at Xnew which is given by

        q(F*) = N(F* | K_{*F} alpha + mean, K_{**} - K_{*f}[K_{ff} + diag(lambda**-2)]^-1 K_{f*})

    Note: This model currently does not allow full output covariances
    """
    if full_output_cov:
        raise NotImplementedError

    X_data, _ = self.data

    # compute kernel things
    Kx = self.kernel(X_data, Xnew)
    K = self.kernel(X_data)

    # predictive mean
    f_mean = tf.linalg.matmul(Kx, self.q_alpha, transpose_a=True) + self.mean_function(Xnew)

    # predictive var
    A = K + tf.linalg.diag(tf.transpose(1.0 / tf.square(self.q_lambda)))
    L = tf.linalg.cholesky(A)
    Kx_tiled = tf.tile(Kx[None, ...], [self.num_latent_gps, 1, 1])
    LiKx = tf.linalg.triangular_solve(L, Kx_tiled)
    if full_cov:
        f_var = self.kernel(Xnew) - tf.linalg.matmul(LiKx, LiKx, transpose_a=True)
    else:
        f_var = self.kernel(Xnew, full_cov=False) - tf.reduce_sum(tf.square(LiKx), axis=1)
    return f_mean, tf.transpose(f_var)
```
@irum Please see @KeAWang's response above. Calling `model(test_inputs)` is equivalent to what you want, and is not returning a 1 x 4096 tensor but rather a 1 x 4096 `MultivariateNormal` that represents a distribution over 4096 latent functions.

This does not have the same number of outputs as you have classes unless you change your model for this to be the case -- the likelihood mixes an arbitrary set of latent f function values with a weight matrix W that is num_classes x num_latents -- e.g., 10 x 4096. There is no particular reason the number of latent GPs should be constrained to be equal to the number of classes, so it isn't.

If you want to draw samples of the mixed latent features, you'll need to multiply samples from this distribution by `likelihood.mixing_weights` -- e.g. something like `preds.rsample() @ likelihood.mixing_weights.t()` (which will be num_classes x num_data), where `preds = model(test_inputs)`. See https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/likelihoods/softmax_likelihood.py for where these mixing weights are defined.
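If what you are after is a per-class mean and variance of the mixed latent function (the closest analogue to the (1, 10) output of GPflow's `predict_f` in this setup), one option is to push samples of the latent distribution through the mixing weights yourself and take moments. This is not a built-in GPyTorch method, just a manual sketch following the `preds.rsample() @ likelihood.mixing_weights.t()` suggestion above; the sample count and shape handling are assumptions:
```python
import torch

model.eval()
with torch.no_grad():
    latent_pred = model(test_inputs)                   # MultivariateNormal over the latent features
    W = likelihood.mixing_weights                      # (num_classes, num_features), e.g. (10, 4096)

    # Monte Carlo estimate: sample latent functions, mix them, then take per-class moments.
    f_samples = latent_pred.rsample(torch.Size([64]))  # (64, ..., num_features); depending on your model
                                                       # you may need a transpose so features are last
    mixed = f_samples @ W.t()                          # (64, ..., num_classes)
    mixed_mean = mixed.mean(0)                         # per-class mean of the mixed latent f
    mixed_var = mixed.var(0)                           # per-class variance of the mixed latent f
```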
Thank you for the reply. I have changed the above-mentioned file and am using `mixed_fs` for this, but its values are very large, while I expected them to be between 0 and 1. It gives me big values while doing inference.
```python
if self.mixing_weights is not None:
    mixed_fs = function_samples @ self.mixing_weights.t()  # num_classes x num_data
else:
    mixed_fs = function_samples
res = base_distributions.Categorical(logits=mixed_fs)
res.mixed_fs = mixed_fs  # I have added that and am using this value
return res
```
Why should the values of the logits be between 0 and 1 before they are passed through a softmax?
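To illustrate the point: logits can have any magnitude, and the softmax still maps them to valid probabilities in [0, 1]. A small standalone example (the numbers are arbitrary):
```python
import torch

# Logits can be arbitrarily large or negative; softmax still yields valid probabilities.
logits = torch.tensor([[35.2, -12.7, 8.4]])
probs = torch.softmax(logits, dim=-1)
print(probs)              # every entry lies in [0, 1]
print(probs.sum(dim=-1))  # each row sums to 1
```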
@jacobrgardner I am getting small values in the range 0 to 1 for `mixed_fs` during the training phase, but when I do inference I get large values. I don't understand why.
I am new to GPs. I am looking for the function which returns the mean and variance of the latent function (f) at the points Xnew. It must be analogous to `predict_y` of the GPflow library. I am using this [example](https://docs.gpytorch.ai/en/v1.2.0/examples/06_PyTorch_NN_Integration_DKL/Deep_Kernel_Learning_DenseNet_CIFAR_Tutorial.html). The shape of the latent function should be (1, 10) if there are 10 classes.