gsimchoni opened this issue 2 years ago
Chiming in here, I'm pretty sure the implementation that @vr308 contributed can learn a distribution over X through a variational approximation: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/models/gplvm/latent_variable.py#L65
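For context, here is a minimal plain-PyTorch illustration (not the gpytorch class itself) of what "learn a distribution over X through a variational approximation" means: each latent point gets a variational mean and scale, and samples are drawn with the reparameterisation trick so gradients flow back to the variational parameters. All names here are illustrative.

```python
import torch

# Illustrative sketch only -- not the gpytorch VariationalLatentVariable API.
torch.manual_seed(0)
n, latent_dim = 10, 2

# Variational parameters of q(X) = N(q_mu, diag(exp(q_log_sigma)^2)),
# one mean/scale pair per latent point.
q_mu = torch.nn.Parameter(torch.randn(n, latent_dim) * 0.01)
q_log_sigma = torch.nn.Parameter(torch.zeros(n, latent_dim))

def sample_X():
    # Reparameterisation trick: X = mu + sigma * eps keeps the sample
    # differentiable with respect to q_mu and q_log_sigma.
    eps = torch.randn(n, latent_dim)
    return q_mu + q_log_sigma.exp() * eps

X = sample_X()
```

The key point is that `X` is now a stochastic function of learnable parameters rather than a free `Parameter` itself, which is what makes a posterior over the latent space possible.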
That's correct @holmrenser, but the testing framework is missing - the model works as a "one-way street": you can learn latent variables for high-dimensional training observations, but when new observations arrive there is no way to learn the latent variables for those.
I have written the code for that, but it's a bit of work to integrate it into the existing gpytorch model. That said, it's on my plate and I am looking to merge it soon, hopefully before the end of November.
@gsimchoni is right in identifying that the testing framework is missing - the main reason is that it's just not so straightforward.
Awesome, thanks for clarifying! Am I wrong in thinking that the testing idea could be implemented with back-constraints à la Lawrence and Quiñonero-Candela 2006?
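For anyone unfamiliar with back-constraints: the idea is that instead of learning free latent parameters, the latent points are constrained to be the output of a parametric mapping `g(y)`, so a new `y_star` gets a latent position "for free" via a forward pass. A hedged sketch in plain PyTorch follows; the MLP and all names are illustrative, not part of gpytorch.

```python
import torch

torch.manual_seed(0)
n, data_dim, latent_dim = 20, 6, 2
Y = torch.randn(n, data_dim)

# Back-constraint network g: Y-space -> latent space. Architecture is an
# arbitrary illustrative choice.
back_constraint = torch.nn.Sequential(
    torch.nn.Linear(data_dim, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, latent_dim),
)

# During training, X = g(Y) replaces the free latent parameters, and the
# GPLVM objective would be optimised w.r.t. the weights of g (plus GP
# hyperparameters) -- that optimisation is omitted here.
X = back_constraint(Y)

# At test time, the latent position of an unseen y_star is just g(y_star):
y_star = torch.randn(1, data_dim)
x_star = back_constraint(y_star)
```

The trade-off is that `x_star` is a point estimate from the mapping rather than a learned posterior `q(x_star)`, which is what the partially-observed approach below targets.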
Hi all,
picking back up this discussion. I have been playing around with the partially-observed-data testing framework that aims to infer the unobserved values of `y_star` given `x_star`, which is learned through the observed values of `y_star`. Here's the general approach that I am using:
1) provide the full `y_train` to train the B-GPLVM and learn local and global variational parameters as well as GP hyperparameters.
2) use `y_star_obs` (observed test values) to initialise `x_star` in the latent space at the position of the training latent variable that best matches the observed values in Y space.
3) parametrise `q(x_star)`, i.e. `Parameter(mu_star)`, `Parameter(sigma_star)`.
4) re-optimise the ELBO with respect to the new parameters only, i.e. `optimizer = torch.optim.Adam([{'params': model.X.mu_star}, {'params': model.X.sigma_star}], lr=0.1)`
Does this sound like a reasonable approach? Happy to provide a code example if there's no apparent mistake.
Andrea
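In case it helps, steps 3)-4) above can be sketched in plain PyTorch. Everything here is illustrative: the fixed linear map `W` stands in for the trained GP decoder, the ELBO is a single-sample Monte Carlo estimate with a Gaussian likelihood and a standard-normal prior, and `mu_star` / `log_sigma_star` are hypothetical names, not gpytorch attributes.

```python
import torch

torch.manual_seed(0)
n_star, latent_dim, data_dim = 5, 2, 3

# Stand-in for the trained (frozen) GP mapping; a linear decoder keeps the
# sketch self-contained and differentiable.
W = torch.randn(latent_dim, data_dim)
y_star_obs = torch.randn(n_star, data_dim)

# Step 3: parametrise q(x_star) = N(mu_star, diag(sigma_star^2)).
mu_star = torch.nn.Parameter(torch.zeros(n_star, latent_dim))
log_sigma_star = torch.nn.Parameter(torch.zeros(n_star, latent_dim))

# Step 4: re-optimise the ELBO w.r.t. the new variational parameters only;
# the decoder (and, in the real model, all GP hyperparameters) stays fixed.
optimizer = torch.optim.Adam(
    [{"params": mu_star}, {"params": log_sigma_star}], lr=0.1
)

def neg_elbo():
    sigma = log_sigma_star.exp()
    eps = torch.randn_like(mu_star)
    x = mu_star + sigma * eps  # reparameterised sample from q(x_star)
    # Monte Carlo expected log-likelihood term (up to constants).
    recon = ((y_star_obs - x @ W) ** 2).sum()
    # Closed-form KL( q(x_star) || N(0, I) ).
    kl = 0.5 * (mu_star ** 2 + sigma ** 2 - 2 * log_sigma_star - 1).sum()
    return recon + kl

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = neg_elbo()
    loss.backward()
    optimizer.step()
    losses.append(float(loss))
```

After this loop, `mu_star` gives the inferred latent positions for the test points, and pushing a sample from `q(x_star)` through the decoder reconstructs the unobserved values.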
🚀 Feature Request
As @vr308 pointed out here, the current implementation of GPLVM "doesn't learn a distribution over X (there is a prior over X) but X is learnt as a model parameter". If I understand correctly, this means Section 4 of the Titsias & Lawrence 2010 BGPLVM paper, "Prediction and computation of probabilities in test data", is missing: for given test data `y_star` we cannot "predict" `X_star|y_star`, let alone reconstruct `y_star|X_star`, e.g. to get a reconstruction error.

Motivation
Is your feature request related to a problem? Please describe.
Without the ability to predict the LV based on unseen test data and build reconstructions, as with other dimensionality reduction methods like PCA and VAE, this model is limited. How can we assess its ability to generalize to new test data?
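To make the comparison with PCA concrete, here is what the "predict the LV, then reconstruct" round trip looks like there - a numpy sketch with illustrative variable names, which is exactly the capability being requested for the GPLVM:

```python
import numpy as np

rng = np.random.default_rng(0)
Y_train = rng.normal(size=(50, 5))
Y_test = rng.normal(size=(10, 5))

# Fit on training data only: centre, take the top-k principal directions.
mean = Y_train.mean(axis=0)
_, _, Vt = np.linalg.svd(Y_train - mean, full_matrices=False)
k = 2
components = Vt[:k]  # (k, 5)

# "Predict" latent codes for unseen data (analogue of X_star | y_star) ...
Z_test = (Y_test - mean) @ components.T

# ... then reconstruct (analogue of y_star | X_star) and score it.
Y_recon = Z_test @ components + mean
err = np.mean((Y_test - Y_recon) ** 2)
```

Without an equivalent of `Z_test` / `Y_recon` for held-out data, there is no way to compute this kind of generalization metric for the GPLVM.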
Pitch
Describe the solution you'd like
Perhaps differentiate between `Reduction` and `ReductionAndPrediction` strategies, and/or `ParameterLatentVariable` and `LearnedLatentVariable` classes.

Describe alternatives you've considered
If I understood correctly, there aren't alternatives (other than grouping the train and test data together and fitting the model on both, which counteracts the generalization goal, or finding similar observations in the train data, which is terrible).
Are you willing to open a pull request? (We LOVE contributions!!!)
Willing but not the best pytorch programmer, afraid I will break things... :flushed:
Additional context