I'm not totally sure I understand what you want to do. You want to use the model's predictions in eval mode in a loss computation? If so, this should be easy enough: you can back-propagate through both the mean and any covariance entries of the GP (including the variances). Just make a PyTorch optimizer with the parameters you want to optimize, compute whatever loss you need from the predictions, and optimize.
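Something along these lines should work (a rough sketch with a plain exact GP for illustration rather than your DKL setup - the names and the particular loss here are made up for the example):

```python
import torch
import gpytorch

# Rough sketch: a plain exact GP whose eval-mode predictions feed a custom,
# differentiable loss. Everything below is illustrative only.
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(self.mean_module(x), self.covar_module(x))

train_x = torch.linspace(0, 1, 20)
train_y = torch.sin(6.0 * train_x)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# ... fit the GP hyperparameters as usual, then switch to eval mode ...
model.eval()
likelihood.eval()

# Optimize some external parameters (here simply the query points themselves)
# against a custom loss built from the predictive mean and variance.
query_x = torch.rand(5, requires_grad=True)
optimizer = torch.optim.Adam([query_x], lr=0.05)

for _ in range(50):
    optimizer.zero_grad()
    pred = likelihood(model(query_x))
    # Any differentiable function of pred.mean / pred.variance works as a loss;
    # here: maximize a lower confidence bound (so minimize its negative).
    loss = -(pred.mean - 0.1 * pred.variance.sqrt()).sum()
    loss.backward()
    optimizer.step()
```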
The only caveat when doing this: if, as part of your training, the training data or GP hyperparameters change in any way, you'll need to be sure to clear the test-time caches. If you update the training data with model.set_train_data, this already happens for you. Otherwise, you can either (a) switch the model back into train mode and then back into eval mode, or (b) set model.prediction_strategy = None.
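In code, those options look roughly like this (sketch; `new_x` / `new_y` are placeholder names):

```python
# updating the data with set_train_data already clears the caches for you
model.set_train_data(new_x, new_y, strict=False)

# (a) otherwise, toggling modes clears the cached prediction strategy
model.train()
model.eval()

# (b) or drop the cache explicitly
model.prediction_strategy = None
```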
If I didn't fully understand what you want to do, maybe you could elaborate further?
Closing this for now, but @kayuksel feel free to reopen if you have any more questions 😄
I apologize for not being responsive. What I couldn't fully figure out was how to calculate a custom loss.
I'm not sure there's an easy way to wrap an existing loss. What you need to do is define a Likelihood. The likelihood takes in samples from the Gaussian process and returns a distribution. (For example, in the case of regression it returns a Gaussian distribution; in the case of classification, it returns a Bernoulli distribution.)
All DKL models have to be optimized with respect to the variational lower bound as their loss function. The Likelihood of the model is how you define what the data should look like.
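To make that concrete, a custom likelihood is just a small subclass; here's a rough sketch (the exponential-noise example is made up purely for illustration):

```python
import torch
import gpytorch

# Sketch of a custom likelihood: map samples of the latent GP function values
# to a distribution over the observed targets. The choice of an Exponential
# distribution here is arbitrary and only for illustration.
class ExponentialLikelihood(gpytorch.likelihoods.Likelihood):
    def forward(self, function_samples, **kwargs):
        rate = torch.nn.functional.softplus(function_samples)
        return torch.distributions.Exponential(rate=rate)

# It then plugs into the variational lower bound as the training objective, e.g.
#   mll = gpytorch.mlls.VariationalELBO(likelihood, model, num_data=train_y.size(0))
#   loss = -mll(model(train_x), train_y)
```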
@gpleiss I see. It would be great to have an automatic way to wrap an existing loss though. Basically, it could sample random inputs from the given distribution and call an existing loss function with those random samples, and then give back a distribution for the loss function. If it is a reward for example, one could normalize or standardize those rewards according to the calculated distribution.
@kayuksel - what you're proposing would only work for probabilistic loss functions. For example, MSE loss, cross-entropy classification loss, etc. would all work (though those correspond to GaussianLikelihood and BernoulliLikelihood). Non-probabilistic losses - like the hinge loss, for example - do not correspond to a likelihood function, so it's unclear that VI is appropriate in those cases.
Additionally, SV-DKL models are supposed to output predictive distributions (rather than point predictions), and the particular probability distribution is defined by the likelihood function. So while you could wrap certain loss functions for training, you would still have to define the likelihood's probability distribution for inference.
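(For reference, the MSE-to-Gaussian correspondence just comes from the Gaussian log-density: minimizing MSE is minimizing the negative log-likelihood of a fixed-variance Gaussian, up to a constant. A quick numerical check, purely for illustration:)

```python
import torch

# The negative log-density of N(target; pred, sigma^2) is the squared error
# scaled by 1/(2 sigma^2), plus a constant that does not depend on pred.
pred, target, sigma = torch.tensor(0.3), torch.tensor(1.0), torch.tensor(0.5)
nll = -torch.distributions.Normal(pred, sigma).log_prob(target)
squared_error_term = (pred - target) ** 2 / (2 * sigma ** 2)
constant = 0.5 * torch.log(2 * torch.pi * sigma ** 2)
assert torch.allclose(nll, squared_error_term + constant)
```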
All DKL models have to be optimized with respect to the variational lower bound as their loss function.
@gpleiss could you advise me on what to read about the loss requirements in terms of the GP? Thanks a lot
@cherepanovic - I would check out the SV-DKL paper first.
I have a custom loss function that uses the whole batch to come up with a loss. What would be the best way to use Deep Kernel Learning in this setting?

I understand that it is very easy to wrap an existing model to obtain the mean and variance of a prediction, and to use those with the given loss functions for classification and regression. Is there an equally easy way to wrap an existing custom loss function so that the loss calculation returns a mean and variance of the sampled predictions?

Note: I am actually planning to use this in a reinforcement learning setting to optimize a policy. Is there a simple way to have the model consider the uncertainties of the model (or activations) while selecting an action?