cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

Confusing documentation on hyperparameters #1413

Closed: BrunoMorabito closed this issue 3 years ago

BrunoMorabito commented 3 years ago

I find the documentation on the hyperparameters a bit confusing. For example, I quote:

"The most important thing to note here is that the actual learned parameters of the model are things like raw_noise, raw_outputscale, raw_lengthscale, etc. "

So it sounds like the actual parameters are the raw parameters, but the text then continues with:

"The reason for this is that these parameters must be positive. This brings us to our next topic for parameters: constraints, and the difference between raw parameters and actual parameters."

That, in turn, sounds like there is a difference between the two.

At the end of the day, I am interested in getting the learned parameters because I want to use them in my covariance function, but I still don't understand where to find them.

KeAWang commented 3 years ago

This notebook explains the difference between raw parameters and parameters: https://github.com/cornellius-gp/gpytorch/blob/master/examples/00_Basic_Usage/Hyperparameters.ipynb

We apply a positive transformation to the raw, unconstrained parameters to obtain the actual (positive) parameters. By default the transform is the softplus function, log(1 + e^x).
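
Roughly, a minimal sketch of that mapping in plain PyTorch (the numbers are just illustrative):

import torch
import torch.nn.functional as F

# A raw, unconstrained value -- this is what gradient descent actually updates.
raw_lengthscale = torch.tensor(-1.5)

# Softplus maps any real number to a positive one: log(1 + e^x).
lengthscale = F.softplus(raw_lengthscale)

print(raw_lengthscale.item())  # -1.5 (can be any real number)
print(lengthscale.item())      # ~0.2014 (always positive)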

BrunoMorabito commented 3 years ago

Thanks for the quick answer. That notebook is the same one used to generate the documentation, though, so I am still a bit confused.

My understanding is that you run an unconstrained optimization to find the hyperparameters, and then ensure the parameters respect the constraints by applying some transformation. Is that correct? So the transformed parameters are the ones that actually enter the kernel.

gpleiss commented 3 years ago

My understanding is that you run an unconstrained optimization to find the hyperparameters, and then ensure the parameters respect the constraints by applying some transformation. Is that correct? So the transformed parameters are the ones that actually enter the kernel.

Yes. That is correct. Gradient-based optimization is applied to the (unconstrained) raw_lengthscale. kernel.lengthscale is equal to softplus(kernel.raw_lengthscale).
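
For reference, a minimal sketch of where the two sets of parameters live, assuming a ScaleKernel(RBFKernel()) and a GaussianLikelihood as an example (the exact module paths depend on how your model is built):

import gpytorch

kernel = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
likelihood = gpytorch.likelihoods.GaussianLikelihood()

# Raw, unconstrained parameters -- these are what the optimizer updates.
print(kernel.base_kernel.raw_lengthscale)
print(kernel.raw_outputscale)

# Actual (constrained) values -- these are what enter the kernel, obtained by
# passing the raw value through the registered constraint (softplus by default).
print(kernel.base_kernel.lengthscale)  # == softplus(raw_lengthscale)
print(kernel.outputscale)              # == softplus(raw_outputscale)
print(likelihood.noise)                # the default noise constraint also adds a small lower bound

# The same transform is available through the constraint object itself:
constraint = kernel.base_kernel.raw_lengthscale_constraint
print(constraint.transform(kernel.base_kernel.raw_lengthscale))

After training, reading the same properties on your model (e.g. model.covar_module.base_kernel.lengthscale and model.likelihood.noise) should give the learned values you can plug into your own covariance function.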

BrunoMorabito commented 3 years ago

Great, thanks!