SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License

Understanding Optimization of parameters in GPy.core.SVGP #381

Closed Homesh01 closed 7 years ago

Homesh01 commented 8 years ago

Hi,

I'm using GPy.core.SVGP for large datasets and I'm trying to understand the implementation better. Here is code from your tutorials:

import numpy as np
import GPy

N = 5000
X = np.random.rand(N)[:, None]
Y1 = np.sin(6*X) + 0.1*np.random.randn(N, 1)
Y2 = np.sin(3*X) + 0.1*np.random.randn(N, 1)
Y = np.hstack((Y1, Y2))    # two outputs
Z = np.random.rand(20, 1)  # 20 inducing inputs

batchsize = 100
m = GPy.core.SVGP(X, Y, Z, GPy.kern.RBF(1) + GPy.kern.White(1), GPy.likelihoods.Gaussian(), batchsize=batchsize)
m.kern.white.variance = 1e-5
#m.kern.white.fix()

This makes sense. However, when I investigate the number of hyperparameters that will be optimized in the above model, I use the following:

print(m.optimizer_array.shape)
print(m.kern.param_array.shape)

and get:

(484,)
(3,)

There are 3 kernel parameters (2 from the RBF kernel and one from the white noise). Combining those with the 20 inducing points gives 23, so I expected 23 hyperparameters to be optimized over. Can someone explain why 484 hyperparameters are being optimized?

Kind Regards,

Homesh

lawrennd commented 8 years ago

In stochastic variational inference you also optimise over the variational distribution q(u), which has a mean and a covariance.

Neil
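As an aside, here is a minimal sketch of how one might inspect those extra variational parameters on the model; the attribute names q_u_mean and q_u_chol (the mean and the Cholesky factor of the covariance of q(u)) come up later in this thread, but treat the exact shapes and access patterns below as assumptions to check against your GPy version:

# Assumes m is the SVGP model constructed above.
print(m.Z.shape)                     # inducing inputs
print(m.q_u_mean.shape)              # mean of q(u), one column per output
print(m.q_u_chol.param_array.shape)  # flattened Cholesky parameters of q(u)'s covariance
print(m.optimizer_array.shape)       # everything the optimizer sees, concatenated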


Homesh01 commented 8 years ago

Of course, thanks for clarifying that.


Homesh01 commented 8 years ago

But do we also optimize over the positions of the inducing points (rather than keeping them fixed, as in previous methods)?
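Z is indeed optimized along with everything else by default, as the next comment explains. A small sketch, assuming the model m from the original post, of how one could pin the inducing inputs in place with the same fix() mechanism used for the white-noise variance above:

m.Z.fix()                        # keep the inducing inputs at their initial locations
print(m.optimizer_array.shape)   # should shrink by the number of entries in Z
m.Z.unfix()                      # make them free parameters again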

vahidbas commented 8 years ago

I spent a lot of time understanding the implementation of the SVGP class. Apart from the kernel parameters, it tries to optimize Z, q_u_mean and q_u_chol, which are the inducing inputs, the mean of the variational distribution q(u), and the lower-triangular part of the Cholesky decomposition of the covariance matrix of q(u). In your case, because the function maps 1 input dimension to 2 output dimensions with 20 inducing points, you have 20 parameters for Z, 2x20 for q_u_mean and 2x(20x20/2 + 20/2) for q_u_chol, which in total is 480. The remaining 4 are the 3 kernel parameters plus the Gaussian likelihood variance, giving 484.
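Writing that count out explicitly as a quick check (the split of the remaining 4 parameters into 3 kernel parameters plus the Gaussian likelihood variance is inferred from the numbers, so it is worth confirming with print(m) on your own model):

M, D_out = 20, 2                      # inducing points, output dimensions
n_Z    = M * 1                        # inducing inputs (X is 1-dimensional)
n_mean = D_out * M                    # one length-M mean vector per output
n_chol = D_out * (M * (M + 1) // 2)   # one lower-triangular Cholesky factor per output
print(n_Z + n_mean + n_chol)          # 20 + 40 + 420 = 480
# plus 3 kernel parameters and 1 Gaussian likelihood variance: 480 + 4 = 484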

jorgeloaiza commented 6 years ago

Hello. Can anybody help me with an error about GPy? Thanks