In stochastic variational inference you also optimise over the variational distribution q(u) (which has a mean and a covariance).
Neil
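For the model in this issue (20 inducing points, 2 output columns), a quick back-of-the-envelope count of what q(u) alone contributes, sketched independently of GPy's internals:

```python
# q(u) has a mean value for every inducing point and output, plus, per
# output, a covariance matrix stored through its lower-triangular
# Cholesky factor (M*(M+1)/2 free entries).
M, D = 20, 2                              # inducing points, output dims
mean_params = M * D                       # 40
chol_params = D * (M * (M + 1) // 2)      # 420
print(mean_params + chol_params)          # 460, before counting Z itself
```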
Of course, thanks for clarifying that.
But do we also optimize over the positions of the inducing points (i.e. they are not fixed, unlike in previous methods)?
I spent a lot of time understanding the implementation of the SVGP class. Apart from the kernel parameters it tries to optimize Z, q_u_mean and q_u_chol, which are the inducing inputs, the mean of the variational distribution q(u), and the lower-triangular part of the Cholesky decomposition of the covariance matrix of q(u). In your case, because the function maps 1 input dimension to 2 output dimensions with 20 inducing points, you have 20 parameters for Z, 2x20 for q_u_mean and 2x(20x20/2 + 20/2) for q_u_chol, which is 480 in total. The rest are the kernel parameters and the Gaussian likelihood's noise variance.
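A minimal way to check this on the model object itself (a sketch; it assumes the m.Z, m.q_u_mean and m.q_u_chol parameters named above, and that GPy Param arrays expose .size):

```python
import numpy as np
import GPy

# Same model as in the question; only the shapes matter here.
X = np.random.rand(5000)[:, None]
Y = np.hstack((np.sin(6*X) + 0.1*np.random.randn(5000, 1),
               np.sin(3*X) + 0.1*np.random.randn(5000, 1)))
Z = np.random.rand(20, 1)
m = GPy.core.SVGP(X, Y, Z, GPy.kern.RBF(1) + GPy.kern.White(1),
                  GPy.likelihoods.Gaussian(), batchsize=100)

print(m.Z.size)          # 20  inducing inputs (20 points x 1 input dim)
print(m.q_u_mean.size)   # 40  variational means (20 points x 2 outputs)
print(m.q_u_chol.size)   # 420 Cholesky entries (2 x 20*21/2)
print(m.Z.size + m.q_u_mean.size + m.q_u_chol.size)   # 480
print(m.kern.param_array.size)                        # 3 kernel parameters
```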
Hello. Can anybody help me with an error about GPy? Thanks
Hi,
I'm using GPy.core.SVGP for large datasets and I'm trying to understand the implementation better. Here is code from your tutorials:
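```python
import numpy as np
import GPy
from matplotlib import pyplot as plt
import climin

N = 5000
X = np.random.rand(N)[:, None]
Y1 = np.sin(6*X) + 0.1*np.random.randn(N, 1)
Y2 = np.sin(3*X) + 0.1*np.random.randn(N, 1)
Y = np.hstack((Y1, Y2))

Z = np.random.rand(20, 1)

batchsize = 100
m = GPy.core.SVGP(X, Y, Z, GPy.kern.RBF(1) + GPy.kern.White(1),
                  GPy.likelihoods.Gaussian(), batchsize=batchsize)
m.kern.white.variance = 1e-5
m.kern.white.fix()
```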
This makes sense. However, when I investigate the number of hyperparameters that will be optimized in the above model, I use `print(m.optimizer_array.shape)` and `print(m.kern.param_array.shape)` and get `(484,)` and `(3,)` respectively.
There are 3 parameters (2 from the RBF kernel and one from the white noise). Combine that with the 20 inducing points to give 23, so I expected 23 hyperparameters would be optimized over. Can someone explain why 484 hyperparameters are being optimized?
Kind Regards,
Homesh