SheffieldML / GPy

Gaussian processes framework in python
BSD 3-Clause "New" or "Revised" License

GP regression with Y input variance? #196

Closed danjump closed 9 years ago

danjump commented 9 years ago

Hello,

I don't know if this is really the place to ask about this, but I couldn't find any other sort of help forum for GPy. Please direct me elsewhere if there is a better forum for this.

My question is about using input variance in Y. Essentially, I have data points with error bars in Y, and it's not clear to me how to use this quantitative variance for each point in the regression. I'm using GPy.models.GPRegression and I have not been able to find documentation on passing a Y variance for each datapoint as an input argument. I'm certainly not an expert, so I don't know whether I should expect that to exist or whether it's something that's supposed to be included in the kernel in some fashion (as discussed here, for example: http://www.robots.ox.ac.uk/~mebden/reports/GPtutorial.pdf). Either way, I'm not sure what is happening in the code in this regard. I do see the "Gaussian_noise.variance" kernel parameter along with my other RBF parameters, but I'm not sure how exactly this fits in with what I want: inputting known error bars for each data point.

I've pasted the meat of my regression fitting below for context. Any info or reference material you can point me to about this issue would be greatly appreciated. Thanks so much!

# define kernel
ker1 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[0])
ker2 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[1])
ker = ker1 * ker2

# create simple GP model. coordinates contains x1,x2 points
m = GPy.models.GPRegression(coordinates, values, ker)

# optimize and plot
m.optimize(max_f_eval=1000)
print(m)

Name : GP regression
Log-likelihood : -3117.71164156
Number of Parameters : 5
Parameters:
  GP_regression.          | Value           | Constraint | Prior | Tied to
  mul.rbf_1.variance      | 30.6140441633   | +ve        |       |
  mul.rbf_1.lengthscale   | 0.0502088242729 | +ve        |       |
  mul.rbf_2.variance      | 30.6140441608   | +ve        |       |
  mul.rbf_2.lengthscale   | 0.340895105215  | +ve        |       |
  Gaussian_noise.variance | 7.78585671365   | +ve        |       |

mzwiessele commented 9 years ago

That is a very good question! I can only say how I would solve it; maybe there are better solutions.

I would add a fixed kernel, which has the known variances on the diagonal and zeros everywhere else.

The inference will take this variance into account and will exclude it in the prediction, so it is taken care of.

Can someone think of a better solution?
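The idea can be sketched in plain numpy (a minimal illustration with made-up data, not GPy's internal implementation): with known per-point variances σᵢ², the fixed term adds diag(σ²) to the training covariance, while the prediction itself uses the noise-free kernel:

```python
import numpy as np

def rbf(X1, X2, var=1.0, ls=0.2):
    # squared-exponential kernel matrix
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 20)[:, None]
y_var = rng.uniform(0.01, 0.1, 20)                # known per-point Y variances
y = np.sin(4 * X[:, 0]) + rng.normal(0.0, np.sqrt(y_var))

# training covariance: kernel plus fixed diagonal of the known variances
Ky = rbf(X, X) + np.diag(y_var)

# posterior at a test point: the fixed diagonal is excluded here
Xs = np.array([[0.5]])
Ks = rbf(Xs, X)
mu = Ks @ np.linalg.solve(Ky, y)                      # predictive mean
var_s = rbf(Xs, Xs) - Ks @ np.linalg.solve(Ky, Ks.T)  # predictive variance
```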

danjump commented 9 years ago

Hello,

Sorry for the long delay in getting back to this. Thanks for your answer. The fixed kernel with diagonal values makes sense to me as a solution. Can you educate me further, though: how exactly would I implement this?

Would this involve creating a new kernel from scratch (as shown here: http://gpy.readthedocs.org/en/latest/tuto_creating_new_kernels.html)? Or is it something simpler, like using a GPy.kern.bias fixed and scaled to each known variance? And if the latter, I'm pretty fuzzy on how one would go about doing this. Or maybe some other already-existing kernel altogether? In fact, I'm not sure whether any of those suggestions really make sense. I understand in matrix terms what I want out of the fixed added term, but I don't know how to implement it. Any help here would be greatly appreciated.

Please let me know and thanks so much!

P.S. - One other question: can you tell me anything about where the "Gaussian_noise.variance" parameter comes from? Is there a model option related to it? Is it something I can turn off? I assume it is some addition to the kernel, but I don't understand exactly what. Is it similar to the fixed uncertainty for each point that I'm trying to add, but with a single value for all points? As it might be related to what I'm trying to do, I'd like to know more about it. Thanks again!

mzwiessele commented 9 years ago

You can just use a Fixed(1, K) kernel, where K has the desired variances on the diagonal.

As for what the Gaussian_noise is: it is the variance component of the likelihood of the GP function, p(y|f) = N(y|f, \sigma^2); Gaussian_noise.variance is \sigma^2. You can look up how GPs work in chapter 2 of Rasmussen's book, http://www.gaussianprocess.org/gpml/. That is only a suggestion; there are lots of tutorials online and videos from summer schools etc. available.
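For readers following along, the matrix K for such a Fixed(1, K) kernel can be built with np.diag; the GPy line in the comment mirrors the suggestion above and is a sketch, not a tested call:

```python
import numpy as np

# known per-point Y variances (squared error bars), one per datapoint
y_var = np.array([0.04, 0.09, 0.01, 0.16])

# desired variances on the diagonal, zeros everywhere else
K_fix = np.diag(y_var)

# per the suggestion above, this would then enter the model as, e.g.:
#   ker = GPy.kern.RBF(1) + GPy.kern.Fixed(1, K_fix)
```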

danjump commented 9 years ago

Hello,

Thanks again for the reply, mzwiessele! I wasn't aware of the Fixed() kernel and, in fact, found that it was not included in my pip install of GPy. I had to manually download updated source from git and replace my installed GPy/kern directory with the updated one to get the Fixed() kernel implementation. (I suppose I could re-install by adding the full source from git to my path, but I already had it installed through pip, so I figured I would just copy over the code I needed. I'm not sure if that's bad form.)

I believe I understand how Fixed works in the 1-dimensional regression case (please see my description labeled "Appendix A" below to confirm I'm understanding things properly), but in my problem I'm attempting to apply Gaussian process regression to a 2-dimensional set of points. Does Fixed() work for 2 input dimensions? I did some digging and see the following on this documentation site:

GPy.kern.constructors.fixed(input_dim, K, variance=1.0)
Construct a Fixed effect kernel.
Parameters:
  input_dim (int (input_dim=1 is the only value currently supported)) – the number of input dimensions
  K (np.array) – the variance \sigma^2
  variance (float) – kernel variance

So am I out of luck here? Are there any other options? Or do you think it would be plausible for a non-expert like me to add functionality for 2 input dimensions? (I'd really have to start from scratch understanding how these things are implemented.) I'll paste again how I'm constructing my kernel, for reference. Note that ker1 and ker2 have different active_dims values.

# define kernel
ker1 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[0])
ker2 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[1])
ker = ker1 * ker2

On a separate issue, again about the Gaussian_noise.variance parameter of the fit: I read through some of Rasmussen's book as you suggested. I am still concerned that this parameter is doing exactly what I am trying to do with the added input variance, but wrongly for my case. Specifically, what I understand so far is that the Gaussian_noise.variance parameter corresponds to the "prediction using noisy observations" case described in Rasmussen's chapter 2:

Prediction using Noisy Observations. It is typical for more realistic modelling situations that we do not have access to function values themselves, but only noisy versions thereof, y = f(x) + ε. Assuming additive independent identically distributed Gaussian noise ε with variance σ²_n, the prior on the noisy observations becomes cov(y_p, y_q) = k(x_p, x_q) + σ²_n δ_pq, or cov(y) = K(X, X) + σ²_n I, (2.20) where δ_pq is a Kronecker delta which is one iff p = q and zero otherwise. It follows from the independence assumption about the noise that a diagonal matrix is added, in comparison to the noise-free case, eq. (2.16). Introducing the noise term in eq. (2.18), we can write the joint distribution of the observed target values and the function values at the test locations under the prior as [...]

So it seems to me that they are adding a term with constant values of σ²_n on the diagonal (where I'm assuming this σ²_n = Gaussian_noise.variance). This is just what I'm trying to do with Fixed(), EXCEPT that I have known uncertainty values for each point, and they are NOT the same for every point. So I think what I want for my application is to essentially turn off the apparently automatic assumption of noisy observations, where a constant noise parameter is estimated for all points, and instead manually add a term of known uncertainty for each point. I don't know if this is possible, but it'd be great if there were some option I could specify, either in my kernel or model definition, to do this.

Does this make sense? And is my understanding of the situation accurate? Or am I missing something here, or misunderstanding?

Thanks again very much for your help.

Appendix A: Here is what I understand for a simple 1-dimensional Gaussian process regression: the term I want to add to my kernel matrix is produced by Fixed(1, K). It should effectively result in adding a two-dimensional matrix (N×N, where N is the number of points). The matrix would be all 0's off-diagonal, and the diagonal values would be those listed in the input vector K (a 1×N matrix), which for this application would be the list of input uncertainties (variances) for each of the N input data points.

mzwiessele commented 9 years ago

This does make perfect sense. You can just fix the Gaussian_noise to be very small (Gaussian_noise.fix(1e-4)), which effectively switches this parameter off. You can also have a look at heteroscedastic noise, which is a likelihood that has one noise variance per data point. This is effectively the same as putting the fixed kernel in.
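The claimed equivalence is easy to check numerically: a fixed diagonal of known variances plus a Gaussian noise fixed near zero gives the same training covariance as a heteroscedastic likelihood with one noise variance per point (numpy sketch with a stand-in kernel matrix):

```python
import numpy as np

sigma2 = np.array([0.1, 0.02, 0.3])   # per-point noise variances
K = np.eye(3)                         # stand-in for the kernel matrix K(X, X)

# route 1: fixed kernel with the variances on the diagonal,
# plus the model's Gaussian noise fixed to ~0 (cf. Gaussian_noise.fix(1e-4))
route1 = K + np.diag(sigma2) + 1e-4 * np.eye(3)

# route 2: heteroscedastic likelihood, one noise variance per data point
route2 = K + np.diag(sigma2 + 1e-4)

assert np.allclose(route1, route2)
```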

mzwiessele commented 9 years ago

The fixed kernel should be able to handle more than one dimension; let me know if there is a problem when you try.

mzwiessele commented 9 years ago

One last thing: try using the development version of GPy. It is less stable, but new implementations land there first, which can help a lot.

$ git clone git@github.com:SheffieldML/GPy.git
$ cd GPy
$ git checkout devel
$ python setup.py develop

This will link the GPy source into your Python environment; any later changes can be pulled by calling $ git pull in the GPy directory.

danjump commented 9 years ago

Hi again,

As always, thanks for the quick and helpful replies. I attempted to implement the two-input-dimension Fixed(2, ...) but I'm having issues. First, I am using the development version now, as you suggested. It seems like I have the syntax right for the kernel to do what I want, but when I create the model I get a LinAlgError. Here's example code/output:

print('Input Array Shapes:')
print('coordinates', type(coordinates), coordinates.shape)
print('values', type(values), values.shape)
print('uncert', type(uncert), uncert.shape)

# working kernel before adding the fixed term
ker0 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[0])
ker1 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[1])
kernel = ker0 * ker1
m = GPy.models.GPRegression(coordinates, values, kernel)

# attempt at adding the fixed term; something goes wrong when I try to
# create the model after adding it
kerfix = GPy.kern.Fixed(2, uncert)
kernel = kernel + kerfix
m = GPy.models.GPRegression(coordinates, values, kernel)

Output:

Input Array Shapes:
('coordinates', <type 'numpy.ndarray'>, (1200, 2))
('values', <type 'numpy.ndarray'>, (1200, 1))
('uncert', <type 'numpy.ndarray'>, (1200, 1))

LinAlgError                               Traceback (most recent call last)
     15 kernel = kernel + kerfix
     16
---> 17 m = GPy.models.GPRegression(coordinates,values,kernel)
.....
LinAlgError: not positive definite, even with jitter.

Can you see any obvious mistakes, or direct me to further troubleshooting here? Is this perhaps dependent on my specific data values?
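For context on the error message: the training covariance must be positive definite for its Cholesky factorization to succeed (GPy retries with added jitter before giving up, hence the wording). A minimal numpy illustration of the same failure:

```python
import numpy as np

# a valid covariance matrix factorizes fine
K_good = np.eye(3) + 0.1 * np.ones((3, 3))
L = np.linalg.cholesky(K_good)

# a matrix that is not positive definite raises the same LinAlgError
K_bad = np.full((3, 3), 1.0)
K_bad[0, 0] = -1.0                    # negative "variance" on the diagonal
try:
    np.linalg.cholesky(K_bad)
    raised = False
except np.linalg.LinAlgError:
    raised = True
```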

mzwiessele commented 9 years ago

Yes, the fixed kernel needs to be kerfix = GPy.kern.Fixed(2, GPy.util.linalg.tdot(uncert)), where uncert is the known standard deviation (\sigma) of your values.
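As I understand the helper (an assumption, not stated in the thread), GPy.util.linalg.tdot(A) computes A·Aᵀ, so for an N×1 vector of standard deviations it returns an N×N matrix with σᵢσⱼ in entry (i, j). A numpy stand-in:

```python
import numpy as np

def tdot(mat):
    # numpy stand-in for GPy.util.linalg.tdot (assumed to be A @ A.T)
    return np.dot(mat, mat.T)

uncert = np.array([[0.1], [0.2], [0.3]])   # per-point standard deviations
K = tdot(uncert)
# K[i, j] == uncert[i] * uncert[j]; the diagonal holds the variances sigma_i**2
```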

danjump commented 9 years ago

Hello again,

I thought I finally had everything working, as I was able to construct and optimize a model successfully, but now I've run into an error when I try to obtain model predictions, and I'm not sure how to troubleshoot it. I do not have this problem when running predictions on a model whose kernel lacks the fixed term, but with the fixed term, `m.predict()` raises an error. I'll post code and output below; I'd very much appreciate any pointers as to what might be going on.

For the record, I believe I'm now using the non-development version of GPy from git (i.e. a standard build of up-to-date git code).

```python
print('Input Array Shapes:')
print('coordinates', type(coords), coords.shape)
print('values', type(vals), vals.shape)
print('uncert', type(uncertainty), uncertainty.shape)

# define an rbf kernel for each dimension of the distribution
ker0 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[0])
ker1 = GPy.kern.RBF(input_dim=1, variance=100, lengthscale=.05, active_dims=[1])

# define a fixed diagonal kernel of the known input variance of the data points
kerfixed = GPy.kern.Fixed(2, GPy.util.linalg.tdot(uncertainty))

# construct final kernel
kernel = ker0 * ker1 + kerfixed

# create model
model = GPy.models.GPRegression(coords, vals, kernel)

# fix the noise parameter to essentially 0 since we
# are using known input variance:
model.Gaussian_noise.fix(1e-6)

model.optimize(max_f_eval=1000)
print(model)

# user-defined function to generate a 120x200 grid of points
# at which to obtain predictions
predict_points = get_dw23_wness_coords(1, 120., -.3, .3, 200., 0., 1.)
print('Prediction points shape:' + str(predict_points.shape))
mean, variance = model.predict(predict_points)
```

```
Input Array Shapes:
('coordinates', <type 'numpy.ndarray'>, (1200, 2))
('values', <type 'numpy.ndarray'>, (1200, 1))
('uncert', <type 'numpy.ndarray'>, (1200, 1))
WARNING: reconstraining parameters GP_regression.Gaussian_noise

Name : GP regression
Log-likelihood : -3977.70455704
Number of Parameters : 6
Parameters:
  GP_regression.            | Value           | Constraint | Prior | Tied to
  add.mul.rbf_1.variance    | 10.0624849202   | +ve        |       |
  add.mul.rbf_1.lengthscale | 0.0173866088801 | +ve        |       |
  add.mul.rbf_2.variance    | 10.0624849177   | +ve        |       |
  add.mul.rbf_2.lengthscale | 0.0178426009088 | +ve        |       |
  add.fixed.variance        | 64.6807037111   | +ve        |       |
  Gaussian_noise.variance   | 1e-06           | fixed      |       |

Prediction points shape:(24000, 2)
```


```
ValueError                                Traceback (most recent call last)
in ()
     28 predict_points = get_dw23_wness_coords(1, 120.,-.3,.3, 200.,0.,1.)
     29 print('Prediction points shape:'+str(predict_points.shape))
---> 30 mean, variance = model.predict(predict_points)

/usr/local/lib/python2.7/dist-packages/GPy-0.6.1-py2.7.egg/GPy/core/gp.pyc in predict(self, Xnew, full_cov, Y_metadata, kern)
--> 218 mu, var = self._raw_predict(Xnew, full_cov=full_cov, kern=kern)

/usr/local/lib/python2.7/dist-packages/GPy-0.6.1-py2.7.egg/GPy/core/gp.pyc in _raw_predict(self, _Xnew, full_cov, kern)
--> 182 Kx = kern.K(_Xnew, self.X).T

/usr/local/lib/python2.7/dist-packages/GPy-0.6.1-py2.7.egg/GPy/kern/_src/kernel_slice_operations.pyc in wrap(self, X, X2, *a, **kw)
---> 65 ret = f(self, s.X, s.X2, *a, **kw)

/usr/local/lib/python2.7/dist-packages/GPy-0.6.1-py2.7.egg/GPy/util/caching.pyc in __call__(self, *args, **kwargs)
--> 182 return cacher(*args, **kwargs)

/usr/local/lib/python2.7/dist-packages/GPy-0.6.1-py2.7.egg/GPy/util/caching.pyc in __call__(self, *args, **kw)
--> 108 return self.operation(*args, **kw)

/usr/local/lib/python2.7/dist-packages/GPy-0.6.1-py2.7.egg/GPy/kern/_src/add.pyc in K(self, X, X2, which_parts)
---> 38 return reduce(np.add, (p.K(X, X2) for p in which_parts))

ValueError: operands could not be broadcast together with shapes (24000,1200) (1200,1200)
```
mzwiessele commented 9 years ago

That is indeed expected: the fixed kernel is only used during inference. Try predicting with only the function kernel, i.e. `m.predict(xpred, kern=ker0.copy() * ker1.copy())`; this should give you the right predictions. Make sure you use the kernels from the model, since their hyperparameters have been learnt. You can take the kernels from the model directly, but make copies before multiplying, as otherwise their parameter hierarchy will get messed up.
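The reasoning behind predicting with only the function kernel can be seen in a plain-numpy mock-up of GP regression (all names and toy data below are illustrative, not the poster's setup): the training covariance includes the fixed noise term on its diagonal, but the cross-covariance between test and training points uses only the smooth kernel, since the noise on the training points should not propagate into the latent function prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, X2, var=1.0, ls=0.3):
    # 1-D RBF kernel matrix
    d = X[:, 0][:, None] - X2[:, 0][None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

X = np.linspace(0, 1, 20)[:, None]
noise_var = np.full(20, 0.05)                 # known per-point Y variances
y = np.sin(6 * X[:, 0]) + rng.normal(0, np.sqrt(noise_var))

K = rbf(X, X) + np.diag(noise_var)            # training covariance: kernel + fixed noise
Xs = np.linspace(0, 1, 50)[:, None]
Ks = rbf(Xs, X)                               # cross-covariance: function kernel ONLY
mean = Ks @ np.linalg.solve(K, y)             # posterior mean at the test points
var = rbf(Xs, Xs).diagonal() - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
print(mean.shape)
```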


rocreguant commented 9 years ago

First of all, thanks mzwiessele for these explanations. I've been trying to figure out my problem for a couple of days without any luck, so I would like to follow up on this thread.

This approach, using the fixed kernel, gives me very small variances (roughly 10 times smaller than I think they should be). Without the fixed kernel, GPy returns identical variances over the whole range where I have data points. The dataset is somewhat particular: it has 8 Y values for every X (X is time, Y are the measurements). The Y data are usually below 1 (but always positive). The fixed kernel has the variances on its diagonal; each variance is computed from the 8 values at a given X, so those 8 values share the same variance in the kernel matrix (I computed the variances manually).

If anyone has any idea where the problem might be, it would be highly appreciated. This is the code:

```python
kernel = GPy.kern.RBF(input_dim=1, variance=3.5, lengthscale=1.)
noise = GPy.kern.Fixed(1, varis)  # varis is the covariance matrix
kern = kernel + noise  # I've tried multiplication just in case, but it doesn't work

m = GPy.models.GPRegression(X, Y, kern)
m.optimize(messages=True, optimizer='scg')  # without the optimization the variance is too big

prediction = m.predict(X_t, kern=kernel.copy())
quant = m.predict_quantiles(X_t, kern=kernel.copy())
```

mzwiessele commented 9 years ago

I cannot see a problem with that; it all looks right. Maybe try `optimizer='bfgs'` though; it is usually better.

So the question is: what is your `varis`, and how did you compute it? It may well be that `varis` already explains a lot of the variance, if it is the empirical covariance. That would leave the kernel (RBF) little variance with which to explain the data.


rocreguant commented 9 years ago

I've changed the optimizer and it provides the same solution.

`varis` is a matrix of zeros except on the diagonal, where the variances go. My data is Y[x] = {y0, y1, y2, y3, y4, y5, y6, y7}. For every Y[x] I compute its variance Var and place it on the diagonal entries corresponding to all of that Y[x]'s values, so each Var is repeated 8 times in `varis`. For the record, all the x are evenly distributed: X = {0, 1, 2, 3, ..., N}.

So, assuming that I only have two X and 2 Y values instead of 8, `varis` would look like:

```
[[ VarA  0     0     0    ]
 [ 0     VarB  0     0    ]
 [ 0     0     VarA  0    ]
 [ 0     0     0     VarB ]]
```

Var = \sum (yi - µ)^2 / n (n is 8 in the dataset; here it would be 2). I compute the confidence interval as 1.96 times the square root of the predicted variance.

I've tried different kernels (Mat52 and Mat32) and it does not work. How would you suggest computing this diagonal? Or what would you suggest doing instead?
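For concreteness, the diagonal construction described above can be sketched in numpy with 2 time points and 2 replicates each (the toy numbers and the stacking order of replicates are assumptions for illustration):

```python
import numpy as np

# Replicate measurements: row i holds the replicates observed at x = i.
Y = np.array([[0.9, 1.1],    # replicates at x = 0
              [0.4, 0.8]])   # replicates at x = 1
per_x_var = Y.var(axis=1)    # population variance, sum((yi - mu)^2) / n

# Assuming the stacked data ordering (x0, x1, x0, x1), each per-x
# variance repeats once per replicate block along the diagonal,
# giving the VarA, VarB, VarA, VarB pattern above.
varis = np.diag(np.tile(per_x_var, Y.shape[1]))
print(np.diag(varis))
```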

jameshensman commented 9 years ago

Please continue this interesting discussion on the user mailing list. Thanks.

https://lists.shef.ac.uk/sympa/subscribe/gpy-users