Choose the right kernel for this noisy training data set?

SheffieldML / GPy

Gaussian processes framework in python

BSD 3-Clause "New" or "Revised" License

Choose the right kernel for this noisy training data set? #530

Closed peiyaoli closed 6 years ago

peiyaoli commented 7 years ago

Hi everyone! I am currently working on Gaussian process prediction. I am new to GPs, so I have some questions.

My dataset looks like this: (image: resp_raw)

First, I normalized the data: new_y = (raw_y - mean_y)/std_y

Then I chose a Gaussian likelihood, and my kernel setting is: SE_cov + Periodic_cov + noise_cov

I used the first 1000 points as the training set and the rest of the points for prediction, but the result doesn't look good. Is this because my data is very noisy? What can I do to improve it? Thanks!
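The normalization step described above can be sketched as follows. This is only an illustration: the series is synthetic (not the poster's data), the helper `to_original_scale` is hypothetical, and it assumes that `mean_y` and `std_y` are computed from the 1000 training points only, which the post does not state.

```python
import numpy as np

# Synthetic stand-in for the outpatient series (hypothetical, not the real data).
rng = np.random.default_rng(0)
t = np.arange(1500, dtype=float)
raw_y = (50.0 + 0.01 * t + 10.0 * np.sin(2 * np.pi * t / 365.0)
         + rng.normal(0.0, 3.0, t.size))

n_train = 1000  # first 1000 points for training, as in the post

# Assumption: statistics come from the training split only,
# so no information from the test points leaks into the fit.
mean_y = raw_y[:n_train].mean()
std_y = raw_y[:n_train].std()

new_y = (raw_y - mean_y) / std_y  # same transform for train and test
train_y, test_y = new_y[:n_train], new_y[n_train:]


def to_original_scale(y_standardized):
    """Map standardized values (e.g. GP predictions) back to raw units."""
    return y_standardized * std_y + mean_y
```

Predictions made on the standardized scale should be mapped back with `to_original_scale` before computing a test error in the original units.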

mzwiessele commented 7 years ago

You should not use the white noise kernel, as the Gaussian likelihood already takes care of the noise. What measure do you use for the test error? What does the GP fit look like? How do you initialize your lengthscale? It would be best to send this to the email list (see front page), with code that reproduces a minimal example of your fit and error.
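One way to see why the white noise kernel is redundant: with a Gaussian likelihood, both noise terms only add to the diagonal of the covariance, so only their sum affects the marginal likelihood. The sketch below uses plain NumPy rather than GPy; the function names, data, and variance values are illustrative assumptions.

```python
import numpy as np


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential (SE) kernel for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def log_marginal_likelihood(x, y, white_variance, likelihood_variance):
    """GP log marginal likelihood with a white kernel AND Gaussian noise."""
    n = x.size
    identity = np.eye(n)
    # Both noise sources are multiples of the identity matrix.
    K = rbf(x, x) + white_variance * identity + likelihood_variance * identity
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * n * np.log(2.0 * np.pi))


rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x) + rng.normal(0.0, 0.3, x.size)

# Three different splits of the same total noise variance (0.09).
lml_a = log_marginal_likelihood(x, y, white_variance=0.08, likelihood_variance=0.01)
lml_b = log_marginal_likelihood(x, y, white_variance=0.01, likelihood_variance=0.08)
lml_c = log_marginal_likelihood(x, y, white_variance=0.0, likelihood_variance=0.09)
```

All three values agree up to floating-point error, so an optimizer cannot tell the two noise parameters apart and one of them can be dropped.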

adamian commented 7 years ago

For such data an idea is to use a summation of kernels, e.g. SE + Matern3/2, where the SE is initialized with long lengthscale (smooth) and the Matern with short, therefore one capturing the long trend and the other explaining the "wiggliness".
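A minimal NumPy sketch of this kernel sum, assuming 1-D inputs. It is not GPy code, and the variances, lengthscales, and noise level are illustrative starting values that would normally be optimized afterwards.

```python
import numpy as np


def se_kernel(x1, x2, variance, lengthscale):
    """Squared-exponential kernel: very smooth functions."""
    r = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-0.5 * (r / lengthscale) ** 2)


def matern32_kernel(x1, x2, variance, lengthscale):
    """Matern 3/2 kernel: rougher, once-differentiable functions."""
    a = np.sqrt(3.0) * np.abs(x1[:, None] - x2[None, :]) / lengthscale
    return variance * (1.0 + a) * np.exp(-a)


def sum_kernel(x1, x2):
    # Long lengthscale for the slow trend, short one for the wiggles.
    return (se_kernel(x1, x2, variance=1.0, lengthscale=200.0)
            + matern32_kernel(x1, x2, variance=0.5, lengthscale=5.0))


def gp_posterior_mean(x_train, y_train, x_test, noise_variance=0.1):
    """Posterior mean of GP regression with Gaussian noise."""
    K = sum_kernel(x_train, x_train) + noise_variance * np.eye(x_train.size)
    return sum_kernel(x_test, x_train) @ np.linalg.solve(K, y_train)


# Synthetic data: slow trend plus faster oscillation plus noise.
rng = np.random.default_rng(2)
x_train = np.arange(0.0, 300.0, 2.0)
y_train = (0.005 * x_train + 0.5 * np.sin(x_train / 4.0)
           + rng.normal(0.0, 0.2, x_train.size))
x_test = np.linspace(0.0, 300.0, 61)
mean = gp_posterior_mean(x_train, y_train, x_test)
```

Because the two components start at very different lengthscales, the optimizer is less likely to collapse them into two copies of the same explanation.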

peiyaoli commented 7 years ago

@adamian Thanks for the advice! My data describes respiratory disease prevalence over time, and I want to find the relationship between the number of outpatients and time. There is clear periodicity in the data and a slight linear trend. I suspect that part of the "wiggliness" comes from the outpatient number being discrete rather than continuous. Anyway, I will do as @mzwiessele suggested and send my questions to the email list.

mzwiessele commented 6 years ago

Also the period of the periodic kernel can be a difficult parameter to learn. Maybe you should put the period close to the expected period at initialization.
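One possible way to get a starting value for the period is to read it off the periodogram of the training data. The sketch below is an assumption-laden illustration in plain NumPy: `estimate_period` is a hypothetical helper, the data are synthetic, and the kernel is the standard periodic form, which may be parameterized differently from GPy's periodic kernels.

```python
import numpy as np


def estimate_period(y, sample_spacing=1.0):
    """Rough period estimate from the strongest non-zero FFT frequency."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(y.size, d=sample_spacing)
    peak = np.argmax(power[1:]) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[peak]


def periodic_kernel(x1, x2, variance, lengthscale, period):
    """Standard periodic kernel for 1-D inputs."""
    r = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(
        -2.0 * np.sin(np.pi * r / period) ** 2 / lengthscale ** 2
    )


# Synthetic series with a true period of 50 samples.
rng = np.random.default_rng(3)
t = np.arange(1000, dtype=float)
y = np.sin(2.0 * np.pi * t / 50.0) + rng.normal(0.0, 0.5, t.size)

period_init = estimate_period(y)  # use as the initial value, then optimize
K = periodic_kernel(t[:100], t[:100], variance=1.0, lengthscale=1.0,
                    period=period_init)
```

If the data has a strong trend, removing it first (for example by subtracting a linear fit) keeps low-frequency power from dominating the estimate.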

kingaza commented 6 years ago

hi @peiyaoli @adamian @mzwiessele ,

I am doing similar work on respiratory prediction. Because I have a large amount of data, I split the data set into many pieces for training, i.e. shape = (10000 * 500), where every sample contains about 5 respiratory cycles. I also want to try a kernel of (SE + Periodic or Matern). However, the GPy periodic kernels only allow input_dim = 1. Do you have any suggestions? Thanks a lot.

Best Regards, kingaza

mzwiessele commented 6 years ago

Which GP model would you like to apply? What are the input and output dimensions of your dataset?

kingaza commented 6 years ago

@mzwiessele

My case looks like autoregression: predicting about 1 second into the future from a long history. Given a sampling frequency of 20 Hz, I will use the past 20 seconds of data for training, so the input and output dimensions are 400 and 20, respectively. Maybe I should do some channel reduction on the input data, e.g. PCA, and calculate the period before fitting the GP.
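The setup described here (400 past samples in, 20 future samples out, with optional PCA on the inputs) could be sketched roughly as below. The helper names `make_windows` and `pca_reduce`, the synthetic breathing signal, and the choice of 10 components are all assumptions for illustration.

```python
import numpy as np


def make_windows(series, n_in=400, n_out=20):
    """Slice a 1-D series into (past window, future window) pairs."""
    series = np.asarray(series, dtype=float)
    n_samples = series.size - n_in - n_out + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested windows")
    starts = np.arange(n_samples)[:, None]
    X = series[starts + np.arange(n_in)[None, :]]            # history
    Y = series[starts + n_in + np.arange(n_out)[None, :]]    # future
    return X, Y


def pca_reduce(X, n_components):
    """Project the rows of X onto their leading principal components."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    return (X - mean) @ components.T, mean, components


fs = 20.0                               # sampling frequency in Hz
t = np.arange(1200) / fs                # 60 s of synthetic signal
series = np.sin(2.0 * np.pi * t / 4.0)  # hypothetical 4 s breathing cycle

X, Y = make_windows(series)             # 20 s in (400), 1 s out (20)
Z, mean, components = pca_reduce(X, n_components=10)
```

New inputs at prediction time would need the same projection, `(x_new - mean) @ components.T`, before being passed to the model.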

Actually, I have not decided which kernel to apply. I tried to follow the CO2 example in chapter 5 of GPML, but it does not seem to work for my case, because the prediction has to run in nearly real time.

So I would like to ask for any suggestions. Thanks again.

Best Regards, kingaza