STOR-i / GaussianProcesses.jl

A Julia package for Gaussian Processes
https://stor-i.github.io/GaussianProcesses.jl/latest/

Predicting whole time series from parameters #104

Closed: ludoro closed this issue 5 years ago

ludoro commented 5 years ago

Hello everyone!

I am trying to use this package, but I have some questions. What I am trying to do is predict the time series of the Lotka-Volterra system of differential equations. That is, I solve the problem many times (using JuliaDiffEq) with random coefficients A, B, C, D and build a dataset of the resulting time series. Then I want to use a GP to predict the values of the time series from the 4 parameters, without actually solving the system.
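A minimal sketch of the dataset-building step described above, in plain Julia. To stay self-contained it assumes the common Lotka-Volterra parametrization dx/dt = A*x - B*x*y, dy/dt = -C*y + D*x*y, and it replaces the JuliaDiffEq solver with a hand-written fixed-step RK4 integrator. The function names, the parameter range, the step size and the initial condition are all hypothetical choices.

```julia
using Random

# Lotka-Volterra right-hand side, assuming the parametrization
# dx/dt = A*x - B*x*y,  dy/dt = -C*y + D*x*y  with θ = (A, B, C, D).
function lotka_volterra(u, θ)
    A, B, C, D = θ
    x, y = u
    return [A * x - B * x * y, -C * y + D * x * y]
end

# Fixed-step classic RK4 (a stand-in for a DifferentialEquations.jl solver).
# Returns a 2 × (nsteps + 1) matrix: row 1 is x(t), row 2 is y(t).
function rk4_solve(θ, u0, h, nsteps)
    sol = zeros(2, nsteps + 1)
    u = float.(u0)
    sol[:, 1] = u
    for k in 1:nsteps
        k1 = lotka_volterra(u, θ)
        k2 = lotka_volterra(u .+ (h / 2) .* k1, θ)
        k3 = lotka_volterra(u .+ (h / 2) .* k2, θ)
        k4 = lotka_volterra(u .+ h .* k3, θ)
        u = u .+ (h / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
        sol[:, k + 1] = u
    end
    return sol
end

# Draw `nruns` random coefficient vectors (hypothetical range [0.5, 1.5))
# and record one simulated time series per draw.
function build_dataset(nruns; h = 0.1, nsteps = 100, u0 = [1.0, 1.0],
                       rng = MersenneTwister(42))
    params = [0.5 .+ rand(rng, 4) for _ in 1:nruns]
    series = [rk4_solve(θ, u0, h, nsteps) for θ in params]
    return params, series
end

params, series = build_dataset(20)
```

Each entry of `params` is one input (the 4 coefficients) and the matching entry of `series` is the whole output time series that the GP would have to predict.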

The time series notebook is of some help, but I am not sure how to actually build the kernel. Also, can the X argument of the GP() function be a DataFrame, or does it have to be a plain matrix?

Sorry if the questions are too basic, but I am just starting out.

It would be amazing if I could get some hints! 👍

maximerischard commented 5 years ago

Hi ludoro,

It's exciting to hear you're interested in using this package for your work. If I understand you correctly, you have a low-dimensional input (the coefficients) and a high-dimensional output (a time series). So what exactly are you trying to predict: the entire time series, or the next observation given the first n observations and the coefficients? If it's the full time series, there's some thinking to do about using GPs to model high-dimensional outputs (to my knowledge, it's not straightforward). There are a lot of directions this could go in.

In general, to build a kernel you can start from components (such as the various stationary kernels available; periodic kernels may also be useful for your application) and combine them by adding or multiplying them. You can use masked kernels, so that a component only applies to some dimensions of your input X, and fixed kernels, to prevent a parameter from being modified during optimization.
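The kernel-building ideas above (stationary and periodic components, sums and products, masking to a subset of input dimensions) can be illustrated with plain closures. This is a from-scratch sketch using only the standard library, not the GaussianProcesses.jl API; the names `se`, `periodic`, `masked`, `ksum`, `kprod` and `gram`, and all hyperparameter values, are hypothetical. Fixed kernels are not shown, because nothing is optimized here.

```julia
using LinearAlgebra, Random

# Squared-exponential (stationary) kernel: length scale ℓ, signal std σ.
se(ℓ, σ) = (a, b) -> σ^2 * exp(-sum(abs2, a .- b) / (2 * ℓ^2))

# Periodic kernel with period p (useful for oscillating dynamics).
periodic(ℓ, σ, p) = (a, b) -> σ^2 * exp(-2 * sin(π * norm(a .- b) / p)^2 / ℓ^2)

# "Masked" kernel: apply k only to the input dimensions listed in `dims`.
masked(k, dims) = (a, b) -> k(a[dims], b[dims])

# Sums and products of valid kernels are again valid kernels.
ksum(k1, k2) = (a, b) -> k1(a, b) + k2(a, b)
kprod(k1, k2) = (a, b) -> k1(a, b) * k2(a, b)

# Gram matrix for inputs stored one observation per column (d × n).
gram(k, X) = [k(X[:, i], X[:, j]) for i in axes(X, 2), j in axes(X, 2)]

# Example: coefficients in dims 1:4, time in dim 5.
# SE over the coefficients, multiplied by (SE + periodic) over time.
k = kprod(masked(se(1.0, 1.0), 1:4),
          masked(ksum(se(2.0, 1.0), periodic(1.0, 0.5, 6.0)), 5:5))
X = rand(MersenneTwister(1), 5, 10)
K = gram(k, X)
```

Because every piece is itself a valid kernel, the resulting `K` is symmetric and positive semi-definite, which is what makes adding and multiplying components safe.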

An alternative would be to treat each input as a combination of the coefficients and one time from the grid. Say you have p coefficients and T times: each input would then be (p+1)-dimensional (the p coefficients plus a time), each simulation run would contribute T training points, and the output would be univariate. You might want the kernel to be stationary in the time dimension and non-stationary in the coefficient dimensions. Lots to think about!
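A sketch of this reformulation, assuming N simulation runs, each stored as a 2 × T matrix (row 1 = x, row 2 = y) on a shared time grid. The helper `augment` and its argument names are hypothetical. It produces one (p+1)-dimensional input column per (run, time) pair, plus one univariate target vector per output.

```julia
# Flatten (θ -> whole series) data into (θ, t) -> scalar training data.
# Returns X with one column per (run, time) pair: rows 1:p hold θ, row p+1 holds t.
function augment(params, series, times)
    p, T, N = length(first(params)), length(times), length(params)
    X = zeros(p + 1, N * T)
    yx = zeros(N * T)                 # targets for the x(t) model
    yy = zeros(N * T)                 # targets for the y(t) model
    col = 0
    for (θ, s) in zip(params, series)
        for (k, t) in enumerate(times)
            col += 1
            X[1:p, col] = θ
            X[p + 1, col] = t
            yx[col] = s[1, k]
            yy[col] = s[2, k]
        end
    end
    return X, yx, yy
end

# Tiny example with placeholder numbers: N = 2 runs, p = 4 coefficients, T = 5 times.
times = 0.0:0.5:2.0
params = [[1.0, 0.5, 1.0, 0.2], [0.8, 0.4, 1.2, 0.3]]
series = [reshape(collect(1.0:10.0), 2, 5), reshape(collect(11.0:20.0), 2, 5)]
X, yx, yy = augment(params, series, times)
size(X)   # (5, 10): p + 1 input dimensions, N * T observations
```

The trade-off is visible in the shapes: the output shrinks to a scalar, but the number of training points grows by a factor of T, which is what drives the cost of exact GP inference.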

X can be any kind of AbstractMatrix. DataFrames are not abstract matrices, but a DataFrame containing numbers can easily be converted into a matrix. If you want to be able to keep the column names, the AxisArrays.jl library could be of interest.
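As far as I know, GaussianProcesses.jl expects the inputs X as a d × n matrix, with one column per observation, while a DataFrame (and the n × d matrix that `Matrix(df)` returns in DataFrames.jl) has one row per observation, so a transpose is usually needed. The sketch below uses a vector of NamedTuples as a hypothetical stand-in for a numeric DataFrame and keeps the column names in a separate vector.

```julia
# Stand-in for a numeric DataFrame: one row (NamedTuple) per observation.
rows = [(A = 1.0, B = 0.5, C = 1.0, D = 0.20, t = 0.0),
        (A = 0.8, B = 0.4, C = 1.2, D = 0.30, t = 0.5),
        (A = 1.1, B = 0.6, C = 0.9, D = 0.25, t = 1.0)]

colnames = collect(keys(first(rows)))       # keep the column names separately
M = [r[c] for r in rows, c in colnames]     # n × d: one row per observation
X = permutedims(M)                          # d × n: one column per observation
```

`permutedims` returns a new `Matrix`, so `X` is an ordinary `AbstractMatrix` whose column `j` holds observation `j`.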

Obviously, I don't know exactly what you're trying to do, but let us know if you need more help using the package. Feel free to reopen the issue with more questions.

ludoro commented 5 years ago

Thank you for your interest! I thought about it a little more and came to the same conclusion: I should use both the parameters and a time t as input, to reduce the size of my output. However, I have two more questions. First, my kernel should be a function of both time and the parameters, right? Other people I talked to suggested using a product kernel. Second, with this approach I still need two outputs, the value of x[t] and the value of y[t]. However, I think I could fit two GPs, one for the x's and one for the y's. Could this be the correct approach?

maximerischard commented 5 years ago

Other people I talked to suggested using a product kernel.

I agree that seems like the right approach.
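A minimal sketch of a product kernel over (coefficients, time) inputs, with the GP posterior mean computed directly from a Cholesky factorization. It uses only LinearAlgebra and is not the GaussianProcesses.jl API (there, I believe, one would multiply two masked kernels instead). The function names and the fixed hyperparameter values are hypothetical; in practice the hyperparameters would be optimized.

```julia
using LinearAlgebra

sqexp(r2, ℓ) = exp(-r2 / (2 * ℓ^2))

# Product kernel over augmented inputs: rows 1:p are coefficients, row p+1 is time.
# k((θ,t),(θ',t')) = σ² · k_θ(θ,θ') · k_t(t,t'). Hyperparameter values are hypothetical.
function prodkernel(a, b; ℓθ = 1.0, ℓt = 0.5, σ = 1.0)
    p = length(a) - 1
    kθ = sqexp(sum(abs2, a[1:p] .- b[1:p]), ℓθ)
    kt = sqexp((a[p + 1] - b[p + 1])^2, ℓt)
    return σ^2 * kθ * kt
end

# Zero-mean GP posterior mean at the columns of Xstar: K*ᵀ (K + σₙ² I)⁻¹ y.
function gp_posterior_mean(X, y, Xstar; noise = 1e-4)
    K = [prodkernel(X[:, i], X[:, j]) for i in axes(X, 2), j in axes(X, 2)]
    Ks = [prodkernel(X[:, i], Xstar[:, j]) for i in axes(X, 2), j in axes(Xstar, 2)]
    α = cholesky(Symmetric(K + noise * I)) \ y
    return Ks' * α
end

# Example: p = 1 coefficient, three time points per run, two runs.
X = [1.0 1.0 1.0 2.0 2.0 2.0;
     0.0 1.0 2.0 0.0 1.0 2.0]
y = X[1, :] .* sin.(X[2, :])
gp_posterior_mean(X, y, X)   # close to y, since the noise term is small
```

Because the kernel is a product, two inputs are strongly correlated only when they are close in both the coefficients and time; far from all training runs the prediction falls back to the zero prior mean.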

With this approach I still need two outputs, the value of x[t] and the value of y[t]. However, I think I could fit two GPs, one for the x's and one for the y's. Could this be the correct approach?

That's interesting. I think it would work with two separate GPs, but it might take more simulation runs to get good predictions. A multivariate GP, which can share information between the x and y outputs, might be much more efficient, but unfortunately our package doesn't have good support for multivariate outputs yet. If I were you, I would try separate GPs first and then decide whether it's worth the effort to try a multivariate GP.
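Two separate GPs amount to fitting one independent GP per output on the same inputs. A hedged sketch with an isotropic squared-exponential kernel, in plain Julia; the helper names `se_cross` and `independent_gps` and the length scales are hypothetical.

```julia
using LinearAlgebra

# Isotropic squared-exponential kernel between the columns of X1 and X2.
function se_cross(X1, X2, ℓ)
    return [exp(-sum(abs2, X1[:, i] .- X2[:, j]) / (2 * ℓ^2))
            for i in axes(X1, 2), j in axes(X2, 2)]
end

# One independent zero-mean GP per output row of Y (e.g. row 1 = x(t), row 2 = y(t)).
# Each output has its own length scale ℓs[m], hence its own Cholesky factorization.
function independent_gps(X, Y, Xstar; ℓs = fill(1.0, size(Y, 1)), noise = 1e-4)
    preds = zeros(size(Y, 1), size(Xstar, 2))
    for m in axes(Y, 1)
        F = cholesky(Symmetric(se_cross(X, X, ℓs[m]) + noise * I))
        preds[m, :] = se_cross(X, Xstar, ℓs[m])' * (F \ Y[m, :])
    end
    return preds
end

X = [1.0 2.0 3.0;
     0.0 1.0 2.0]                 # 2 input dims, 3 observations
Y = [1.0 2.0 3.0;                 # x(t) targets
     4.0 5.0 6.0]                 # y(t) targets
independent_gps(X, Y, X; ℓs = [1.0, 0.5])
```

If both outputs shared the same kernel hyperparameters, a single factorization could be reused for x and y; with separately tuned hyperparameters, as here, each output needs its own. Either way, nothing learned about x informs the prediction of y, which is the inefficiency a multivariate GP would address.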

ludoro commented 5 years ago

Perfect! Thank you very much.