SheffieldML / GPclust

Collapsed Variational Bayes
GNU General Public License v3.0

Different time series for each time course #8

Open lionfish0 opened 8 years ago

lionfish0 commented 8 years ago

I've been looking at how to modify GPClust to handle the situation in which each time course has a different X. The application I'm working on is clustering patients with MND. They have various metrics recorded at irregular intervals (e.g. one person was sampled on day 3,34,64,71,99; another on day 12,54,102,103,120, etc...). I suspect that there are different 'types' of progression and would like to see if I can detect clusters.

I'll also need to look into whether I can add a time offset as a parameter, for each time course (as I don't know when each time course 'starts', i.e. when day zero was for each person). Finally, each person has ~10 different metrics (all recorded together at each interval) - I'll need to look into how to use a multiple-output GP in the clustering framework.

I noticed in MOHGP.py you mention that "#prediction as per my notes" - I'm trying to go from your paper to the code, but if there's some intermediate reasoning somewhere, that would be super helpful!

vals commented 8 years ago

Hey,

I'm not sure how to "align" the time-points using these approaches.

However, I just thought I'd point out that MoHGP clusters output variables. That is, in this case you could see which of your 10 metrics follow similar trends over time.

Your problem is to find groups of patients having their trajectories in these 10 metrics in common. This corresponds more to the OMGP model. It should work for 10D output, but might be slow.

If you have "aligned" times for the patients, you could run OMGP, find the assignments to individual trends for each patient, then see if any particular kind of patient is enriched in one trend compared to another.

jameshensman commented 8 years ago

Thanks for the input @vals !

I have also given some thought as to what we should do for unaligned inputs. There seem to be two options:

1) If there's not too much misalignment, i.e. the total number of distinct time points is relatively small, you could make a dirty hack by filling in the 'missing' time points with zeros, and telling the kernel that those time points have a large amount of noise (using e.g. GPy.kern.Fixed()).

2) Use a sparse GP method, where all of the processes share the same inducing points, which effectively aligns them. I think that's probably not too difficult, but would require some implementation. Happy to provide guidance if needed.
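Option (1) could be sketched roughly as below. This is an illustrative numpy-only mock-up, not GPclust or GPy API: the helper `pad_to_grid` and its noise values are invented for the example. The resulting per-point noise variances are what you would place on the diagonal of the `covariance_matrix` passed to something like `GPy.kern.Fixed`.

```python
import numpy as np

# Sketch of option (1): align two irregularly sampled series onto a common
# time grid, padding 'missing' slots with zeros and flagging them with a
# huge noise variance. (pad_to_grid is a hypothetical helper, not GPy API.)

def pad_to_grid(times, values, grid, big_noise=1e6, small_noise=1e-6):
    """Return (padded_values, noise_variances) on the common grid."""
    padded = np.zeros(len(grid))
    noise = np.full(len(grid), big_noise)   # missing points: huge noise
    for t, v in zip(times, values):
        i = np.where(grid == t)[0][0]       # grid is assumed to contain t
        padded[i] = v
        noise[i] = small_noise              # observed points: tiny noise
    return padded, noise

grid = np.arange(0, 6)                      # common (e.g. monthly) grid
y1, n1 = pad_to_grid([0, 2, 5], [1.0, 2.0, 3.0], grid)
y2, n2 = pad_to_grid([1, 3], [4.0, 5.0], grid)
```

The noise vector for each series would then become `np.diag(n1)` etc. in the fixed covariance, so the model effectively ignores the padded zeros.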

lionfish0 commented 8 years ago

Re OMGP: Thanks vals for the feedback and ideas. I've just had a play with the OMGP examples (the signal/noise separation one is particularly impressive!). I'm not sure, however, if it quite fits. I feel the knowledge we have about which points belong to the same time series is worth keeping, which, if I understand correctly, isn't used in OMGP? There may be more structure in the data than is apparent here, due to connections between time points. So there might be a cluster of points that e.g. trends downward at a different rate than another cluster, but with a different offset, and so is masked by just fitting a mixture of GPs to the whole set of assessments?

Re Aligned Sample Times: Thanks James for the idea. I just realised that the times are restricted to be month starts, which means your option (1) may work very well.

Re More Dimensions: As Y is already an NxT (number of samples x number of times) matrix, I couldn't easily see how to add a new dimension. I've thought of a terrible hack: for 'local' stationary kernels (like RBF, etc.) could I put the second dimension in as additional observations, all at t+1000, in the same one-dimensional data set...
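The 'terrible hack' described above might look something like this (an illustrative sketch only; the data and the choice of 1000 are made up). The idea is that shifting the second metric's times far beyond the kernel's lengthscale makes a local stationary kernel treat the two blocks as effectively independent:

```python
import numpy as np

# Stack a second metric into the same 1-D input space by shifting its times
# by a large constant (1000), so an RBF kernel sees essentially zero
# covariance between the two metric blocks.

t = np.array([3., 34., 64., 71., 99.])            # sample days, one patient
metric_a = np.array([1.0, 1.2, 1.1, 0.9, 0.8])
metric_b = np.array([5.0, 5.5, 5.2, 4.8, 4.1])

X = np.concatenate([t, t + 1000.0])[:, None]      # second block shifted
Y = np.concatenate([metric_a, metric_b])[:, None]
```

The obvious downside is that the hack forces both metrics to share one set of kernel hyperparameters, and any cross-metric correlation is thrown away.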

Re Different (unknown) start times: I was thinking (based on an idea by Max) of adding a new kernel (say RBF) but with an additional 'offset' parameter that is added to all the Xs, so the kernel would have (y-scale, length-scale, offset) as its parameters. In your framework each time course is modelled by a GP (for the cluster) plus an additional 'corruption' GP, e.g. from your example: k_corruption = GPy.kern.RBF(input_dim=1, variance=0.01, lengthscale=0.1) + GPy.kern.White(1, variance=0.001). But I realise that all time courses share the same corruption kernel... So maybe a new kernel is needed which has, as a list of parameters, the offsets for all the time series: every time series has its own subdomain of X, which the kernel knows about. The kernel can then return the correct covariance between pairs of X values (with the offsets added). Maybe X could be 2D, so one dimension is the time series id and the other is time. Then the covariance between timepoints i and j in time series n and m would be kern([i,n], [j,m]) = rbf(i+offset_n, j+offset_m), where offset_1..offset_N are now kernel parameters (one for each time series, so not very 'parametery', but the whole solution is a hack anyway).
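The proposed offset kernel can be sketched numerically as below. This is a standalone demonstration of the covariance computation only, not a GPy kernel subclass: in a real implementation the offsets would be registered as trainable kernel parameters, whereas here `offset_rbf` is just an invented helper taking X with columns [time, series_id]:

```python
import numpy as np

# Sketch of the 'offset' kernel: each series gets its own offset, added to
# its times before a standard RBF covariance is evaluated.

def offset_rbf(X, X2, offsets, variance=1.0, lengthscale=1.0):
    t1 = X[:, 0] + offsets[X[:, 1].astype(int)]   # shift by per-series offset
    t2 = X2[:, 0] + offsets[X2[:, 1].astype(int)]
    d = t1[:, None] - t2[None, :]                 # pairwise time differences
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# t=0 in series 0 and t=1 in series 1; with offsets (1, 0) they coincide,
# so the covariance matrix should be all ones (variance = 1).
X = np.array([[0.0, 0], [1.0, 1]])
offsets = np.array([1.0, 0.0])
K = offset_rbf(X, X, offsets)
```

In GPy one would wrap this in a `GPy.kern.Kern` subclass so the offsets can be optimized along with the other hyperparameters.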

(PS The graph has noise added to mask individual patient values)

lionfish0 commented 8 years ago

Just need a bit of advice. I've just tried out using the Fixed kernel, and I think I've seen this issue before, but I've forgotten what's going on, and I'm wondering if someone's got an explanation.

http://nbviewer.jupyter.org/gist/lionfish0/f243356da422c8f2b59b353f42e4d00d

Why does predict return a list of variances (one for each value in the training data)? Also, plot doesn't work any more with a Fixed kernel. By picking the top value of the variance list it seems to make a plausible plot (see notebook linked), but I obviously would like to know what's going on!

Thanks,

Mike.

jameshensman commented 8 years ago

hi Mike,

Here's the correct way to plot with that nasty fixed kernel:


import GPy

k0 = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1.)
kfix = GPy.kern.Fixed(input_dim=1, covariance_matrix=CM)  # CM: the fixed noise covariance
newkernel = k0 + kfix
m = GPy.models.GPRegression(X, Y, newkernel)
m.optimize()
m.plot(predict_kw=dict(kern=k0))  # predict/plot using only the RBF part

The reason the returned variance is weird is because the Fixed kernel is not really a kernel at all: it just returns a 'Fixed' matrix each time you call it. So at predict time, it sends you a matrix of completely the wrong shape (since the kernel does not know whether you're training or predicting!)

This is also why the plotting breaks.
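The shape problem can be illustrated without GPy at all. The toy class below (invented for this example, not the real `GPy.kern.Fixed`) mimics a kernel that ignores its inputs and always returns the stored training-size matrix, which is why the test/train cross-covariance comes back the wrong shape:

```python
import numpy as np

# Toy illustration: a 'fixed' kernel always returns the same stored matrix,
# so at predict time the cross-covariance between test and training inputs
# has the training shape instead of (n_test, n_train).

class ToyFixedKernel:
    def __init__(self, covariance_matrix):
        self.CM = covariance_matrix

    def K(self, X, X2=None):
        return self.CM  # ignores X and X2 entirely

CM = np.eye(5)                     # built for 5 training points
k = ToyFixedKernel(CM)
Xtrain = np.linspace(0, 1, 5)[:, None]
Xtest = np.linspace(0, 1, 3)[:, None]
K_cross = k.K(Xtest, Xtrain)       # wanted (3, 5), got (5, 5)
```

Passing `kern=k0` at predict time sidesteps this by evaluating only the well-behaved RBF component.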

lionfish0 commented 8 years ago

Ah, I understand now! So when I called predict I had just been getting the variance of the test point added to the fixed kernel's returned vector. I've just tried it out. By chance the output of my code had the same CI as the m.plot you used above, but that was just because I had zeros in my Fixed kernel's diagonal at those locations.

Didn't know the cool trick to select which kernels to use for prediction/plotting.

For reference (for future users), it looks like I can do the same with the predict function by using the 'kern' parameter: mean, variance = m.predict(np.array([[x]]), kern=k0)

vals commented 8 years ago

Hey,

Sorry for going on about the OMGP :p, but you want to do the must-link constraints on patients as described here, no? http://www.ece.neu.edu/fac-ece/jdy/papers/ross-dy-ICML2013.pdf

The MRF constraints are not implemented here, though.

mathDR commented 8 years ago

Can you model your problem as count data? Then each person wouldn't need the same X. Just a thought. I have a similar problem, but mine is that I have thousands of count-data streams representing sales data: the data set records that k products were sold at time t. It is very sparse, in that on most days no product is sold. So I want to model it as count data, then cluster the GPs that model each count stream.
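The recasting suggested above could be sketched like this (an illustrative numpy-only example with invented data; a Poisson-likelihood GP would then model each stream on the shared grid):

```python
import numpy as np

# Convert irregular event times into count streams on a shared daily grid,
# so every stream has the same X regardless of when events occurred.

def to_daily_counts(event_days, n_days):
    return np.bincount(event_days, minlength=n_days)

# e.g. two sales on day 2 and one on day 5, over a 7-day window
sales_stream = to_daily_counts([2, 2, 5], n_days=7)
```

Every stream then shares X = [0, 1, ..., n_days-1], at the cost of a mostly-zero Y, which is exactly where a Poisson (rather than Gaussian) likelihood helps.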