dfm / tinygp

The tiniest of Gaussian Process libraries
https://tinygp.readthedocs.io

Multi-dimensional Output data? #220

Open RKHashmani opened 2 months ago

RKHashmani commented 2 months ago

Hello. Does TinyGP support multi-dimensional output data? For example, a 1D input to a 2D output?

dfm commented 1 month ago

It's certainly possible to use tinygp for multidimensional outputs, but it doesn't have particularly strong opinions about how to do it. Some examples from the docs include:

but there are many other possible approaches. Hope this helps!

RKHashmani commented 1 month ago


Thanks for the resources! I've been studying them for a while and I'm still having some difficulty adapting them. Suppose I start with the same conditioning data as in your first tutorial, except that I don't want to use derivative observations (since my actual outputs won't be derivatives of each other). I tried to build a kernel similar to the DerivativeKernel class, but my means (mu1 and mu2) end up being identical. I've created a minimal example here:

https://gist.github.com/RKHashmani/a85208705781ac395b24742eecf64b61

Any suggestions on how to tweak this?

I've gone through Chapter 9.1 of Rasmussen. They suggest independent GPs, a correlated noise process, or a process like cokriging. I can do the independent GPs fairly easily, but then I lose any correlation between the components of my multivariate output (say y and z). I'm not sure how a correlated noise process or cokriging would work in TinyGP.
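
For concreteness, the independent baseline I have in mind looks roughly like the sketch below (the data X, y, z and the kernel choices are just placeholders):

import jax.numpy as jnp
from tinygp import GaussianProcess, kernels

# Sketch of the independent-GP baseline: one GP per output dimension,
# with nothing shared between them. Data and kernels are placeholders.
X = jnp.linspace(0.0, 10.0, 50)
y = jnp.sin(X)
z = jnp.cos(X)

gp_y = GaussianProcess(kernels.ExpSquared(scale=1.5), X, diag=1e-4)
gp_z = GaussianProcess(kernels.ExpSquared(scale=1.5), X, diag=1e-4)

# Because the GPs are independent, the joint log likelihood factorizes,
# and any correlation between y and z is ignored.
log_like = gp_y.log_probability(y) + gp_z.log_probability(z)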

Would you have any guidance on the matter again?

dfm commented 1 month ago

Sure. Here's how I would update your example using the method from Teh et al. (2005), which is referenced in chapter 9 of GPML: https://gist.github.com/dfm/afa49c75900ca9e5a8468bd55745e06b

The trick is to expand the Kronecker products in their kernel so that you can evaluate it for a single pair of inputs. This is the same idea discussed in this tutorial, but extended to work with general tinygp kernels.
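
Schematically, the kernel ends up looking something like the following (this is a sketch rather than the exact code in the gist; the class name LatentFactorKernel and its parameters are placeholders):

import jax.numpy as jnp
import tinygp

class LatentFactorKernel(tinygp.kernels.Kernel):
    # Each output is a weighted sum of Q latent GPs, so
    #   cov(f_p(x), f_q(x')) = sum_j W[p, j] * W[q, j] * k_j(x, x')
    # Inputs are (x, output_index) tuples, as in the derivative tutorial.
    def __init__(self, weights, latent_kernels):
        self.weights = jnp.asarray(weights)   # shape (n_outputs, Q)
        self.latent_kernels = latent_kernels  # list of Q tinygp kernels

    def evaluate(self, X1, X2):
        x1, p1 = X1
        x2, p2 = X2
        return sum(
            self.weights[p1, j] * self.weights[p2, j] * k.evaluate(x1, x2)
            for j, k in enumerate(self.latent_kernels)
        )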

Hope this helps!

RKHashmani commented 1 day ago

Hi Dan!

Thanks a lot for the example you shared. It took a while, but I was able to incorporate it into my project and extend it to work with an arbitrary number of output dimensions. I have another small question (hopefully the last one!).

Building off of your example, you conditioned the GP twice, once for each of the two output dimensions:

# Predict the conditioned mean for each of the two output dimensions
mu1 = gp.condition(y, (X_grid, np.zeros(len(X_grid), dtype=int))).gp.loc
mu2 = gp.condition(y, (X_grid, np.ones(len(X_grid), dtype=int))).gp.loc

Suppose I want to sample from these GPs. I modified your code a bit to get:

import jax
import numpy as np

# Modified this part to get the conditioned GPs
_, gp_0 = gp.condition(y, (X_grid, np.zeros(len(X_grid), dtype=int)))
_, gp_1 = gp.condition(y, (X_grid, np.ones(len(X_grid), dtype=int)))

# Sample the GPs, using a separate subkey for each draw:
seed = np.random.default_rng().integers(0, 2**32)
key0, key1 = jax.random.split(jax.random.PRNGKey(seed))
sample0 = gp_0.sample(key=key0, shape=(1,))[0]
sample1 = gp_1.sample(key=key1, shape=(1,))[0]

Then, if I wanted to calculate the log probability of getting these two samples from our conditioned GP, how would I go about doing that? Would I simply do this:

# Log probability of each sample under its conditioned GP,
# summed as if the two outputs were independent:
log_prob0 = gp_0.log_probability(sample0)
log_prob1 = gp_1.log_probability(sample1)
total_log_prob = log_prob0 + log_prob1

This feels a bit wrong because, to the best of my understanding, adding log probabilities like this assumes the two distributions are independent. But in our case they aren't, because of the kernel mixing we did beforehand. How would you go about finding the total log probability in this case?

I've created an example here: https://gist.github.com/RKHashmani/2581d366e53e9fa8dd3efc738f12ae9b

To help clarify the end goal: I'm imagining a setup where one unconditioned GP is conditioned on one set of ys (as we did above to get gp_0 and gp_1), and another unconditioned GP is conditioned on a different set of ys (to get, say, gp_a and gp_b). Given samples sample0 and sample1, I'd like to compare their log probabilities under the first pair of GPs (gp_0 and gp_1) and under the second pair (gp_a and gp_b).
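
For what it's worth, one idea I've been toying with (I'm not sure it's correct) is to condition once on the stacked test inputs, so the result is a single joint GP over both output dimensions, and then evaluate the concatenated samples under it. A sketch, reusing gp, y, X_grid, sample0, and sample1 from above:

import jax.numpy as jnp

# Condition once on the stacked inputs, covering both output labels.
X_stack = (
    jnp.concatenate([X_grid, X_grid]),
    jnp.concatenate([
        jnp.zeros(len(X_grid), dtype=int),
        jnp.ones(len(X_grid), dtype=int),
    ]),
)
_, gp_joint = gp.condition(y, X_stack)

# This log probability accounts for the cross-covariance between the
# two outputs instead of treating them as independent.
total_log_prob = gp_joint.log_probability(jnp.concatenate([sample0, sample1]))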

P.S. I apologize for the delayed replies on my end. I'm a PhD student myself, so sometimes it gets a bit difficult to juggle things. I usually like to make sure I understand everything before closing a topic, so it takes me a while to realize I have to ask yet another question. Thanks for your responses!