cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

Multi dimension Target #69

Closed · monicaMRL closed this 6 years ago

monicaMRL commented 6 years ago

Hi, thank you for the amazing repo. Is there an example that shows how to use the code when the input and target have dimensions like [N x M], where N is the number of features and M is the number of samples? I checked the examples with multi-dimensional inputs, but they still have only a one-dimensional target.

Best, Monica.

gpleiss commented 6 years ago

Hi Monica,

I think the multitask example might be what you're looking for: https://github.com/jrg365/gpytorch/blob/master/examples/multitask_gp_regression.ipynb

In this example, your input should be (d+1)-dimensional, where the first d dimensions are the input features, and the last dimension specifies which of the target outputs you're interested in for that sample.

So if your target variable is t dimensional, and you want all t target dimensions for each of the n inputs, then you should clone each of the inputs t times, for a total of n x t inputs.
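Concretely, a minimal numpy sketch of that construction (the array names and sizes are just illustrative, not from the notebook):

```python
import numpy as np

n, d, t = 100, 3, 5                # samples, features, tasks (illustrative sizes)
X = np.random.randn(n, d)          # original n x d feature matrix

# Stack t copies of X and append a column indicating which task each row
# belongs to, giving an (n*t) x (d+1) training input.
task_index = np.repeat(np.arange(t), n).reshape(-1, 1)  # n zeros, then n ones, ...
train_x = np.hstack([np.tile(X, (t, 1)), task_index])

print(train_x.shape)  # (500, 4) == (n*t, d+1)
```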

It's not the best interface... we're thinking about ways to improve this. Let us know if you have any suggestions!

monicaMRL commented 6 years ago

Thank you so much for the answer. I'll try it the way you suggested and will write back if I come up with a different solution.

monicaMRL commented 6 years ago

In the example, the input is not changed to have d+1 dimensions; instead, the task index is passed to forward. So should I make my input (d+1)-dimensional, with the last dimension as the index? And how does that affect training?

Also, for the output, do I pass all of my target dimensions, or only the one that the input corresponds to? Meaning, should my target (my target dim is 5) look like this: train_y = np.concatenate([Y[:, 0], Y[:, 1], Y[:, 2], Y[:, 3], Y[:, 4]]), or like this: train_y = np.concatenate([Y] * out_dim)?

gpleiss commented 6 years ago

Yes, your input should be (d+1)-dimensional. Alternatively, you can pass two input variables: a d-dimensional matrix with the features, and a vector with the task index.

Your target vector should look like train_y = np.concatenate([Y[:, 0], Y[:, 1], Y[:, 2], Y[:, 3], Y[:, 4]])
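A small sketch of that target construction, assuming Y is an n x 5 array (names are illustrative); the column-by-column ordering matches the tiled inputs above, with all n values for task 0 first, then task 1, and so on:

```python
import numpy as np

n, t = 100, 5
Y = np.random.randn(n, t)  # n samples, t target dimensions (illustrative)

train_y = np.concatenate([Y[:, i] for i in range(t)])
# equivalent: train_y = Y.T.reshape(-1)

print(train_y.shape)  # (500,) == (n*t,)
```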

Balandat commented 6 years ago

Has this been resolved?

Balandat commented 6 years ago

I guess more generally it might be nice to provide a little wrapper for generating the training/target data for the common use case where the observations are at the same points in the feature space for all tasks. cc @rajkumarkarthik
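Something along these lines, perhaps (a hypothetical helper for illustration only, not an existing gpytorch function; names and signature are made up):

```python
import torch

def make_multitask_data(X, Y):
    """Hypothetical helper: tile features and flatten targets for a multi-task
    GP where every task is observed at every input location.

    X: (n, d) feature tensor, Y: (n, t) target tensor.
    Returns train_x of shape (n*t, d+1) and train_y of shape (n*t,).
    """
    n, d = X.shape
    t = Y.shape[1]
    # Task index column: n zeros, then n ones, ..., matching t stacked copies of X.
    task_index = torch.arange(t, dtype=X.dtype).repeat_interleave(n).unsqueeze(-1)
    train_x = torch.cat([X.repeat(t, 1), task_index], dim=-1)
    train_y = Y.t().reshape(-1)
    return train_x, train_y
```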

Balandat commented 6 years ago

Thinking more about this, it becomes more important for BayesOpt on the evaluation side. Specifically, we'll want to be able to define (scalar) acquisition functions on the t outputs of a multi-task GP at the same x. Currently it seems like the way to do this in eval mode as well would be to concatenate t copies of x and generate the associated task index vector.

But to run gradient descent, we'll want to take the derivative of the acquisition function w.r.t. x, not w.r.t. each of its copies. We could manually do this by summing up the gradients over the copies, but that doesn't seem ideal (code bloat, overhead / efficiency loss, etc.). In particular, this would be quite wonky when trying to work within the pytorch optim framework.
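To make the copy-and-differentiate bookkeeping concrete, here is a toy autograd sketch of the workaround described above (the acquisition value is a stand-in, not a real acquisition function); note that if the copies are built from a single tensor via expand, autograd already accumulates the per-copy gradients back onto x:

```python
import torch

d, t = 3, 5
x = torch.randn(1, d, requires_grad=True)   # the single candidate point

# Concatenate t (expanded) copies of x with the associated task index column.
task_index = torch.arange(t, dtype=x.dtype).unsqueeze(-1)          # (t, 1)
x_all_tasks = torch.cat([x.expand(t, d), task_index], dim=-1)      # (t, d+1)

# Stand-in for "scalar acquisition value computed from the t task outputs".
acq_value = x_all_tasks[:, :d].pow(2).sum()
acq_value.backward()

# Because the copies come from expand, the gradients from all copies sum onto x.
print(x.grad.shape)  # torch.Size([1, 3])
```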

@jacobrgardner, @gpleiss any thoughts on this?

Balandat commented 6 years ago

@darbour @bkarrer @bletham

jacobrgardner commented 6 years ago

The Kronecker product multi-task example that Geoff started is/will be an example of mapping a single x tensor to multiple outputs. The intent of that notebook is to efficiently handle the setting of multi-task GPs where every task is evaluated at every point. In that setting at least, I don't foresee any issue with torch optimization for acquisition functions, since KroneckerProductLazyVariable already handles derivatives through the Kronecker product that is used to map each point to all of the tasks. It sounds like the working version of that does what you need?

In the "messier" setting where some tasks are missing for some points, I don't see a way around essentially just writing a custom kernel that maps n data points to a larger than n x n kernel matrix. Maybe there would be some structure you could exploit, but I think that would depend on the problem.

Balandat commented 6 years ago

Yeah, I meant the non-messy setting - I haven't had time to go over Geoff's example yet, but I will!

gpleiss commented 6 years ago

I'm proposing a new interface in #209.