cornellius-gp / gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch
MIT License

How to use a fixed noise Gaussian likelihood in a multi-task setting #901

Open Galto2000 opened 4 years ago

Galto2000 commented 4 years ago

Howdy folks,

GPyTorch provides Gaussian likelihood objects for fixed noise (FixedNoiseGaussianLikelihood) and for multi-task models (MultitaskGaussianLikelihood). I was wondering if someone could provide me with some guidance on how to get a fixed noise multi-task Gaussian likelihood?
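
For reference, here is how I construct each of those two likelihoods individually (a minimal sketch with illustrative shapes):

import torch

from gpytorch.likelihoods import FixedNoiseGaussianLikelihood, MultitaskGaussianLikelihood

# Fixed, known observation noise for a single-task model (one variance per point):
fixed = FixedNoiseGaussianLikelihood(noise=torch.full((100,), 0.05))

# Learned global and per-task noise for a multi-task model:
multitask = MultitaskGaussianLikelihood(num_tasks=2)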

Thanks in advance

Galto

jacobrgardner commented 4 years ago

@Galto2000 I think we'd just need to implement FixedNoiseMultitaskGaussianLikelihood. Basically, you'd specify an n x t matrix of noises rather than a length n vector of noises, and the interface would otherwise be the same.
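
To make the shapes concrete, here is a hypothetical sketch; FixedNoiseMultitaskGaussianLikelihood does not exist yet, so the name and constructor below are assumptions based on the description above:

import torch

n, t = 100, 2  # number of data points and number of tasks

# The single-task likelihood takes a length n noise vector; the proposed
# multi-task analogue would take an n x t matrix of noises instead:
multi_task_noise = torch.full((n, t), 0.05)

# Hypothetical, not yet implemented:
# likelihood = FixedNoiseMultitaskGaussianLikelihood(noise=multi_task_noise, num_tasks=t)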

Galto2000 commented 4 years ago

@jacobrgardner, would you please give a little more guidance, perhaps something like a high-level recipe, on how I would go about implementing something like a FixedNoiseMultitaskGaussianLikelihood? :)

Also, if you have some time, would you please provide me with some clarification regarding my other issue (https://github.com/cornellius-gp/gpytorch/issues/890)? It's kind of related to this one.

I feel that the chips are starting to fall into place, but I just need an extra nudge from you, and I think this would be a great challenge for me to wrap my head around some of the implementation details in GPyTorch.

Thanks in advance

Galto

Balandat commented 4 years ago

This is something that I'd like as well; let me see if I can find some time to work on it this week.

Galto2000 commented 4 years ago

@Balandat, that would be great, thank you.

Balandat commented 4 years ago

@Galto2000 I'm assuming what you'd like to do here is provide the noise for the different tasks, but not the cross-task covariance - this should still be inferred. Is this correct?

Galto2000 commented 4 years ago

@Balandat , yes, I think that is correct.

For instance, I am interested in doing multi-task, multi-sensor fusion; i.e. conditioning a model posterior on observations from different types of sensors (each of which has different noise), where the sensors output vector quantities, which makes it multi-task.

Thank you

Galto

Balandat commented 4 years ago

FWIW, I put up an early draft for this in 49e810b1a4f29fe1e0a102ad6f5963e90ae0dbdd - will have to do some cleaning up and testing before I make this a PR.

Balandat commented 4 years ago

Hmm, I'm realizing that the fact that MultitaskMultivariateNormal can use either an interleaved or a non-interleaved representation significantly complicates things here. It'll take a little bit of work to iron this out.

Balandat commented 4 years ago

It seems that we should address #539 first in order to make this less of a pain to implement.

Galto2000 commented 4 years ago

I'm trying to wrap my head around the interleaved concept, as well as how multi-tasking is achieved in GPyTorch. Is there any relevant literature that I can reference?

Balandat commented 4 years ago

Yeah, basically if you have n points and t tasks, gpytorch represents the joint covariance as an nt x nt matrix. You can represent that in different ways: either K_{data} \kron K_{task}, in which case you have n t x t matrices on the diagonal (i.e. "interleaved" w.r.t. the data points), or K_{task} \kron K_{data}, in which case you have t n x n matrices on the diagonal. Depending on the use case, one or the other representation may make more sense, hence the suggestion in #539.

Here K_{data} is the data covariance that depends on the hyperparameters, and K_{task} is a learned (often low-rank) correlation matrix. See #912 for some changes to the parameterization of that.
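
For illustration, here is a minimal sketch of the two orderings using plain torch.kron with toy matrices (not GPyTorch internals):

import torch

n, t = 3, 2
K_data = torch.eye(n) + 0.5 * torch.ones(n, n)   # n x n data covariance
K_task = torch.tensor([[1.0, 0.3], [0.3, 1.0]])  # t x t task covariance

# Interleaved: entries ordered (x1, task1), (x1, task2), (x2, task1), ...
# giving n blocks of size t x t on the diagonal.
K_interleaved = torch.kron(K_data, K_task)

# Non-interleaved: entries ordered (x1, task1), (x2, task1), ..., (x1, task2), ...
# giving t blocks of size n x n on the diagonal.
K_non_interleaved = torch.kron(K_task, K_data)

assert K_interleaved.shape == K_non_interleaved.shape == (n * t, n * t)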

Galto2000 commented 4 years ago

Hi @Balandat ,

One of my goals is to do "sensor fusion" (or data fusion) using GPs. In my case I have two different sensors measuring the same vector entity (a velocity in 2D). The sensors have different noise characteristics: sigma1 and sigma2.

I read a paper (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.1154&rep=rep1&type=pdf) that solves this from a heteroskedastic point of view.

I found your notebook with a heteroskedastic example: test_HadamardMultitaskMultiOutputHeteroskedasticLikelihood_universal.ipynb.txt

Firstly, I am trying to wrap my head around the example, mostly the last one, where a GP is passed as the noise_covar of a likelihood - if you have some time, could you perhaps explain this to me?

Secondly: if you have some time, would you perhaps have some suggestions on how I would be able to achieve "sensor fusion" by treating my two different data sets as a single heteroskedastic data set, where the noise levels are quantized: data with noise corresponding to the first sensor, and data with noise corresponding to the second sensor?

Thanks in advance

Galto

Galto2000 commented 4 years ago

Actually, I meant this one: test_MultitaskHeteroskedasticLikelihood.ipynb.txt - the very last example in the series.

Galto2000 commented 4 years ago

Hi @Balandat ,

In regards to the fixed noise Gaussian likelihood in a multi-task setting, I saw your draft - how do I get these changes? I pip-installed GPyTorch - do I need to clone from GitHub now?

Cheers

Galto

Balandat commented 4 years ago

@Galto2000 sorry, I haven't gotten to work much on this - the draft isn't really in a usable state at this point, so unless you plan on actively developing it, it's probably not worth checking out (which you would do by cloning the repo and checking out that branch). I'll try to get back to this soon-ish.

> Actually, I meant this one: test_MultitaskHeteroskedasticLikelihood.ipynb.txt - the very last example in the series.

Sorry what series do you mean exactly? Can you link to this?

> Firstly, I am trying to wrap my head around the example, mostly the last one, where a GP is passed as the noise_covar of a likelihood - if you have some time, could you perhaps explain this to me?

It's pretty straightforward: if you have noise observations, you can build a separate noise model. Typically this is fit on log-transformed data to ensure positivity and to model multiplicative uncertainty. The prediction of that model at the input X is then used as the noise level (rather than using fixed noises or a constant one). This has two benefits: (i) it regularizes the noise levels, in case these are themselves subject to observation noise [which they typically will be], and (ii) it allows out-of-sample noise predictions, which is important for some more advanced acquisition functions in Bayesian Optimization. We have such a model checked into BoTorch: https://github.com/pytorch/botorch/blob/master/botorch/models/gp_regression.py#L224

Galto2000 commented 4 years ago

Thanks @Balandat for your reply.

I was referring to this little bit of code that you posted some time ago:

import math
import torch

from gpytorch.likelihoods import MultitaskGaussianLikelihood
from gpytorch.likelihoods.multitask_gaussian_likelihood import _MultitaskGaussianLikelihoodBase
from gpytorch.likelihoods.noise_models import HeteroskedasticNoise

# MultitaskGPModel is the exact multitask GP model class from the GPyTorch
# multitask regression tutorial.

train_x = torch.linspace(0, 1, 75)

# Known noise standard errors for each task, varying over the input.
sem_y1 = 0.05 + (0.75 - 0.05) * torch.linspace(0, 1, 75)
sem_y2 = 0.75 - (0.75 - 0.05) * torch.linspace(0, 1, 75)

train_y = torch.stack([
    torch.sin(train_x * (2 * math.pi)) + sem_y1 * torch.randn(train_x.size()),
    torch.cos(train_x * (2 * math.pi)) + sem_y2 * torch.randn(train_x.size()),
], -1)

# Log-variances of the observation noise, used as targets for the noise model.
train_y_log_var = torch.stack([(s ** 2).log() for s in (sem_y1, sem_y2)], -1)

# A multitask GP fit on the log-variances serves as the heteroskedastic noise model.
log_noise_model = MultitaskGPModel(
    train_x,
    train_y_log_var,
    MultitaskGaussianLikelihood(num_tasks=2),
    num_tasks=2,
)

# The noise model's predictions supply the observation noise of the main model.
likelihood = _MultitaskGaussianLikelihoodBase(
    num_tasks=2,
    noise_covar=HeteroskedasticNoise(log_noise_model),
)
model = MultitaskGPModel(train_x, train_y, likelihood, num_tasks=2, rank=2)

I was wondering whether I could do something analogous to get around the issue of not yet having MultitaskFixedGaussianNoise available.

In my case I have two sets of observations of Y over X, but at two different noise levels.
So I have observations y1 over x1 with known noise n1, and observations y2 over x2 with known noise n2, and as such I concatenate or stack the tensors as follows: Y = [y1, y2], X = [x1, x2] and N = [n1, n2].

Now, pass N and X to a GP model with a linear kernel, and pass that as the noise_covar in a _MultitaskGaussianLikelihoodBase that will serve as the likelihood for a GP model that takes Y and X as its inputs. Multi-sensor fusion using GPs is my goal here.
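
For concreteness, here is a rough sketch of that stacking, with hypothetical shapes for two sensors observing the same 2D quantity:

import torch

# Hypothetical data: two sensors measuring the same 2D velocity over 1D inputs.
x1, x2 = torch.rand(50, 1), torch.rand(40, 1)
y1, y2 = torch.randn(50, 2), torch.randn(40, 2)
n1 = torch.full((50, 2), 0.05 ** 2)  # known noise variance of sensor 1
n2 = torch.full((40, 2), 0.20 ** 2)  # known noise variance of sensor 2

# Concatenate along the data dimension, as described above.
X = torch.cat([x1, x2])  # shape (90, 1)
Y = torch.cat([y1, y2])  # shape (90, 2)
N = torch.cat([n1, n2])  # shape (90, 2)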

It's going to be less computationally efficient than a MultitaskFixedGaussianNoise, but at this time that doesn't bother me, since the data is relatively small, and it would be temporary until MultitaskFixedGaussianNoise comes online.

Do you see any issues with this approach?

Cheers

Galto

Galto2000 commented 4 years ago

Hello, happy new year!

I was wondering if there is an ETA for MultitaskFixedGaussianNoise?

I tried the "heteroskedastic approach" in order to instill some fixed noise behavior in a multi-task setting, but there are many issues with doing it that way.

I am currently circumventing the lack of a MultitaskFixedGaussianNoise by using model lists and (single-task) FixedGaussianNoise, assuming the outcomes are independent, in order to make progress, and in the hope that when MultitaskFixedGaussianNoise becomes available it will be a relatively simple change at the end.

Cheers

Galto

eytan commented 4 years ago

I know that Max is out for the next week. BoTorch has support for MTGPs with fixed noise... would something like https://botorch.org/v/0.1.0/api/models.html#fixednoisemultitaskgp help?
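
For reference, a minimal usage sketch of that model with hypothetical data; the Hadamard-style API takes the task index as a column of train_X, but check the signature of your BoTorch version:

import torch

from botorch.models import FixedNoiseMultiTaskGP

# 10 observations with 2 features, plus a task-index column as the last input dimension.
task_idx = torch.randint(2, (10, 1)).to(torch.double)
train_X = torch.cat([torch.rand(10, 2, dtype=torch.double), task_idx], dim=-1)
train_Y = torch.randn(10, 1, dtype=torch.double)
train_Yvar = torch.full_like(train_Y, 0.01)  # known observation noise variances

model = FixedNoiseMultiTaskGP(train_X, train_Y, train_Yvar, task_feature=-1)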

Galto2000 commented 4 years ago

@eytan

Thanks for pointing out the BoTorch fixed noise multi-task GP - I'll check it out.

florpi commented 3 years ago

Is there any news on this thread? It'd be a useful thing to have!

Balandat commented 3 years ago

@wjmaddox, @qingfeng10 I know you are/were thinking about this as well. Are you working on this / planning to work on this in the near future?

wjmaddox commented 3 years ago

It depends on the type of noise that's desired here. For a model that will have missing observations for some tasks, it's probably preferable to just use a BoTorch FixedNoiseMTGP.

It is possible (although it will be slow for large n) to just drop a fixed noise Gaussian likelihood directly into a MTGP, as in this code snippet:

import torch
import math

from botorch.models import KroneckerMultiTaskGP
from gpytorch.likelihoods import FixedNoiseGaussianLikelihood

train_x = torch.linspace(0, 1, 75)

sem_y1 = 0.05 + (0.75 - 0.05) * torch.linspace(0, 1, 75)
sem_y2 = 0.75 - (0.75 - 0.05) * torch.linspace(0, 1, 75)

train_y = torch.stack([
    torch.sin(train_x * (2 * math.pi)) + sem_y1 * torch.randn(train_x.size()),
    torch.cos(train_x * (2 * math.pi)) + sem_y2 * torch.randn(train_x.size()),
], -1)

train_y_log_var = torch.stack([(s ** 2).log() for s in (sem_y1, sem_y2)], -1)

# Per-observation noise variances, flattened into a single length n*t vector.
likelihood = FixedNoiseGaussianLikelihood(noise=train_y_log_var.exp().view(-1))

# KroneckerMultiTaskGP is basically the same model class as the MultitaskGPModel
# in the GPyTorch example.
mtgp = KroneckerMultiTaskGP(
    train_x.unsqueeze(-1),
    train_y,
    num_tasks=2,
)
mtgp.likelihood = likelihood
# NOTE that I didn't check interleaving here to see if this is returning the
# correct noise, just verifying that this implementation works.
# If you need to interleave, you should be able to transpose and then squeeze.

mtgp.likelihood(mtgp(train_x))
# returns MultitaskMultivariateNormal(loc: torch.Size([150]))

# Can train with something like the below:
from botorch.optim.fit import fit_gpytorch_torch
from gpytorch.mlls import ExactMarginalLogLikelihood

mll = ExactMarginalLogLikelihood(mtgp.likelihood, mtgp)
fit_gpytorch_torch(mll)

arodland commented 2 years ago

I have a multitask problem where the data is transformed so that the noise is the same on all of the tasks and I can reasonably assume cross-task noise = 0. Is there a simple way that I can wrap FixedNoiseGaussianLikelihood to make it play nice with multitask in this context?

wjmaddox commented 2 years ago

Yes, that's basically what's done with BoTorch's FixedNoiseMTGP (https://botorch.org/v/0.1.0/api/models.html#fixednoisemultitaskgp).

Alternatively, if you really need the Kronecker structure, you could just set the likelihood noise to be what you want it to be and detach it:

import torch

from gpytorch.likelihoods import MultitaskGaussianLikelihood

# rank=0 gives independent (diagonal) task noises
likelihood = MultitaskGaussianLikelihood(num_tasks=4, rank=0)
likelihood.noise = 0.1  # for example
likelihood.task_noises = torch.tensor([1., 1., 1., 1.])
# for example, although both of these need to be non-zero

# detach the raw parameters so the noise levels are not updated during training
likelihood.raw_noise.detach_()
likelihood.raw_task_noises.detach_()

The example a couple of comments above is for noise that is potentially different across tasks and observations.

arodland commented 2 years ago

@wjmaddox Sorry, I meant that the noise is per-observation, but the same across tasks for a given observation. Appreciate the help though!

wjmaddox commented 2 years ago

Are you observing all tasks for each observation?

arodland commented 2 years ago

@wjmaddox yes.

wjmaddox commented 2 years ago

Something like this ought to work for you (and I probably ought to clean this up as a PR at some point):

# Using botorch.models.KroneckerMultiTaskGP here, but the API should be the same
# for a multitask GP like the one in
# https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html

import torch

from botorch.models import KroneckerMultiTaskGP
from gpytorch.lazy import ConstantDiagLazyTensor, KroneckerProductLazyTensor
from gpytorch.likelihoods.multitask_gaussian_likelihood import _MultitaskGaussianLikelihoodBase
from gpytorch.likelihoods.noise_models import FixedGaussianNoise

train_x = torch.randn(10, 2)
train_y = torch.randn(10, 4)
train_y_var = torch.rand(10).exp()  # known per-observation noise, shared across tasks

class FixedTaskNoiseMultitaskLikelihood(_MultitaskGaussianLikelihoodBase):
    def __init__(self, noise, *args, **kwargs):
        noise_covar = FixedGaussianNoise(noise=noise)
        super().__init__(noise_covar=noise_covar, *args, **kwargs)
        self.has_global_noise = False
        self.has_task_noise = False

    def _shaped_noise_covar(self, shape, add_noise=True, *params, **kwargs):
        if not self.has_task_noise:
            data_noise = self.noise_covar(*params, shape=torch.Size((shape[:-2],)), **kwargs)
            eye = torch.ones(1, device=data_noise.device, dtype=data_noise.dtype)
            # TODO: add in a shape for batched models
            task_noise = ConstantDiagLazyTensor(
                eye, diag_shape=torch.Size((self.num_tasks,))
            )
            # data noise Kronecker an identity over tasks: `D \kron I`
            return KroneckerProductLazyTensor(data_noise, task_noise)
        else:
            # TODO: copy over pieces from MultitaskGaussianLikelihood
            raise NotImplementedError("Task noises not supported yet.")

# The setup is that the noise covariance is `D \kron I`, where `D` is user supplied.
likelihood = FixedTaskNoiseMultitaskLikelihood(num_tasks=4, noise=train_y_var, rank=0)
model = KroneckerMultiTaskGP(
    train_x,
    train_y,
    likelihood=likelihood,
)

# Now test the posterior.
test_x = torch.randn(20, 2)

model.eval()
model(test_x).rsample(torch.Size([32])).shape

Let me know if that has issues or needs to be expanded somehow.

JFagin commented 1 year ago

Has anyone solved this problem?