Need heteroscedastic noise (`eps`) parameter

bwpriest commented 1 year ago

We need to support vector-valued eps parameters. Similar to MuyGPyS.gp.kernel.SigmaSq, we should break eps out of MuyGPyS.gp.kernel.Hyperparameter and make it its own class. We probably only want to support learning the epsilon parameter in the homoscedastic case, though, so we will need to maintain some guardrails to make sure that it is treated as a scalar where appropriate and as a vector where appropriate.

bwpriest commented 1 year ago

Incorporating heteroscedastic noise into the MuyGPyS.gp.muygps.MuyGPS api will be fairly straightforward. For example, we can remove the eps parameter from

https://github.com/LLNL/MuyGPyS/blob/d10849d0de0c8bf9a59a1193242f32cd036a9602/MuyGPyS/_src/gp/muygps/numpy.py#L9-L19

Instead, we will invoke it from MuyGPyS.gp.muygps.MuyGPS._compute_solve with a new _src function _homoscedastic_perturb()

return _muygps_compute_solve(
    _homoscedastic_perturb(K, eps),
    Kcross,
    batch_nn_targets
)

This new function will have the (numpy) form

def _homoscedastic_perturb(K, eps):
    _, nn_count, _ = K.shape
    return K + eps * np.eye(nn_count)

We can formulate a similar heteroscadistic perturbation that instead of a scalar eps, takes an eps of shape (batch_count, nn_count, nn_count) where the last two dimensions form diagonal matrices via

def _heteroscedastic_perturb(K, eps):
    return K + eps

This could instead use a dense (batch_count, nn_count) matrix eps and define:

def _heteroscedastic_perturb(K, eps):
    ret = K.copy()
    batch_count, nn_count, _ = K.shape
    indices = (
        np.repeat(range(batch_count), nn_count),
        np.tile(np.arange(nn_count), batch_count),
        np.tile(np.arange(nn_count), batch_count),
    )
    ret[indices] += eps.flatten()
    return ret

This last method uses less memory, but is also cumbersome. It would be nice for np.diagonal to return a writable view, but that promise does not seem to be a priority at the moment.

bwpriest commented 1 year ago

We will also need to modify the MuyGPyS.gp.muygps.MuyGPS functions that depend upon perturbed kernels so that they can distinguish between homo- and heteroscedastic perturbation. Adding a kwarg (nugget=None or similar) is probably the simplest solution. If nugget=None, do the homoscedastic workflow. If nugget is a (batch_count, nn_count) array, instead do the heteroscedastic workflow, passing nugget to the _heteroscedastic_perturb function described above.

bwpriest commented 1 year ago

We will need to add infrastructure to create this nugget matrix from data. It will be very similar in form to the way that we create the nn_targets tensor in make_*_tensors. It will take a (train_count,) vector of noise measurements and produce a (batch_count, nn_count) matrix of the noise measurements associated with the nearest neighbors of each of the batch elements.

bwpriest commented 1 year ago

It would be nice to avoid the conditionals that will arise in the workflow that we've described above. I believe that we can do so in training via modifying the make_opt_func() methods scattered throughout the code. However, it is less clear how to do so during prediction. This should not matter much except in the case where we need to rapidly and sequentially predict, like in a reinforcement learning setting. I'm not sure how to best handle this without blowing up the API. Maybe we add a make_predict_func() method that one can optionally use to formulate such a function so that we can avoid the repeated conditional evaluations?

bwpriest commented 1 year ago

@alecmdunton I tried summarizing our discussion from this afternoon.

bwpriest commented 1 year ago

PR #80 takes the first step here by explicitly separating out the nugget computation in the math functions. It will be a little easier now (read: possible) to add heteroscedastic noise behavior. We still need to refactor model optimization so that we don't iteratively copy large heteroscedastic noise vectors when we optimize.

bwpriest commented 1 year ago

This last method uses less memory, but is also cumbersome. It would be nice for np.diagonal to return a writable view, but that promise does not seem to be a priority at the moment.

I think that it will be best to use (batch_count, nn_count, nn_count) heteroscedastic noise parameters for now. The (batch_count, nn_count) solution requires either that np.diagonal returns a writable view (not happening in the near future), or that we copy K, which will cost more time. The simpler method will also be easier to read. We can update this solution if we find a better way to implement it.

alecmdunton commented 1 year ago

In order to make this work I think I am going to have to feed nn_indices information into the muygps class as well as the posterior_mean and posterior_variance classes. I have gotten the code to work during training so far, but there is an issue during prediction. It seems that we need to be able to feed the muygps object an array containing the measurement noise for every training point, and then have access to nn_indices within the mean and variance functions in order to build the correct kernel matrix depending on whether we are training or predicting.

bwpriest commented 1 year ago

I am going to have to feed nn_indices information into the muygps class as well as the posterior_mean and posterior_variance classes.

That will not work, because all of the index stuff has to happen prior to tensor creation, which happens prior to the creation and use of the MuyGPS object. This is because, as you may recall, the indices are meaningless in distributed memory. The MuyGPS class doesn't, and can't, reason about global indices. Instead, we have to create a heteroscedastic noise tensor using index information and feed that to the MuyGPS functions. This tensor eps will have shape (batch_count, nn_count), such that eps[i, j] holds the noise prior for batch element i's jth nearest neighbor.

We will need a MuyGPyS.gp.tensors function that creates eps from the (train_count,) noise vector and batch_nn_indices. We will also need a _heteroscedastic_perturb math function that adds the heteroscedastic noise to the kernel tensor appropriately. We'll also need logic in PosteriorMean and PosteriorVariance to apply eps to their functions, similar to how we're applying homoscedastic noise now. The heteroscedastic noise should be wrapped up in a HeteroscedasticNoise class with a similar interface to HomoscedasticNoise, except that it wraps a matrix rather than a scalar.

alecmdunton commented 1 year ago

That makes so much more sense. I have most of the other pieces you mentioned implemented.

bwpriest commented 1 year ago

We will also of course need test cases to verify that everything works, and that the different implementations agree

alecmdunton commented 1 year ago

Thanks for pointing me in that direction - I was banging my head against a wall for a while.

bwpriest commented 1 year ago

This is related to issue #105. We currently deepcopy the MuyGPS object during optimization functions to avoid side effects. That's fine right now, but once we have a heteroscedastic noise model that becomes expensive. We're instead need a method that shallowly copies references to fixed parameters, while deepcopying free parameters. That should be straightforward but let me know if you have any trouble.

alecmdunton commented 1 year ago

I am currently able to train a heteroscedastic MuyGPS model on my feature branch. The issue I am dealing with now is prediction. In order to provide the correct nugget tensor for both training and prediction, we need to have the measurement noise for both batch_nn_indices and test_nn_indices in the workflow. I am currently providing a (batch_count, nn_count) shaped tensor as a hyperparameter in the MuyGPs model. Should I edit the signatures of function like regress_any and regress_from_indices so that they require measurement noise for each point in test_nn_indices and build the necessary heteroscedastic tensor within those functions?

bwpriest commented 1 year ago

You should not need measurement noise for anything in test. There is a measurement noise prior associated with each train element, which is used to perturb the train covariance. For a conventional GP, this looks like $$\mu(Z \mid X) = K(Z, X) (K(X, X) + \epsilon(X) I)^{-1} * Y,$$ where $\epsilon(X)$ is the vector of noise measurements associated with training data $X$. There is no $\epsilon(Z)$.

bwpriest commented 1 year ago

You will need separate eps tensors for training versus prediction. For training, you need a (batch_count, nn_count) tensor, whereas for testing you need a (test_count, nn_count) tensor. Whatever function you use to generate the first you should also be able to generate the second. See, e.g., how _pairwise_tensor is used both for building train and predict tensors.

alecmdunton commented 1 year ago

Do we want to feed a (train_count) length array containing measurement noise for every training point as a hyperparameter to the MuyGPs class so that both of those tensors can be constructed using a MuyGPs object?

bwpriest commented 1 year ago

Do we want to feed a (train_count) length array as a hyperparameter to the MuyGPs class so that both of those tensors can be constructed using a MuyGPs object?

No. I believe that it makes a lot more sense to move anything involving the application of indices outside of the MuyGPS class so that all of its functions have tensors -> tensor signatures. This is related to my instructions to remove references to indices from MuyGPS.build_fast_posterior_mean_coeffs in iss #106.

bwpriest commented 1 year ago

Should I edit the signatures of function like regress_any and regress_from_indices so that they require measurement noise for each point in test_nn_indices...

I am opposed to continuing to maintain these examples due to the overhead involved. The more complexity we add, the more unreadable they become (and the harder the example functions become to maintain!). It would be better to create an example notebook that walks through the workflow.

alecmdunton commented 1 year ago

We would still want functions e.g., optimize_from_tensors to have a measurement noise tensor eps_nn_targets corresponding to batch_nn_targets as an argument, correct?

bwpriest commented 1 year ago

It would be better to create an example notebook that walks through the workflow.

Also tests that cover all of the relevant behavior!

bwpriest commented 1 year ago

We would still want functions e.g., optimize_from_tensors to have a measurement noise tensor eps_nn_targets corresponding to batch_nn_targets as an argument, correct?

No. I do not consider anything in MuyGPyS.examples to be a "part of the library", so we should not keep updating the existing functions. In fact, at some point we need to go through and massively simplify what is in there. It is my stance that going forward that everything in MuyGPyS.examples is meant to be read as tutorials, but never run in anger.

bwpriest commented 1 year ago

Providing one-liner interfaces was a trap that we've wasted way too much effort trying to maintain. There is a reason that sklearn, pytorch, tensorflow, etc don't do that.

bwpriest commented 1 year ago

everything in MuyGPyS.examples is meant to be read as tutorials, but never run in anger.

We should also remove most of the references to MuyGPyS.examples from the test chassis at some future point.

bwpriest commented 1 year ago

I've condensed some of my thoughts and plans relating to the future of MuyGPyS.examples in issue #108.

alecmdunton commented 1 year ago

So it's pretty clear to me how to construct a heteroscedastic muygps object and train it: we provide the object with a (batch_count, nn_count) tensor of measurement noises for the nearest neighbors of each batch point. I have this up and running. When we have trained the model and need to predict, we need some sort of function that will take as input the (test_count, nn_count) tensor of measurement noises for the nearest neighbors of each test point.

A lot of my confusion was coming from the fact that I am debugging my code by using MuyGPyS.examples, so I can stop doing that right now.

My earlier question about optimize_from_tensors refers to the MuyGPyS.optimize.chassis module. I do think we need to (1) edit the signature of this function or (2) when we build a muygps object we set muygps.eps to a (batch_count, nn_count) tensor prior to training and then set muygps.eps to a (test_count, nn_count) tensor prior to prediction.

bwpriest commented 1 year ago

The (batch_count, nn_count) noise tensor should be wrapped in a HeteroscedasticNoise object that is a member of the muygps object (muygps.eps), so we needn't modify the signature of optimize_from_tensors. Furthermore, we might not even know the test data at the time of optimization, so we cannot include anything related to a (test_count, nn_count) noise tensor.

You do raise a good point. When using HomoscedasticNoise, the MuyGPS object returned from optimize_from_tensors is the same object that you use to predict. Around the time that we create the test_crosswise_diffs and test_pairwise_diffs tensors, we also need to create the new noise tensor and use it to modify the optimized MuyGPS object.

I'm torn as to how we do this. Currently, all functions are pure functions with no side effects, so when we optimize a MuyGPS we actually make a new object and return that, leaving the old one unmodified. However, this is a slightly different use case where we want to swap the noise tensor used for training for the one to be used with prediction. We could write a member function that does this, so it would be as simple as

muygps = optimize_from_tensors(...)
new_noise = make_noise_tensor(test, train, nbrs_lookup)  # notional names
muygps.eps.set(new_noise)

However, we don't want to user to actually do this anywhere else, and it could cause unforseen issues. Ergo, I think it would be better to write a function change_noise or similar that is used like

muygps = optimize_from_tensors(...)
new_noise = make_noise_tensor(test, train, nbrs_lookup)  # notional names
muygps = apply_new_noise(muygps, new_noise)

Internally, apply_new_noise makes a new MuyGPS object, applies new_noise, and returns that. This has the same problem as optimize_from_tensors and muygpys_analytic_sigma_sq_optim, where the only way to do this at the moment is to deepcopy the passed MuyGPS object at what is (now, with heteroscedastic noise) large cost. So we can do this for now, but we will really want a solution to iss #105 to avoid all of these unnecessary tensor copies.

bwpriest commented 1 year ago

We could also write something like:

def modify_optim(optim_fn, new_noise):
    def new_optim_fn(*args, **kwargs):
        muygps = optim_fn(*args, **kwargs)
        muygps.eps.set(new_noise)
        return muygps
    return new_optim_fn

This avoids a copy and makes things easier on the user, but requires that we know the test information at optimization time, which we cannot assume. So I think that we have to do something along the lines of the above.

alecmdunton commented 1 year ago

Prototype is working. Once I place the make_noise_tensor and apply_new_noise functions appropriately within the library we will have an untested heteroscedasticity feature.

alecmdunton commented 1 year ago

Heteroscedasticity working. Going to clean up code and write tests.

alecmdunton commented 1 year ago

In terms of test coverage, do you think a HeteroscedasticNoise(GPTestCase) class in tests/gp.py and backend tests intorch_correctness.py, jax_correctness.py, and mpi_correctness.py is sufficient? These are the files where I found references to the HomoscedasticNoise class.

bwpriest commented 1 year ago

that is probably fine. We want something that uses a heteroscedastic noise model during optimization, but tests/optimize.py is a little busted right now.

LLNL / MuyGPyS

Need heteroscedastic noise (`eps`) parameter #70