Closed bwpriest closed 1 year ago
Incorporating heteroscedastic noise into the MuyGPyS.gp.muygps.MuyGPS
api will be fairly straightforward. For example, we can remove the eps
parameter from
Instead, we will invoke it from MuyGPyS.gp.muygps.MuyGPS._compute_solve
with a new _src function _homoscedastic_perturb()
return _muygps_compute_solve(
_homoscedastic_perturb(K, eps),
Kcross,
batch_nn_targets
)
This new function will have the (numpy) form
def _homoscedastic_perturb(K, eps):
_, nn_count, _ = K.shape
return K + eps * np.eye(nn_count)
We can formulate a similar heteroscadistic perturbation that instead of a scalar eps
, takes an eps
of shape (batch_count, nn_count, nn_count)
where the last two dimensions form diagonal matrices via
def _heteroscedastic_perturb(K, eps):
return K + eps
This could instead use a dense (batch_count, nn_count)
matrix eps
and define:
def _heteroscedastic_perturb(K, eps):
ret = K.copy()
batch_count, nn_count, _ = K.shape
indices = (
np.repeat(range(batch_count), nn_count),
np.tile(np.arange(nn_count), batch_count),
np.tile(np.arange(nn_count), batch_count),
)
ret[indices] += eps.flatten()
return ret
This last method uses less memory, but is also cumbersome. It would be nice for np.diagonal
to return a writable view, but that promise does not seem to be a priority at the moment.
We will also need to modify the MuyGPyS.gp.muygps.MuyGPS
functions that depend upon perturbed kernels so that they can distinguish between homo- and heteroscedastic perturbation. Adding a kwarg (nugget=None
or similar) is probably the simplest solution. If nugget=None
, do the homoscedastic workflow. If nugget
is a (batch_count, nn_count)
array, instead do the heteroscedastic workflow, passing nugget
to the _heteroscedastic_perturb
function described above.
We will need to add infrastructure to create this nugget
matrix from data. It will be very similar in form to the way that we create the nn_targets
tensor in make_*_tensors
. It will take a (train_count,)
vector of noise measurements and produce a (batch_count, nn_count)
matrix of the noise measurements associated with the nearest neighbors of each of the batch elements.
It would be nice to avoid the conditionals that will arise in the workflow that we've described above. I believe that we can do so in training via modifying the make_opt_func()
methods scattered throughout the code. However, it is less clear how to do so during prediction. This should not matter much except in the case where we need to rapidly and sequentially predict, like in a reinforcement learning setting. I'm not sure how to best handle this without blowing up the API. Maybe we add a make_predict_func()
method that one can optionally use to formulate such a function so that we can avoid the repeated conditional evaluations?
@alecmdunton I tried summarizing our discussion from this afternoon.
PR #80 takes the first step here by explicitly separating out the nugget computation in the math functions. It will be a little easier now (read: possible) to add heteroscedastic noise behavior. We still need to refactor model optimization so that we don't iteratively copy large heteroscedastic noise vectors when we optimize.
This last method uses less memory, but is also cumbersome. It would be nice for
np.diagonal
to return a writable view, but that promise does not seem to be a priority at the moment.
I think that it will be best to use (batch_count, nn_count, nn_count)
heteroscedastic noise parameters for now. The (batch_count, nn_count)
solution requires either that np.diagonal
returns a writable view (not happening in the near future), or that we copy K
, which will cost more time. The simpler method will also be easier to read. We can update this solution if we find a better way to implement it.
In order to make this work I think I am going to have to feed nn_indices
information into the muygps
class as well as the posterior_mean
and posterior_variance
classes. I have gotten the code to work during training so far, but there is an issue during prediction. It seems that we need to be able to feed the muygps
object an array containing the measurement noise for every training point, and then have access to nn_indices
within the mean and variance functions in order to build the correct kernel matrix depending on whether we are training or predicting.
I am going to have to feed
nn_indices
information into themuygps
class as well as theposterior_mean
andposterior_variance
classes.
That will not work, because all of the index stuff has to happen prior to tensor creation, which happens prior to the creation and use of the MuyGPS
object. This is because, as you may recall, the indices are meaningless in distributed memory. The MuyGPS
class doesn't, and can't, reason about global indices. Instead, we have to create a heteroscedastic noise tensor using index information and feed that to the MuyGPS
functions. This tensor eps
will have shape (batch_count, nn_count)
, such that eps[i, j]
holds the noise prior for batch element i
's j
th nearest neighbor.
We will need a MuyGPyS.gp.tensors
function that creates eps
from the (train_count,)
noise vector and batch_nn_indices
. We will also need a _heteroscedastic_perturb
math function that adds the heteroscedastic noise to the kernel tensor appropriately. We'll also need logic in PosteriorMean
and PosteriorVariance
to apply eps
to their functions, similar to how we're applying homoscedastic noise now. The heteroscedastic noise should be wrapped up in a HeteroscedasticNoise
class with a similar interface to HomoscedasticNoise
, except that it wraps a matrix rather than a scalar.
That makes so much more sense. I have most of the other pieces you mentioned implemented.
We will also of course need test cases to verify that everything works, and that the different implementations agree
Thanks for pointing me in that direction - I was banging my head against a wall for a while.
This is related to issue #105. We currently deepcopy
the MuyGPS
object during optimization functions to avoid side effects. That's fine right now, but once we have a heteroscedastic noise model that becomes expensive. We're instead need a method that shallowly copies references to fixed parameters, while deepcopying free parameters. That should be straightforward but let me know if you have any trouble.
I am currently able to train a heteroscedastic MuyGPS
model on my feature branch. The issue I am dealing with now is prediction. In order to provide the correct nugget tensor for both training and prediction, we need to have the measurement noise for both batch_nn_indices
and test_nn_indices
in the workflow. I am currently providing a (batch_count, nn_count)
shaped tensor as a hyperparameter in the MuyGPs model. Should I edit the signatures of function like regress_any
and regress_from_indices
so that they require measurement noise for each point in test_nn_indices
and build the necessary heteroscedastic tensor within those functions?
You should not need measurement noise for anything in test
. There is a measurement noise prior associated with each train
element, which is used to perturb the train covariance. For a conventional GP, this looks like
$$\mu(Z \mid X) = K(Z, X) (K(X, X) + \epsilon(X) I)^{-1} * Y,$$
where $\epsilon(X)$ is the vector of noise measurements associated with training data $X$. There is no $\epsilon(Z)$.
You will need separate eps
tensors for training versus prediction. For training, you need a (batch_count, nn_count)
tensor, whereas for testing you need a (test_count, nn_count)
tensor. Whatever function you use to generate the first you should also be able to generate the second. See, e.g., how _pairwise_tensor
is used both for building train and predict tensors.
Do we want to feed a (train_count)
length array containing measurement noise for every training point as a hyperparameter to the MuyGPs
class so that both of those tensors can be constructed using a MuyGPs object?
Do we want to feed a
(train_count)
length array as a hyperparameter to theMuyGPs
class so that both of those tensors can be constructed using a MuyGPs object?
No. I believe that it makes a lot more sense to move anything involving the application of indices outside of the MuyGPS
class so that all of its functions have tensors -> tensor signatures. This is related to my instructions to remove references to indices from MuyGPS.build_fast_posterior_mean_coeffs
in iss #106.
Should I edit the signatures of function like
regress_any
andregress_from_indices
so that they require measurement noise for each point intest_nn_indices
...
I am opposed to continuing to maintain these examples due to the overhead involved. The more complexity we add, the more unreadable they become (and the harder the example functions become to maintain!). It would be better to create an example notebook that walks through the workflow.
We would still want functions e.g., optimize_from_tensors
to have a measurement noise tensor eps_nn_targets
corresponding to batch_nn_targets
as an argument, correct?
It would be better to create an example notebook that walks through the workflow.
Also tests that cover all of the relevant behavior!
We would still want functions e.g.,
optimize_from_tensors
to have a measurement noise tensoreps_nn_targets
corresponding tobatch_nn_targets
as an argument, correct?
No. I do not consider anything in MuyGPyS.examples
to be a "part of the library", so we should not keep updating the existing functions. In fact, at some point we need to go through and massively simplify what is in there. It is my stance that going forward that everything in MuyGPyS.examples
is meant to be read as tutorials, but never run in anger.
Providing one-liner interfaces was a trap that we've wasted way too much effort trying to maintain. There is a reason that sklearn, pytorch, tensorflow, etc don't do that.
everything in
MuyGPyS.examples
is meant to be read as tutorials, but never run in anger.
We should also remove most of the references to MuyGPyS.examples
from the test chassis at some future point.
I've condensed some of my thoughts and plans relating to the future of MuyGPyS.examples
in issue #108.
So it's pretty clear to me how to construct a heteroscedastic muygps
object and train it: we provide the object with a (batch_count, nn_count)
tensor of measurement noises for the nearest neighbors of each batch point. I have this up and running. When we have trained the model and need to predict, we need some sort of function that will take as input the (test_count, nn_count)
tensor of measurement noises for the nearest neighbors of each test point.
A lot of my confusion was coming from the fact that I am debugging my code by using MuyGPyS.examples
, so I can stop doing that right now.
My earlier question about optimize_from_tensors
refers to the MuyGPyS.optimize.chassis
module. I do think we need to (1) edit the signature of this function or (2) when we build a muygps
object we set muygps.eps
to a (batch_count, nn_count)
tensor prior to training and then set muygps.eps
to a (test_count, nn_count)
tensor prior to prediction.
The (batch_count, nn_count)
noise tensor should be wrapped in a HeteroscedasticNoise
object that is a member of the muygps
object (muygps.eps
), so we needn't modify the signature of optimize_from_tensors
. Furthermore, we might not even know the test data at the time of optimization, so we cannot include anything related to a (test_count, nn_count)
noise tensor.
You do raise a good point. When using HomoscedasticNoise
, the MuyGPS
object returned from optimize_from_tensors
is the same object that you use to predict. Around the time that we create the test_crosswise_diffs
and test_pairwise_diffs
tensors, we also need to create the new noise tensor and use it to modify the optimized MuyGPS
object.
I'm torn as to how we do this. Currently, all functions are pure functions with no side effects, so when we optimize a MuyGPS
we actually make a new object and return that, leaving the old one unmodified. However, this is a slightly different use case where we want to swap the noise tensor used for training for the one to be used with prediction. We could write a member function that does this, so it would be as simple as
muygps = optimize_from_tensors(...)
new_noise = make_noise_tensor(test, train, nbrs_lookup) # notional names
muygps.eps.set(new_noise)
However, we don't want to user to actually do this anywhere else, and it could cause unforseen issues. Ergo, I think it would be better to write a function change_noise
or similar that is used like
muygps = optimize_from_tensors(...)
new_noise = make_noise_tensor(test, train, nbrs_lookup) # notional names
muygps = apply_new_noise(muygps, new_noise)
Internally, apply_new_noise
makes a new MuyGPS
object, applies new_noise
, and returns that. This has the same problem as optimize_from_tensors
and muygpys_analytic_sigma_sq_optim
, where the only way to do this at the moment is to deepcopy
the passed MuyGPS
object at what is (now, with heteroscedastic noise) large cost. So we can do this for now, but we will really want a solution to iss #105 to avoid all of these unnecessary tensor copies.
We could also write something like:
def modify_optim(optim_fn, new_noise):
def new_optim_fn(*args, **kwargs):
muygps = optim_fn(*args, **kwargs)
muygps.eps.set(new_noise)
return muygps
return new_optim_fn
This avoids a copy and makes things easier on the user, but requires that we know the test
information at optimization time, which we cannot assume. So I think that we have to do something along the lines of the above.
Prototype is working. Once I place the make_noise_tensor
and apply_new_noise
functions appropriately within the library we will have an untested heteroscedasticity feature.
Heteroscedasticity working. Going to clean up code and write tests.
In terms of test coverage, do you think a HeteroscedasticNoise(GPTestCase)
class in tests/gp.py
and backend tests intorch_correctness.py
, jax_correctness.py
, and mpi_correctness.py
is sufficient? These are the files where I found references to the HomoscedasticNoise
class.
that is probably fine. We want something that uses a heteroscedastic noise model during optimization, but tests/optimize.py
is a little busted right now.
We need to support vector-valued
eps
parameters. Similar toMuyGPyS.gp.kernel.SigmaSq
, we should breakeps
out ofMuyGPyS.gp.kernel.Hyperparameter
and make it its own class. We probably only want to support learning the epsilon parameter in the homoscedastic case, though, so we will need to maintain some guardrails to make sure that it is treated as a scalar where appropriate and as a vector where appropriate.