LLNL / MuyGPyS

A fast, pure python implementation of the MuyGPs Gaussian process realization and training algorithm.
Other
25 stars 11 forks source link

Using weights for individual data points #245

Open esheldon opened 1 month ago

esheldon commented 1 month ago

I'm finding this code very useful, thank you for writing such a nice package.

I have some data I'd like to fit but each point can have quite variable uncertainty. I find that in this case of different errors the code performs sub optimally.

Is there a way to weight each point? I thought this might be related to the the HeteroscedasticNoise, but my attempts with that failed as it wants a particular shape that I could not figure out.

Thanks in advance.

esheldon commented 1 month ago

I think I figured it out.

import numpy as np
from MuyGPyS.gp import MuyGPS
from MuyGPyS.neighbors import NN_Wrapper
from MuyGPyS.gp.noise import HeteroscedasticNoise

from MuyGPyS.gp.kernels import Matern
from MuyGPyS.gp.hyperparameter import Parameter
from MuyGPyS.gp.deformation import Isotropy, l2

# features and responses for training set
train_features = "some array here"
train_variances = "some array here"
train_responses = "some array here"

# features to predict
features = "some array here"

length_scale = 2000.0
nn_count = 30

kernel = Matern(
    smoothness=Parameter(1.5),
    deformation=Isotropy(
        l2,
        length_scale=Parameter(length_scale),
    ),
)

nnwrap = NN_Wrapper(
    train_features,
    nn_count,
    nn_method="exact",
    algorithm="ball_tree"
)
nn_indices, _ = nnwrap.get_nns(features)

matchvars = train_variances[nn_indices]
noise_model = HeteroscedasticNoise(matchvars)

muygps = MuyGPS(kernel=kernel, noise=noise_model)

res = muygps.make_predict_tensors(
    batch_indices=np.arange(features.shape[0]),
    batch_nn_indices=nn_indices,
    test_features=features,
    train_features=train_features,
    train_targets=train_responses,
)
crosswise_dists, pairwise_dists, nn_responses = res

Kcross = muygps.kernel(crosswise_dists)
Kin = muygps.kernel(pairwise_dists)

predicted = muygps.posterior_mean(
    Kin, Kcross, nn_responses
)
esheldon commented 1 month ago

Maybe something like this could be added as an example discussed in #114

bwpriest commented 1 month ago

Thanks for the comments! The HeteroscedasticNoise class is the right tool for your use case. The current implementation does not play nicely with the MPI backend, so it is still considered an experimental feature. You are right it would be good to have an example for users. We will try to add one when there is time.