LLNL / MuyGPyS

A fast, pure python implementation of the MuyGPs Gaussian process realization and training algorithm.

Iterative analytic scale optimization #207

Closed igoumiri closed 7 months ago

igoumiri commented 10 months ago

I'm not sure this is the best way to do it, but I wanted to start the discussion. This is based on Amanda's R code demo, but simplified.

bwpriest commented 9 months ago

Are we sure that this behavior should go into AnalyticScale, rather than a new class like IterativeAnalyticScale? If I understand correctly, this iterative behavior is only needed when we think that we know the noise prior variance (e.g. from instrument measurements).

If we keep it in AnalyticScale, we should definitely change the default value of iteration_count to 1 to avoid creating unnecessary work in existing workflows.
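
For context, here is a rough standalone NumPy sketch of the recurrence under discussion, following Amanda's R procedure quoted at the end of this thread (illustrative only; the function and argument names are not the MuyGPyS API): with a known noise prior variance tau, the analytic scale estimate is recomputed against the rescaled covariance, and consecutive estimates are averaged to damp oscillation.

import numpy as np


def iterative_analytic_scale(K, tau, Y, iteration_count=10):
    # K: unit-scale kernel matrix, tau: known noise prior variances,
    # Y: scalar responses. Each pass solves for the analytic scale against
    # sigma2 * K + diag(tau), then averages with the previous value.
    n = len(Y)
    sigma2 = 1.0
    for step in range(iteration_count):
        K0 = sigma2 * K + np.diag(tau)
        estimate = Y @ np.linalg.solve(K0, Y) / n
        # keep the raw estimate on the first pass, average afterwards
        sigma2 = estimate if step == 0 else 0.5 * (sigma2 + estimate)
    return sigma2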

bwpriest commented 9 months ago

Some of the multivariate tests crash because the changes result in producing scale values that are very large (~1e5 instead of ~1e0). We should check to see why this is happening. We don't want this iterative method to accidentally amplify scale values.

bwpriest commented 9 months ago

From talking to Amanda, it seems that there is a workaround to be implemented.

igoumiri commented 7 months ago

I finally got it to work exactly like Amanda's reference code in R, using the Python code below to validate it. I set the default number of iterations to 1. I made sure we only iterate when we have a scalar response. I also added a basic test. Good to merge now @bwpriest?

import numpy as np
from scipy.linalg import cholesky
from scipy.stats import norm
from sklearn.gaussian_process import kernels

from MuyGPyS.gp import MuyGPS
from MuyGPyS.gp.deformation import Isotropy, l2
from MuyGPyS.gp.hyperparameter import (
    AnalyticScale,
    Parameter,
)
from MuyGPyS.gp.kernels import Matern
from MuyGPyS.gp.noise import HomoscedasticNoise, HeteroscedasticNoise
from MuyGPyS.gp.tensors import make_train_tensors
from MuyGPyS.neighbors import NN_Wrapper
from MuyGPyS.optimize.batch import sample_batch

if __name__ == "__main__":
    # synthetic data: a Matern GP with true variance 3 and a small, known
    # heteroscedastic nugget tau
    n = 100
    x = np.linspace(0, 1, n)
    tau = np.random.uniform(size=n) / 1000000
    K = 3 * kernels.Matern(length_scale=1, nu=0.5)(x[:, np.newaxis]) + np.diag(tau)
    # lower Cholesky factor so that Y = L @ e has covariance K, matching
    # Y <- crossprod(L, e) in the R reference below
    L = cholesky(K, lower=True)
    e = norm.rvs(size=n)
    Y = L @ e

    train_features = x[:, np.newaxis]
    train_responses = Y[:, np.newaxis]

    # exact nearest neighbor lookup over the training features
    nn_count = 30
    nbrs_lookup = NN_Wrapper(
        train_features, nn_count, nn_method="exact", algorithm="ball_tree"
    )

    # batch over every training point
    train_count = n
    batch_count = train_count
    batch_indices, batch_nn_indices = sample_batch(
        nbrs_lookup, batch_count, train_count
    )

    (
        batch_crosswise_diffs,
        batch_pairwise_diffs,
        batch_targets,
        batch_nn_targets,
    ) = make_train_tensors(
        batch_indices,
        batch_nn_indices,
        train_features,
        train_responses,
    )

    # Matern model with the generating smoothness and length scale,
    # heteroscedastic noise from the known tau, and 20 iterations of the
    # analytic scale estimate
    muygps = MuyGPS(
        kernel=Matern(
            smoothness=Parameter(0.5),
            deformation=Isotropy(
                metric=l2,
                length_scale=Parameter(1),
            ),
        ),
        # noise=HomoscedasticNoise(tau.mean()),
        noise=HeteroscedasticNoise(tau[batch_nn_indices]),
        scale=AnalyticScale(iteration_count=20),
    )

    muygps = muygps.optimize_scale(batch_pairwise_diffs, batch_nn_targets)
    # iterated analytic scale estimate; the generating process variance is 3
    print(muygps.scale())

    # Below is a Python replica of Amanda's manual R code (quoted at the end
    # of this thread)
    # sigma2vec = np.zeros(21)

    # ## Initial guess for nugget(phi) given initial sigma2=1
    # K0 = kernels.Matern(length_scale=1, nu=0.5)(x[:, np.newaxis]) + np.diag(tau)
    # sigma2 = Y @ np.linalg.solve(K0, Y) / len(Y)
    # phi = tau / sigma2
    # sigma2vec[0] = sigma2

    # ## Second step: given the sigma estimates, re-estimate
    # K0 = sigma2 * (
    #     kernels.Matern(length_scale=1, nu=0.5)(x[:, np.newaxis]) + np.diag(phi)
    # )
    # sigma2 = Y @ np.linalg.solve(K0, Y) / len(Y)
    # phi = tau / sigma2
    # sigma2vec[1] = sigma2

    # ## Third step: average first 2 sigma2 estimates
    # sigma2 = np.mean(sigma2vec[:2])
    # phi = tau / sigma2
    # sigma2vec[2] = sigma2

    # ## Every 2 steps, average the results to get new sigma2
    # for i in range(9):
    #     K0 = sigma2 * (
    #         kernels.Matern(length_scale=1, nu=0.5)(x[:, np.newaxis]) + np.diag(phi)
    #     )
    #     sigma2 = Y @ np.linalg.solve(K0, Y) / len(Y)
    #     phi = tau / sigma2
    #     sigma2vec[3 + 2 * i] = sigma2

    #     sigma2 = np.mean(sigma2vec[2 + 2 * i : 4 + 2 * i])
    #     phi = tau / sigma2
    #     sigma2vec[4 + 2 * i] = sigma2

    # print(sigma2vec)
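
The basic test mentioned above isn't reproduced in this thread. A hypothetical smoke test along these lines might compare a single-iteration model against the 20-iteration one on the same synthetic data; the helper and test names are illustrative, and it assumes tau, batch_nn_indices, batch_pairwise_diffs, and batch_nn_targets built in the script above are in scope:

import numpy as np

from MuyGPyS.gp import MuyGPS
from MuyGPyS.gp.deformation import Isotropy, l2
from MuyGPyS.gp.hyperparameter import AnalyticScale, Parameter
from MuyGPyS.gp.kernels import Matern
from MuyGPyS.gp.noise import HeteroscedasticNoise


def _model(iteration_count):
    # same model as the validation script above, varying only iteration_count
    return MuyGPS(
        kernel=Matern(
            smoothness=Parameter(0.5),
            deformation=Isotropy(metric=l2, length_scale=Parameter(1)),
        ),
        noise=HeteroscedasticNoise(tau[batch_nn_indices]),
        scale=AnalyticScale(iteration_count=iteration_count),
    )


def test_iterative_analytic_scale():
    one_shot = _model(1).optimize_scale(batch_pairwise_diffs, batch_nn_targets)
    iterated = _model(20).optimize_scale(batch_pairwise_diffs, batch_nn_targets)
    # the generating process variance is 3; the iterated estimate should stay
    # the same order of magnitude and not amplify toward ~1e5 as in the
    # multivariate failures discussed above
    assert np.all(np.atleast_1d(iterated.scale()) < 1e2)
    assert np.all(np.atleast_1d(one_shot.scale()) > 0.0)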

For reference, here's Amanda's original R code:

## Investigating the nugget
library(fields)
set.seed(5)
n <- 100
x <- seq(0, 1, length.out = n)

nloglike <- function(Y, pars, x) {
  K <- Matern(rdist(x), range = 1, smoothness = .5) + diag(pars, length(x))
  L <- chol(K)
  return(2 * sum(diag(log(L))) + crossprod(c(Y), chol2inv(L)) %*% c(Y))
}

## Fixed nugget vector (smaller is harder)
tau <- runif(n) / 1000000
# tau <- 1e-10
e <- rnorm(n)
## Covariance matrix
K <- 3 * Matern(rdist(x), range = 1, smoothness = .5) + diag(tau, n)
## Cholesky decomposition
L <- chol(K)
## gen data with these parameters
Y <- crossprod(L, e)

## Accelerate to the solution for sigma2
## Testing here for 100 iterations (we will need less than this)
sigma2vec <- rep(NA, 100)

## Initial guess for nugget(phi) given inital sigma2=1
K0 <- Matern(rdist(x), range = 1, smoothness = .5) + diag(tau, n)
sigma2 <- c(c(Y) %*% solve(K0, Y) / length(Y))
phi <- tau / sigma2
## Estimate sigma2
sigma2vec[1] <- sigma2

## Second step: given the sigma estimates, re-estimate
K0 <- sigma2 * (Matern(rdist(x), range = 1, smoothness = .5) + diag(phi, n))
sigma2 <- c(c(Y) %*% solve(K0, Y) / length(Y))
phi <- tau / sigma2
sigma2vec[2] <- sigma2

## Third step: average first 2 sigma2 estimates
sigma2 <- mean(sigma2vec[1:2])
phi <- tau / sigma2
sigma2vec[3] <- sigma2

## Every 2 steps, average the results to get new sigma2
for (i in 1:49) {
  K0 <- sigma2 * (Matern(rdist(x), range = 1, smoothness = .5) + diag(phi, n))
  sigma2 <- c(c(Y) %*% solve(K0, Y) / length(Y))
  phi <- tau / sigma2
  sigma2vec[4 + (i - 1) * 2] <- sigma2

  sigma2 <- mean(sigma2vec[c(3 + (i - 1) * 2):c(4 + (i - 1) * 2)])
  phi <- tau / sigma2
  sigma2vec[5 + (i - 1) * 2] <- sigma2
}
## See trace of convergence of sigma2
plot(sigma2vec)
print(sigma2vec[n])

## After 10 iterations, we have the solution of what sigma2 and the nugget really should be
## All of this needs to be performed in the sigma2 for each parameter set in optimization
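
To illustrate the closing remark that the sigma2 iteration has to be redone for every candidate parameter set, here is a hypothetical plain NumPy/scikit-learn sketch (not the MuyGPyS API) that profiles out sigma2 with the iterative estimate and then scores each candidate length scale with the same negative log likelihood as nloglike above:

import numpy as np
from sklearn.gaussian_process import kernels


def profiled_nloglike(Y, x, tau, length_scale, iteration_count=10):
    # iterate the analytic scale estimate for this candidate length scale,
    # then evaluate log det(K0) + Y' K0^{-1} Y with the converged scale
    n = len(Y)
    K = kernels.Matern(length_scale=length_scale, nu=0.5)(x[:, np.newaxis])
    sigma2 = 1.0
    for step in range(iteration_count):
        K0 = sigma2 * K + np.diag(tau)
        estimate = Y @ np.linalg.solve(K0, Y) / n
        sigma2 = estimate if step == 0 else 0.5 * (sigma2 + estimate)
    K0 = sigma2 * K + np.diag(tau)
    _, logdet = np.linalg.slogdet(K0)
    return logdet + Y @ np.linalg.solve(K0, Y)


# with x, tau, Y as generated above, sweep some candidate length scales
# candidates = [0.25, 0.5, 1.0, 2.0]
# best = min(candidates, key=lambda ell: profiled_nloglike(Y, x, tau, ell))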