Open chenr86 opened 1 year ago
As per this paper:
"Since there is no analytical solution for estimating the hyperparameters θ, it is necessary to use numerical optimization to find the hyperparameters θ that maximize the likelihood function. This step is the most challenging in the construction of the kriging model. This is because, as previously mentioned, this estimation involves maximizing the likelihood function, which is often multimodal [Mardia and Watkins, 1989]. Maximizing this function becomes prohibitive for high-dimensional problems (d > 10) due to the cost of computing the determinant of the correlation matrix and the high number of evaluations needed for optimizing a high-dimensional multimodal problem."
Try p = 1.99 * ones(dim) instead of p = 2 * ones(dim). This will give better numerical stability (your covariance matrix is singular here, so its determinant is zero, which is why the logpdf is Inf).
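To see why a singular correlation matrix drives the logpdf to Inf, here is a small NumPy sketch (an illustration of the linear algebra only, not Surrogates.jl internals): with p = 2 and two sample points close enough that their correlation rounds to 1.0 in float64, two rows of the correlation matrix coincide, the determinant is exactly zero, and its log diverges.

```python
import numpy as np

# Gaussian (p = 2) correlation matrix for three sample points, two of
# which are so close that exp(-d^2) rounds to 1.0 in float64.
x = np.array([0.0, 1e-17, 1.0])
d = np.abs(x[:, None] - x[None, :])
R = np.exp(-d**2)

# Rows 0 and 1 are identical, so R is exactly singular and slogdet
# reports sign 0 with logdet -inf.
sign, logdet = np.linalg.slogdet(R)
print(sign, logdet)  # -> 0.0 -inf
```

The log-likelihood of a Gaussian process contains this log-determinant term, which is where the Inf comes from.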
@archermarx why 1.99 here? Could you please explain more? Thanks!
When p = 2.0, the correlation function is nearly flat for points that are close together, which causes ill-conditioning in the correlation matrix. If p is just slightly less than 2.0, the correlation function has a small slope, so even nearby points are not treated as effectively identical. An update I'm working on will change the default from 2.0 to 1.99.
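This conditioning effect can be checked numerically by comparing condition numbers of the power-exponential correlation matrix R[i,j] = exp(-theta * |x_i - x_j|^p) at p = 2.0 versus p = 1.99. A NumPy sketch (again an illustration of the underlying matrix, not Surrogates.jl code; the point spacing and theta = 1 are arbitrary choices):

```python
import numpy as np

def correlation_matrix(x, theta=1.0, p=2.0):
    """Power-exponential correlation R[i,j] = exp(-theta * |xi - xj|**p)."""
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-theta * d**p)

x = np.linspace(0.0, 1.0, 100)
cond_p2 = np.linalg.cond(correlation_matrix(x, p=2.0))
cond_p199 = np.linalg.cond(correlation_matrix(x, p=1.99))

# p = 2.0 is numerically singular (condition number near 1/eps);
# p = 1.99 is ill-conditioned but usable.
print(f"cond(R), p = 2.00: {cond_p2:.3e}")
print(f"cond(R), p = 1.99: {cond_p199:.3e}")
```

The slight kink that |d|^1.99 introduces at d = 0 keeps nearby rows of R from becoming numerically identical, which is exactly the stability gain behind the 1.99 default.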
I have tried p = 1.99 * ones(dim). Still, the model is overfitting and there is nearly no improvement: the predictive coefficient R squared is 1 on the training set but only 0.00186 on the test set.
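For context, a test-set R squared near zero means the surrogate predicts no better than the constant mean of the test targets, while R squared = 1 on the training data indicates pure interpolation. A minimal NumPy check of the standard formula R^2 = 1 - SS_res / SS_tot (generic, not Surrogates.jl API; the sample values are made up):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_test = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y_test, y_test))           # perfect fit -> 1.0
print(r_squared(y_test, np.full(4, 2.5)))  # predicting the mean -> 0.0
```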
Does your loss function still produce Inf?
Hello,
I am using Surrogates.jl v6.3.0 to build a Kriging model for a dataset with 500 samples: 24 input variables and 1 output. But the accuracy is terrible, and the model overfits severely.
According to https://github.com/SciML/Surrogates.jl/issues/368, I calculated the loss function, and Inf was obtained! I don't know how I can improve my model, as p and theta are already set to the suggested values.