AnotherSamWilson / ParBayesianOptimization

Parallelizable Bayesian Optimization in R

Error in bayesOpt, "Error encountered while training GP" #26

Closed ghost closed 3 years ago

ghost commented 3 years ago

Hello. I am interested in finding the best hyperparameters for a machine learning model by applying parallel Bayesian optimization. I ran the code below, which fits a radial-kernel SVM regression to the built-in BostonHousing dataset. However, the run often stops before completing all iterations (iters.n + iters.k) with the following error message.

#!/usr/bin/env Rscript

library(caret)
library(mlbench)
library(foreach)
library(doParallel)
library(ParBayesianOptimization)
library(dplyr, warn.conflicts=F)

cores=10
data(BostonHousing)
BostonHousing = BostonHousing[, -4]
idx = sample(1:nrow(BostonHousing), 400)
train_x = BostonHousing[idx, -13]
train_y = BostonHousing[idx, 13]
test_x = BostonHousing[-idx, -13]
test_y = BostonHousing[-idx, 13]

RMSE = c()
R2 = c()
MAE = c()
idx_ = createFolds(1:nrow(train_x), k=3)
bounds = list(k = c(1, 50))

gamma_ = 1/ncol(train_x)
bounds = list(C = c(1/100, 100), gamma = c(gamma_/100, gamma_*100))
Bayes = function(C, gamma) {
  library(e1071)
  for (i in 1:length(idx_)) {
    train_x_t = train_x[-idx_[[i]], ]
    train_x_v = train_x[idx_[[i]], ]
    train_y_t = train_y[-idx_[[i]]]
    train_y_v = train_y[idx_[[i]]]
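    # e1071::svm() names its penalty argument `cost`; an argument called `C` is absorbed by `...` and ignored
    algorithm = svm(train_x_t, train_y_t, 
                    kernel="radial", cost=C, gamma=gamma)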
    predict = predict(algorithm, train_x_v)
    RMSE = c(RMSE, RMSE(predict, train_y_v))
    R2 = c(R2, R2(predict, train_y_v))
    MAE = c(MAE, MAE(predict, train_y_v))
  }
  return(list(Score = round(-mean(RMSE), 3), 
              R2 = round(mean(R2), 3), 
              MAE = round(mean(MAE), 3)))
}

cluster = makeForkCluster(cores, outfile = "")
registerDoParallel(cluster)
clusterExport(cluster, c("idx", "idx_", "train_x", "train_y", "RMSE", "R2", "MAE"))
clusterEvalQ(cluster, expr = {
  library(caret)
  library(dplyr)
})

BO_search = bayesOpt(Bayes, bounds = bounds,
                     acq = "ei", eps = 0,
                     initPoints = 10, iters.n = 50,
                     iters.k = as.numeric(cores),
                     parallel = T, verbose = 2,
                     plotProgress = T, errorHandling = "continue")
stopCluster(cluster)

Starting Epoch 3 1) Fitting Gaussian Process...

Returning results so far. Error encountered while training GP: <the leading minor of order 25 is not positive definite>

This message does not seem to depend on the other bayesOpt settings (cores, acquisition function, etc.), and the epoch at which the run stops varies between trials even with identical settings. Is this a real error, or just an early-stop notification (meaning that further parameter search is pointless)? If it is the latter, is it because the BostonHousing data set is small, or because the SVM parameter range is narrow?

Here is the Bayesian utility per epoch with 10 cores. [Figure: BO-Utility SVM_R]

And here is the score over the searched parameter range of the SVM RBF kernel. [Figure: SVM Radial Parameter Score]

[Additional Question] What is the difference between initPoints and gsPoints in the bayesOpt function?

AnotherSamWilson commented 3 years ago

I see the problem: somehow the scoring function was sampled at the same spot more than once. This causes the GP to fail. I'll start working on this.
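The link between repeated sample points and the error message in the original post can be reproduced without the package. The sketch below is not ParBayesianOptimization's internal code. It assumes a plain squared-exponential kernel with no nugget term, and `sq_exp_kernel` and the sample points are made up for illustration. Two identical inputs give two identical rows in the covariance matrix, so the matrix is singular and the Cholesky factorization stops with the same kind of "not positive definite" message.

```r
# Hypothetical squared-exponential kernel; no nugget (jitter) on the diagonal.
sq_exp_kernel <- function(x, length_scale = 1) {
  d <- as.matrix(dist(x))            # pairwise distances between sample points
  exp(-d^2 / (2 * length_scale^2))
}

# The first two sample points are identical, as if the same spot were scored twice.
pts <- c(0.5, 0.5, 2.0)
K <- sq_exp_kernel(pts)

# Rows 1 and 2 of K are identical, so K is singular and chol() fails.
chol_msg <- tryCatch(
  { chol(K); "ok" },
  error = function(e) conditionMessage(e)
)
print(chol_msg)

# A small nugget on the diagonal restores positive definiteness.
K_jitter <- K + diag(1e-8, nrow(K))
R_jitter <- chol(K_jitter)

# Dropping duplicated inputs before fitting avoids the problem as well.
K_unique <- sq_exp_kernel(unique(pts))
R_unique <- chol(K_unique)
```

Either remedy (a nugget or de-duplicating the sampled points) is a generic way to keep a GP covariance matrix factorizable; which one the package uses is not stated in this thread.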

initPoints is the number of points to evaluate before any Gaussian Process is trained; the GP needs something to work with before it can be built. gsPoints is a little more complicated. Finding the global optimum of the acquisition function on a GP is not trivial: you need to use Newton's method to find many local optima and choose the best one from those. This is assumed to be the global optimum. gsPoints determines how many times we run Newton's method from different starting positions. The more points you sample, the more confident you can be that you found the global optimum.
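The multi-start idea behind gsPoints can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: `toy_acq` is a made-up one-dimensional function standing in for an acquisition surface, `multi_start_max` is a hypothetical helper, and the local search uses `optim()` with `method = "L-BFGS-B"` (a bounded quasi-Newton method) in place of a pure Newton's method. The `n_starts` argument plays the role that gsPoints plays above.

```r
# Made-up acquisition surface with several local maxima on [0, 4].
toy_acq <- function(x) sin(3 * x) + 0.3 * x

# Run a bounded local optimizer from evenly spaced starts and keep the best result.
multi_start_max <- function(f, lower, upper, n_starts) {
  starts <- seq(lower, upper, length.out = n_starts)
  runs <- lapply(starts, function(s) {
    optim(par = s, fn = f, method = "L-BFGS-B",
          lower = lower, upper = upper,
          control = list(fnscale = -1))   # fnscale = -1 turns optim into a maximizer
  })
  values <- vapply(runs, function(r) r$value, numeric(1))
  best <- runs[[which.max(values)]]
  list(par = best$par, value = best$value, all_values = values)
}

few  <- multi_start_max(toy_acq, lower = 0, upper = 4, n_starts = 2)
many <- multi_start_max(toy_acq, lower = 0, upper = 4, n_starts = 20)

print(c(few = few$value, many = many$value))
```

With only two starts the search can settle on a poorer local maximum; adding starts can only keep or improve the best value found, at the cost of more optimizer runs.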

AnotherSamWilson commented 3 years ago

In the meantime, there are some things you can try to get around this. There really isn't a reason to run in parallel here, since the model already trains quickly. If the error is caused by the parallel implementation, running sequentially might help. To run sequentially, just set iters.k = 1 and parallel = FALSE.
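For reference, a sequential call might look like the sketch below. The scoring function `toy_score` and its bounds are hypothetical stand-ins so that the example runs quickly; only the `iters.k = 1` and `parallel = FALSE` settings come from the suggestion above. No cluster setup or `clusterExport()` is needed in this mode.

```r
# Runs only if the package is installed.
if (requireNamespace("ParBayesianOptimization", quietly = TRUE)) {
  library(ParBayesianOptimization)

  # Hypothetical cheap scoring function; bayesOpt() maximizes the returned Score.
  toy_score <- function(x) {
    list(Score = dnorm(x, 3, 2) * 1.5 + dnorm(x, 7, 1) + dnorm(x, 10, 2))
  }

  set.seed(1)
  seq_search <- bayesOpt(
    FUN = toy_score,
    bounds = list(x = c(0, 15)),
    initPoints = 4,
    iters.n = 2,
    iters.k = 1,        # one new point per epoch
    parallel = FALSE,   # evaluate the scoring function in the current session
    verbose = 0
  )
  print(getBestPars(seq_search))
}
```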

ghost commented 3 years ago

This BostonHousing code is just a simple trial; next I will move on to my own, larger data set. That is why I want to confirm that the trial code works properly with parallel computation, even though it is already fast. I also understand now that bayesOpt uses Newton's method to search for the global optimum. Thank you!

flippercy commented 3 years ago

This is exactly the same issue I reported before; good to know that we are working toward a solution!

AnotherSamWilson commented 3 years ago

This happens when:
1) iters.k > 1
2) A selected local optimum was at the bound limits
3) The noise added simply returned the same values, since the local optimum was at the bounds already.
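The three conditions above can be illustrated with a small base R sketch. The package's actual noise scheme is not shown in this thread, so the code assumes one plausible version: Gaussian jitter around the local optimum followed by clamping to the bounds. The names `local_optimum`, `k`, and the noise scale are hypothetical.

```r
set.seed(1)
lower <- 0
upper <- 1
local_optimum <- upper   # the local search converged on the upper bound
k <- 10                  # stand-in for iters.k (points requested per epoch)

# Assumed scheme: jitter the optimum to get k candidates, then clamp to the bounds.
raw <- local_optimum + rnorm(k, sd = 0.05)
candidates <- pmin(pmax(raw, lower), upper)

# Every draw pushed past the bound collapses back onto the bound itself.
n_at_bound <- sum(candidates == upper)
has_duplicates <- anyDuplicated(candidates) > 0
print(n_at_bound)
print(has_duplicates)

# Keeping only distinct candidates avoids scoring the same spot twice.
distinct_candidates <- unique(candidates)
```

Duplicated candidates like these would then be scored and handed to the GP as repeated sample points, which is the situation described earlier in the thread.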

Fixed with commit 0c7bef7