AnotherSamWilson / ParBayesianOptimization

Parallelizable Bayesian Optimization in R

Error messages received #12

Closed flippercy closed 4 years ago

flippercy commented 4 years ago

Hi Mr. Wilson:

I've played with your package over the last two days and now understand its logic better. Thank you for the excellent work!

I still get a few error messages occasionally. They are probably due to the data I used, since I could not replicate them with the public data sets I tried. Could you explain what they mean, please?

The first one is:

```
Starting Epoch 2

Fitting Gaussian Process... Could not obtain meaningful lengthscales. Running local optimum search...
Convergence Not Found. Trying again with tighter parameters...
Convergence Not Found. Trying again with tighter parameters...
Convergence Not Found. Trying again with tighter parameters...
Maximum convergence attempts exceeded - process is probably sampling random points.
```

The second one is:

```
Starting Epoch 2

Fitting Gaussian Process... Returning results so far.
Error encountered while training GP: <the leading minor of order 12 (this number varied) is not positive definite>
```

Both showed up early in the search, when it was far from complete; the utility had not converged, and the best value returned was not close to the optimal value. The process kept running after the first message but stopped when the second one appeared.

Any insights?

Appreciate your help!

Yu

AnotherSamWilson commented 4 years ago

The second message occurs when at least 2 inputs to the GP are near identical, which usually only occurs in two situations: 1) The process has been run long after convergence. 2) All inputs are integers, and the process got stuck in a local optimum.
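
For intuition, here is a minimal made-up sketch (not code from the package) of why near-identical inputs break the GP fit: two almost-identical rows make the covariance matrix numerically singular, and the Cholesky factorization fails with the same "leading minor ... is not positive definite" error.

```r
# Hypothetical illustration: a squared-exponential kernel over three points,
# two of which are nearly identical.
se_kernel <- function(X, lengthscale = 1) {
  d <- as.matrix(dist(X))               # pairwise distances between inputs
  exp(-(d ^ 2) / (2 * lengthscale ^ 2)) # covariance matrix
}

X <- matrix(c(0.10, 0.10000001, 0.90), ncol = 1)  # two near-duplicate inputs
K <- se_kernel(X)

# With no noise/jitter on the diagonal, K is numerically singular, and
# chol(K) fails with "the leading minor of order 2 is not positive definite".
try(chol(K))
```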

Can you show the image generated by plotting the object and share the code?

flippercy commented 4 years ago

Hi:

Sorry for the delay. Below is the code for my scoring function (`calculateWeightedPredictivePower` is a custom function that calculates the AUC and KS of predictive models):

```r
hyper_params_xgboost.bounds <- list(
  ntrees = c(250L, 1200L), learn_rate = c(0.02, 0.12), min_rows = c(10L, 200L),
  sample_rate = c(0.6, 1), max_depth = c(6L, 20L), col_sample_rate = c(0.5, 1)
)

hyper.params.xgboost.scoring.function <- function(
  pList = xgboost.predictors.to.consider, tVar = targetVariable, wVar = weightVariable,
  DevData = data.dev.h2o, ValData = data.val.h2o, monotone = monotonic.list,
  randomNumberSeed = randomNumberSeed, nfolds = 0,
  ntrees, learn_rate, max_depth, sample_rate, min_rows, col_sample_rate
) {

  h2o.init()

  xgboost.model <- h2o.xgboost(
    x = pList, y = tVar, training_frame = DevData, nfolds = nfolds, weights_column = wVar,
    max_depth = max_depth, stopping_metric = 'AUC', learn_rate = learn_rate,
    sample_rate = sample_rate, col_sample_rate = col_sample_rate, min_rows = min_rows,
    distribution = 'bernoulli', seed = randomNumberSeed, tree_method = 'hist',
    grow_policy = 'lossguide', ntrees = ntrees, booster = 'gbtree',
    monotone_constraints = monotonic.list, verbose = FALSE
  )

  xgboost.model.prediction.dev <- as.vector(as.data.frame(h2o.predict(xgboost.model, newdata = DevData))[, 'p1'])
  xgboost.model.prediction.val <- as.vector(as.data.frame(h2o.predict(xgboost.model, newdata = ValData))[, 'p1'])

  xgboost.dev.power <- calculateWeightedPredictivePower(
    actual = as.numeric(as.vector(DevData[, tVar])),
    predicted = xgboost.model.prediction.dev,
    weights = as.numeric(as.vector(DevData[, wVar]))
  )
  xgboost.val.power <- calculateWeightedPredictivePower(
    actual = as.numeric(as.vector(ValData[, tVar])),
    predicted = xgboost.model.prediction.val,
    weights = as.numeric(as.vector(ValData[, wVar]))
  )

  xgboost.dev.ks  <- xgboost.dev.power$ks
  xgboost.dev.auc <- xgboost.dev.power$auc
  xgboost.val.ks  <- xgboost.val.power$ks
  xgboost.val.auc <- xgboost.val.power$auc

  return(list(Score = xgboost.val.auc, val.ks = xgboost.val.ks,
              dev.auc = xgboost.dev.auc, dev.ks = xgboost.dev.ks))
}
```

```r
timeWithPar <- system.time(
  optObj.xgboost <- bayesOpt(
    FUN = hyper.params.xgboost.scoring.function, bounds = hyper_params_xgboost.bounds,
    initPoints = 8, iters.n = 400, iters.k = 8, parallel = FALSE,
    gsPoints = 8, verbose = 2, plotProgress = TRUE
  )
)
```

I do not have any images from the erroneous runs; for the code above, this is the image I got:

(image attachment not preserved)

Appreciate your help!

flippercy commented 4 years ago

Hi:

I saw that there are some updates. Have they been implemented yet?

Thank you.

AnotherSamWilson commented 4 years ago

Sorry for the late reply. I was going to improve the error handling in this area, but it wouldn't have fit in well with the rest of the package. The updates that went in improved the error handling for the scoring function itself. Looking at that picture, I would say the data being fed to the GP is causing singularity issues because the process is trying to sample points that are extremely close together. If the process hasn't actually gotten close to the true global optimum, you can increase the 'exploration' parameters for the search.

These are the eps (for the ei and poi acquisition functions) and kappa (for the ucb acquisition function) parameters. Increasing kappa or decreasing eps tells the process to seek out areas of higher uncertainty, so you are less likely to search the same place many times, which is what causes the singularity issue. The downside is that the process will not sample areas of high potential as often, because it wants to 'explore' areas it is uncertain about.
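
For example, with the call from above (a sketch keeping your other arguments the same; only the acquisition settings change):

```r
optObj.xgboost <- bayesOpt(
  FUN = hyper.params.xgboost.scoring.function,
  bounds = hyper_params_xgboost.bounds,
  initPoints = 8, iters.n = 400, iters.k = 8,
  acq = "ucb",   # upper confidence bound, the default acquisition function
  kappa = 3.1,   # default is 2.576; larger values favor exploration
  gsPoints = 8, verbose = 2, plotProgress = TRUE
)
```

With acq = "ei" or acq = "poi" you would adjust eps instead, in the direction described above.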

flippercy commented 4 years ago

Thank you for the reply. Appreciate your help!

Sincerely,

Yu Cao


flippercy commented 4 years ago

Hi Mr. Wilson:

One more question: what is the relationship between kappa and percentile? Any table or reference?

Appreciate your help!

Yu Cao


AnotherSamWilson commented 4 years ago

kappa is the 'Z score' that corresponds to the percentile on the normal distribution. (image: https://user-images.githubusercontent.com/44655289/88233354-88e8f680-cc45-11ea-8eff-2743568535d8.png)
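
You can check that relationship directly in R with the standard normal CDF and quantile functions:

```r
# Percentile of the normal distribution covered by a given kappa:
pnorm(2.576)   # ~0.995
pnorm(3.1)     # ~0.999

# And the kappa needed to target a given percentile:
qnorm(0.995)   # ~2.576
qnorm(0.999)   # ~3.09
```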

flippercy commented 4 years ago

Got it. If so, then 2.56 is already very high and there is not much room left to increase it.


AnotherSamWilson commented 4 years ago

2.56 is actually a good balance between the two; I would try increasing it to 3.1 and see how that works. It sounds like you are trying to optimize a problem for which you already know the optimal point. Is there any way you can share all of your code and data? A kappa of 2.56 works for most problems. It might not if the optimal point is surrounded by low-performing points; for example, a function like the one sketched below would be hard for Bayesian optimization to find the maximum of.
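
A minimal made-up example of that kind of shape (illustrative only): the global maximum is a narrow spike, so a GP fit to a handful of samples rarely has any reason to search near it.

```r
# Hypothetical function whose global maximum is a narrow, isolated spike at x = 7.
hardToOptimize <- function(x) {
  sin(x) + 5 * exp(-200 * (x - 7) ^ 2)
}

# The spike is only ~0.1 wide, so most sampled points see a smooth sin(x) surface.
curve(hardToOptimize, from = 0, to = 10, n = 2000)
```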

flippercy commented 4 years ago

Hi Mr. Wilson:

As shown above, I was optimizing hyperparameters for xgboost. Let me review all my code and see what else I can share or revise.

Last question: How does gsPoints work? How is it different from initPoints?

Thank you very much!