AnotherSamWilson / ParBayesianOptimization

Parallelizable Bayesian Optimization in R

Error with initial scoring #13

Closed papaemman closed 4 years ago

papaemman commented 4 years ago

I receive the following error and I don't have any clue how to fix it. Any suggestions?

The scoringFunction() takes a while to complete (around 20 minutes) and returns a list with three more elements along with Score; I don't know if this is relevant.

> optObj <- bayesOpt(
+   FUN = scoringFunction,
+   bounds = bounds,
+   saveFile = NULL,
+   # initGrid,
+   initPoints = 12,
+   iters.n = 8,
+   iters.k = 1,
+   otherHalting = list(timeLimit = Inf, minUtility = 0),
+   acq = "ucb",  # "ucb", "ei", "eips", "poi"
+   kappa = 2.576,
+   eps = 0,
+   parallel = FALSE,
+   gsPoints = pmax(100, length(bounds)^3),
+   convThresh = 1e+08,
+   acqThresh = 1,
+   errorHandling = "stop",
+   plotProgress = T,
+   verbose = 2
+ )

Running initial scoring function 12 times in 1 thread(s)...Timing stopped at: 3731 28.91 479.4
Error in { : 
  Item 7 has 2 columns, inconsistent with item 1 which has 16 columns. To fill missing columns use fill=TRUE.
Timing stopped at: 7.27e+04 567.1 9453
samFarrellDay commented 4 years ago

This usually means that the lists being returned don't all have the same elements. Things get a liiiittle funky when bayesOpt then tries to combine them. What's probably happening is that your scoring function is not returning the desired elements on some runs, but isn't necessarily failing. Can you send the full code? Also, can you confirm you are using version 1.1.0?
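
For context, the message in the traceback comes from data.table when rows with different column counts are bound together. A minimal sketch that reproduces the same class of error (the good/bad lists here are hypothetical, not from this issue):

library(data.table)

# One complete result and one degenerate result with a different shape:
good <- list(Score = 0.9, nrounds_wrmsse = 100, rmse = 0.5, nrounds_rmse = 120)
bad  <- list(Score = 0.8)

# Binding them fails the same way bayesOpt reports:
tryCatch(
  rbindlist(list(as.data.table(good), as.data.table(bad))),
  error = function(e) message(conditionMessage(e))
)
# Prints something like:
# Item 2 has 1 columns, inconsistent with item 1 which has 4 columns.
# To fill missing columns use fill=TRUE.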

papaemman commented 4 years ago

Thanks for the answer.

Yes, I'm using the 1.1.0 version of the package.

Here is my scoringFunction. (As you can see, I use a custom evaluation metric function for training, but I have tested it and it returns the desired output on every iteration, so I don't think that is the problem.)

Also, I have run the same process for fewer iterations (10 rounds) just as a sanity check, and it completed without the error.

scoringFunction <- function(learning_rate, num_leaves,                                                   # Core parameters
                            max_depth, min_data_in_leaf,                                                 # Learning Control Parameters
                            bagging_fraction, bagging_freq, feature_fraction, feature_fraction_bynode,
                            lambda_l1, lambda_l2,
                            tweedie_variance_power) {                                                    # Objective parameters 

  # Define training parameters

  params <- list(

    task = "train",                  

    objective = "tweedie",           # "regression", "regression_l1", "poisson", "huber", "tweedie"
    boosting = "gbdt",               # Boosting type: "gbdt", "rf", "dart", "goss"

    learning_rate = learning_rate,   # shrinkage rate                       | aliases: shrinkage_rate, eta
    num_leaves = num_leaves,         # max number of leaves in one tree     | aliases: num_leaf, max_leaves, max_leaf
    tree_learner = "serial",         # "serial", "feature", "data", "voting"

    seed = 33,
    nthread = 8,
    device_type = "cpu",
    force_col_wise = TRUE,
    force_row_wise = FALSE,

    max_depth = max_depth,
    min_data_in_leaf = min_data_in_leaf,                # minimal number of data in one leaf   | aliases: min_data_per_leaf, min_data, min_child_samples

    bagging_fraction = bagging_fraction,                # randomly select part of data without resampling  | aliases: sub_row, subsample, bagging
    bagging_freq = bagging_freq,                        # frequency for bagging                            | subsample_freq 
    bagging_seed = 33,

    feature_fraction = feature_fraction,                 # for boosting "rf" |  aliases: sub_feature, colsample_bytree
    feature_fraction_bynode = feature_fraction_bynode,   # randomly select part of features on each tree node
    feature_fraction_seed = 33,

    lambda_l1 = lambda_l1,                               #  | aliases: reg_alpha
    lambda_l2 = lambda_l2,                               #  | aliases: reg_lambda, lambda

    # min_gain_to_split = 0.01,

    extra_trees = FALSE,
    extra_seed = 33, 

    # 3. I/O parameters ----
    # 4. Objective parameters ----

    tweedie_variance_power = tweedie_variance_power,
    boost_from_average = FALSE

  )

  ## Lightgbm Training

  lgb_model <- lgb.train(params = params, data = train_data,
                         valids = list(valid = valid_data), eval_freq = 30, early_stopping_rounds = 300, # Validation parameters
                         eval = custom_wrmsse_metric,                           # SOS:  wrmsse_den, weights, fh datafiles needed for wrmsse calculations
                         metric = "rmse", 
                         nrounds = 1000,  
                         categorical_feature = categoricals,
                         verbose = -1, record = TRUE, init_model = NULL, colnames = NULL,
                         callbacks = list(), reset_data = FALSE)

  # NOTE:
  # ParBayesianOptimization maximizes the function you provide it, so to turn
  # this minimization problem into a maximization one, multiply the Score by -1.

  # Get results (Score = wrmsse and rmse)
  ls <- list(Score = - min(unlist(lgb_model$record_evals$valid$wrmsse$eval)),
             nrounds_wrmsse = which.min(unlist(lgb_model$record_evals$valid$wrmsse$eval)),
             rmse = min(unlist(lgb_model$record_evals$valid$rmse$eval)),
             nrounds_rmse = which.min(unlist(lgb_model$record_evals$valid$rmse$eval)))

  return(ls)
}
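
One defensive pattern worth noting here (a sketch of my own, assuming lgb_model$record_evals has the layout used above, not code from this thread): extract each metric through a helper that always yields a value, so the returned list keeps the same shape even when a metric is missing from the evaluation log.

# Hypothetical helper, not part of the original function: pull a metric's
# validation log and fall back to NA instead of dropping the element.
safe_best <- function(record_evals, metric) {
  vals <- unlist(record_evals$valid[[metric]]$eval)
  if (is.null(vals) || length(vals) == 0) {
    return(list(best = NA_real_, iter = NA_integer_))
  }
  list(best = min(vals), iter = which.min(vals))
}

wrmsse <- safe_best(lgb_model$record_evals, "wrmsse")
rmse   <- safe_best(lgb_model$record_evals, "rmse")

# Every call now returns the same four names, so bayesOpt can row-bind
# the results without a column-count mismatch.
list(Score          = -wrmsse$best,
     nrounds_wrmsse = wrmsse$iter,
     rmse           = rmse$best,
     nrounds_rmse   = rmse$iter)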
AnotherSamWilson commented 4 years ago

I've just updated the error handling in version 1.2.0 to return the specific errors encountered during initialization. Can you install from this GitHub repo, initialize with 12 rounds, and let me know what error comes back?
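
For reference, a minimal way to do that (assuming the remotes package; devtools::install_github works equivalently):

# Install the development version straight from GitHub, then confirm the version.
# install.packages("remotes")   # if not already installed
remotes::install_github("AnotherSamWilson/ParBayesianOptimization")
packageVersion("ParBayesianOptimization")   # should now report 1.2.0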