Hmmm #43 was also a xgboost reproducibility problem. The easiest way to tell if this is an issue with this package or xgboost is to look at the first iteration where the results diverge in opt_obj$scoreSummary. Did xgboost return the same score for the same parameters, or did ParBayesianOptimization suggest different parameters, even though all of the prior results were the same?
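One way to locate that first divergent row is a comparison along these lines (a sketch; opt_sys1 and opt_sys2 are placeholder names for the bayesOpt results from the two machines, and the parameter columns are the ones shown in the tables below):
# Sketch: find the first iteration where the two runs diverge.
ss1 <- as.data.frame(opt_sys1$scoreSummary)
ss2 <- as.data.frame(opt_sys2$scoreSummary)

param_cols <- c("subsample", "colsample_bytree", "min_child_weight",
                "learning_rate", "max_depth")

params_differ <- rowSums(abs(ss1[, param_cols] - ss2[, param_cols]) > 1e-12) > 0
score_differs <- abs(ss1$Score - ss2$Score) > 1e-12

first_div <- which(params_differ | score_differs)[1]
ss1[first_div, ]
ss2[first_div, ]

# If the parameters already differ in that row, the divergence comes from the
# suggestion step; if only the Score differs, it comes from xgboost itself.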
Thanks @AnotherSamWilson for the quick reply. Here is a short example of the rows where the results start to diverge, from "setting_id" 67 onwards, between both systems; the score of round 66 (the last round that is still equal) is identical on both systems. (acqThresh was set to 0.7... indeed, when setting this to 1, the best parameter setting found is equal between both systems; however, the results for gpUtility then diverge, and on one system convergence issues occur, causing the optimization to stop earlier than on the other system.)
System 1:
xgboost_tuning_params$results_cv[65:71, ]
#> Epoch setting_id subsample colsample_bytree min_child_weight learning_rate max_depth gpUtility acqOptimum inBounds Elapsed Score
#> 1: 2 65 0.5908691 0.4104639 7.8111536 0.04492852 7 0.4013550 TRUE TRUE 3.402 -2.611727
#> 2: 2 66 0.5146076 0.6523607 5.7668434 0.10002708 23 0.4013550 TRUE TRUE 3.154 -2.687815
#> 3: 3 67 0.6705731 0.8091133 6.4734957 0.15873880 29 0.4789347 TRUE TRUE 2.856 -2.622872
#> 4: 3 68 0.5869667 0.8454276 9.2729579 0.12477875 15 0.4743232 TRUE TRUE 3.626 -2.659254
#> 5: 3 69 0.6141893 0.3000000 5.4570650 0.05627890 7 0.4666818 TRUE TRUE 3.063 -2.653813
#> 6: 3 70 0.7622537 0.9624663 4.3227332 0.10154535 7 0.4660354 TRUE TRUE 3.729 -2.619426
#> 7: 3 71 0.6650588 1.0000000 0.5534595 0.07013747 6 0.4642572 TRUE TRUE 5.324 -2.751824
#> mean_cv_metric mean_cv_nrounds errorMessage objective eval_metric verbosity
#> 1: 2.611727 462 NA survival:cox cox-nloglik 0
#> 2: 2.687815 204 NA survival:cox cox-nloglik 0
#> 3: 2.622872 48 NA survival:cox cox-nloglik 0
#> 4: 2.659254 523 NA survival:cox cox-nloglik 0
#> 5: 2.653813 322 NA survival:cox cox-nloglik 0
#> 6: 2.619426 32 NA survival:cox cox-nloglik 0
#> 7: 2.751824 25 NA survival:cox cox-nloglik 0
System 2:
xgboost_tuning_params$results_cv[65:71, ]
#> Epoch setting_id subsample colsample_bytree min_child_weight learning_rate max_depth gpUtility acqOptimum inBounds
#> 1: 2 65 0.5908691 0.4104639 7.8111536 0.04492852 7 0.4013550 TRUE TRUE
#> 2: 2 66 0.5146076 0.6523607 5.7668434 0.10002708 23 0.4013550 TRUE TRUE
#> 3: 3 67 0.6705618 0.8091689 6.4734956 0.15873388 29 0.4789348 TRUE TRUE
#> 4: 3 68 0.5869677 0.8451316 9.2729640 0.12476391 15 0.4743233 TRUE TRUE
#> 5: 3 69 0.6141895 0.3000000 5.4570638 0.05627900 7 0.4666818 TRUE TRUE
#> 6: 3 70 0.7622691 0.9624764 4.3227354 0.10156007 7 0.4660354 TRUE TRUE
#> 7: 3 71 0.6647779 1.0000000 0.5537908 0.07018688 6 0.4642573 TRUE TRUE
#> Elapsed Score mean_cv_metric mean_cv_nrounds errorMessage objective eval_metric verbosity
#> 1: 3.564 -2.611727 2.611727 462 NA survival:cox cox-nloglik 0
#> 2: 3.357 -2.687815 2.687815 204 NA survival:cox cox-nloglik 0
#> 3: 3.135 -2.622881 2.622881 48 NA survival:cox cox-nloglik 0
#> 4: 3.752 -2.659247 2.659247 523 NA survival:cox cox-nloglik 0
#> 5: 3.399 -2.653813 2.653813 322 NA survival:cox cox-nloglik 0
#> 6: 3.801 -2.619402 2.619402 32 NA survival:cox cox-nloglik 0
#> 7: 5.132 -2.737230 2.737230 30 NA survival:cox cox-nloglik 0
Hmm it looks like the package recommended different parameters for the same scores. Can you do me a favor and send me an RDS of both of these results data frames? I should be able to reproduce the problem with just the summary table. Can you also send me the output of sessionInfo().
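Exporting the tables could look like this (run on each system; file names are placeholders):
saveRDS(xgboost_tuning_params$results_cv, file = "results_cv_system1.rds")
# optionally also capture the session info alongside it
writeLines(capture.output(sessionInfo()), "sessionInfo_system1.txt")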
The wrapper function with common code to carry out the hyperparameter optimization for the different algorithms is:
bayesian_opt <- function(
  method = c("glmnet", "ranger", "xgboost"),
  x,
  y,
  init_grid,
  arg_list,
  ncores,
  cv_only,
  rnd_seed,
  bayes_bounds,
  bayes_niter,
  bayes_otherHalting
) {
  method <- match.arg(method)
  stopifnot(is.list(arg_list))

  if (method == "xgboost") {
    cluster_export <- c(
      "xgboost_cv", "xgboost_fit", "setup_xgb_dataset",
      "predict_xgboost", "xgb_c_index", "test_fold_ids"
    )
    #cluster_load <- c("xgboost", "glmnet")
  } else if (method == "glmnet") {
    cluster_export <- c(
      "glmnet_cv", "glmnet_fit", "register_parallel"
    )
    #cluster_load <- c("parallel", "glmnet")
  } else if (method == "ranger") {
    cluster_export <- c(
      "ranger_cv", "ranger_fit", "set_ranger_params",
      "predict_ranger", "matrix_to_df"
    )
    #cluster_load <- c("ranger", "glmnet")
  }

  go_parallel <- ifelse(
    test = ncores > 1,
    yes = TRUE,
    no = FALSE
  )

  # for bayesOpt:
  # handle default behaviour
  if (is.null(bayes_otherHalting)) {
    bayes_otherHalting <- list("timeLimit" = Inf, "minUtility" = 0)
  }

  iters_k <- ncores

  cl <- register_parallel(ncores)
  on.exit(
    expr = {
      parallel::stopCluster(cl)
      foreach::registerDoSEQ()
      # reset random number generator
      RNGkind(kind = "default")
      invisible(gc())
    }
  )

  # export from current env
  parallel::clusterExport(
    cl = cl,
    varlist = c(
      "x", "y", "cv_only", "rnd_seed", "arg_list", "ncores" #, "cluster_load"
    ),
    envir = environment()
  )

  # export from global env
  parallel::clusterExport(
    cl = cl,
    varlist = cluster_export,
    envir = -1
  )

  parallel::clusterSetRNGStream(
    cl = cl,
    iseed = rnd_seed
  )

  parallel::clusterEvalQ(
    cl = cl,
    expr = {
      #lapply(cluster_load, library, character.only = TRUE) # not necessary since using ::-notation everywhere
      RNGkind("L'Ecuyer-CMRG")
      set.seed(rnd_seed) #, kind = "L'Ecuyer-CMRG") # set seed in each job for reproducibility
    }
  )

  set.seed(rnd_seed)

  opt_obj <- ParBayesianOptimization::bayesOpt(
    FUN = eval(parse(text = paste0(method, "_bayesian_scoringFunction"))),
    bounds = bayes_bounds,
    initGrid = init_grid,
    iters.n = bayes_niter,
    iters.k = iters_k,
    parallel = go_parallel,
    otherHalting = bayes_otherHalting
    , acq = "ucb"
    , kappa = 3.5
    #, convThresh = 1e5
    , acqThresh = 0.7
    #, gsPoints = pmax(256, length(bayes_bounds)^3)
  )

  return(opt_obj)
}
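For context, a call to this wrapper might look roughly like the following (the data objects, bound values, and iteration counts are placeholders, not values from the issue):
# hypothetical invocation; train_x, train_y, init_grid and the bound values are placeholders
bounds <- list(
  subsample = c(0.3, 1),
  colsample_bytree = c(0.3, 1),
  min_child_weight = c(0, 10),
  learning_rate = c(0.01, 0.2),
  max_depth = c(2L, 30L)
)
xgboost_opt <- bayesian_opt(
  method = "xgboost",
  x = train_x,
  y = train_y,
  init_grid = init_grid,
  arg_list = list(objective = "survival:cox", eval_metric = "cox-nloglik"),
  ncores = 4,
  cv_only = TRUE,
  rnd_seed = 123,
  bayes_bounds = bounds,
  bayes_niter = 50,
  bayes_otherHalting = NULL
)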
Outputs of sessionInfo() are:
System 1:
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
#> [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.14.2 magrittr_2.0.3
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.2 R.utils_2.11.0 tidyselect_1.1.2 lme4_1.1-29
#> [5] htmlwidgets_1.5.4 grid_4.2.0 munsell_0.5.0 codetools_0.2-18
#> [9] effectsize_0.7.0 xgboost_1.6.0.1 withr_2.5.0 colorspace_2.0-3
#> [13] splitTools_0.3.2 shapviz_0.1.0 highr_0.9 knitr_1.39
#> [17] uuid_1.1-0 rstudioapi_0.13 ggsignif_0.6.3 officer_0.4.3
#> [21] labeling_0.4.2 emmeans_1.7.4-1 KMsurv_0.1-5 farver_2.1.0
#> [25] datawizard_0.4.1 rprojroot_2.0.3 coda_0.19-4 vctrs_0.4.1
#> [29] generics_0.1.2 TH.data_1.1-1 xfun_0.31 R6_2.5.1
#> [33] markdown_1.1 doParallel_1.0.17 ggbeeswarm_0.6.0 DescrTab2_2.1.9
#> [37] lhs_1.1.5 assertthat_0.2.1 scales_1.2.0 multcomp_1.4-19
#> [41] nnet_7.3-17 beeswarm_0.4.0 gtable_0.3.0 fastshap_0.0.7
#> [45] DiceKriging_1.6.0 sandwich_3.0-2 rlang_1.0.2 zeallot_0.1.0
#> [49] systemfonts_1.0.4 sjPlot_2.8.10 splines_4.2.0 rstatix_0.7.0
#> [53] selectr_0.4-2 broom_0.8.0 prismatic_1.1.0 checkmate_2.1.0
#> [57] yaml_2.3.5 reshape2_1.4.4 abind_1.4-5 modelr_0.1.8
#> [61] backports_1.4.1 Hmisc_4.7-0 gridtext_0.1.4 tools_4.2.0
#> [65] bookdown_0.27 ggplot2_3.3.6 ellipsis_0.3.2 kableExtra_1.3.4
#> [69] RColorBrewer_1.1-3 Rcpp_1.0.8.3 plyr_1.8.7 base64enc_0.1-3
#> [73] progress_1.2.2 purrr_0.3.4 prettyunits_1.1.1 ggpubr_0.4.0
#> [77] rpart_4.1.16 dbscan_1.1-10 viridis_0.6.2 cowplot_1.1.1
#> [81] correlation_0.8.1 zoo_1.8-10 haven_2.5.0 cluster_2.1.3
#> [85] fs_1.5.2 here_1.0.1 tinytex_0.40 flextable_0.7.2
#> [89] reprex_2.0.1 survminer_0.4.9 mvtnorm_1.1-3 R.cache_0.15.0
#> [93] sjmisc_2.8.9 hms_1.1.1 patchwork_1.1.1 evaluate_0.15
#> [97] xtable_1.8-4 sjstats_0.18.1 jpeg_0.1-9 gridExtra_2.3
#> [101] ggeffects_1.1.2 shape_1.4.6 compiler_4.2.0 tibble_3.1.7
#> [105] ggstatsplot_0.9.3 crayon_1.5.1 R.oo_1.25.0 minqa_1.2.4
#> [109] htmltools_0.5.2 Formula_1.2-4 tidyr_1.2.0 ggtext_0.1.1
#> [113] DBI_1.1.3 sjlabelled_1.2.0 MASS_7.3-57 boot_1.3-28
#> [117] Matrix_1.4-1 car_3.1-0 sjtable2df_0.0.1.9004 wesanderson_0.3.6
#> [121] cli_3.3.0 ParBayesianOptimization_1.2.4 R.methodsS3_1.8.2 parallel_4.2.0
#> [125] insight_0.17.1 forcats_0.5.1 pkgconfig_2.0.3 km.ci_0.5-6
#> [129] statsExpressions_1.3.2 foreign_0.8-82 xml2_1.3.3 paletteer_1.4.0
#> [133] foreach_1.5.2 svglite_2.1.0 vipor_0.4.5 ggcorrplot_0.1.3
#> [137] webshot_0.5.3 estimability_1.3 rvest_1.0.2 stringr_1.4.0
#> [141] digest_0.6.29 parameters_0.18.1 rmarkdown_2.14 survMisc_0.5.6
#> [145] htmlTable_2.4.0 gdtools_0.2.4 nloptr_2.0.3 lifecycle_1.0.1
#> [149] nlme_3.1-157 jsonlite_1.8.0 carData_3.0-5 viridisLite_0.4.0
#> [153] fansi_1.0.3 pillar_1.7.0 lattice_0.20-45 fastmap_1.1.0
#> [157] httr_1.4.3 survival_3.3-1 glue_1.6.2 bayestestR_0.12.1
#> [161] zip_2.2.0 png_0.1-7 iterators_1.0.14 glmnet_4.1-4
#> [165] stringi_1.7.6 performance_0.9.0 rematch2_2.1.2 latticeExtra_0.6-29
#> [169] styler_1.7.0 dplyr_1.0.9
System 2:
sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.14.2 magrittr_2.0.3
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.2 R.utils_2.12.0 tidyselect_1.1.2
#> [4] lme4_1.1-30 htmlwidgets_1.5.4 grid_4.2.1
#> [7] ranger_0.14.1 munsell_0.5.0 codetools_0.2-18
#> [10] effectsize_0.7.0 interp_1.1-2 xgboost_1.6.0.1
#> [13] withr_2.5.0 colorspace_2.0-3 splitTools_0.3.2
#> [16] shapviz_0.2.0 highr_0.9 knitr_1.39
#> [19] uuid_1.1-0 rstudioapi_0.13 ggsignif_0.6.3
#> [22] officer_0.4.3 labeling_0.4.2 emmeans_1.7.5
#> [25] KMsurv_0.1-5 farver_2.1.1 datawizard_0.4.1
#> [28] rprojroot_2.0.3 coda_0.19-4 vctrs_0.4.1
#> [31] generics_0.1.3 TH.data_1.1-1 xfun_0.31
#> [34] R6_2.5.1 markdown_1.1 doParallel_1.0.17
#> [37] ggbeeswarm_0.6.0 DescrTab2_2.1.9 lhs_1.1.5
#> [40] assertthat_0.2.1 scales_1.2.0 multcomp_1.4-19
#> [43] nnet_7.3-17 beeswarm_0.4.0 gtable_0.3.0
#> [46] fastshap_0.0.7 DiceKriging_1.6.0 sandwich_3.0-2
#> [49] rlang_1.0.4 zeallot_0.1.0 gggenes_0.4.1
#> [52] systemfonts_1.0.4 sjPlot_2.8.10 splines_4.2.1
#> [55] rstatix_0.7.0 selectr_0.4-2 broom_1.0.0
#> [58] prismatic_1.1.0 checkmate_2.1.0 yaml_2.3.5
#> [61] reshape2_1.4.4 abind_1.4-5 modelr_0.1.8
#> [64] backports_1.4.1 Hmisc_4.7-0 gridtext_0.1.4
#> [67] tools_4.2.1 bookdown_0.27 ggplot2_3.3.6
#> [70] ellipsis_0.3.2 kableExtra_1.3.4 RColorBrewer_1.1-3
#> [73] Rcpp_1.0.9 plyr_1.8.7 base64enc_0.1-3
#> [76] progress_1.2.2 purrr_0.3.4 prettyunits_1.1.1
#> [79] ggpubr_0.4.0 rpart_4.1.16 dbscan_1.1-10
#> [82] deldir_1.0-6 viridis_0.6.2 cowplot_1.1.1
#> [85] correlation_0.8.1 zoo_1.8-10 ggrepel_0.9.1
#> [88] haven_2.5.0 cluster_2.1.3 fs_1.5.2
#> [91] here_1.0.1 tinytex_0.40 flextable_0.7.2
#> [94] reprex_2.0.1 survminer_0.4.9 mvtnorm_1.1-3
#> [97] R.cache_0.15.0 sjmisc_2.8.9 hms_1.1.1
#> [100] patchwork_1.1.1 evaluate_0.15 xtable_1.8-4
#> [103] sjstats_0.18.1 jpeg_0.1-9 gridExtra_2.3
#> [106] ggeffects_1.1.2 shape_1.4.6 compiler_4.2.1
#> [109] tibble_3.1.7 ggstatsplot_0.9.3 crayon_1.5.1
#> [112] R.oo_1.25.0 minqa_1.2.4 htmltools_0.5.2
#> [115] Formula_1.2-4 tidyr_1.2.0 ggtext_0.1.1
#> [118] DBI_1.1.3 sjlabelled_1.2.0 MASS_7.3-57
#> [121] boot_1.3-28 Matrix_1.4-1 car_3.1-0
#> [124] sjtable2df_0.0.2.9001 wesanderson_0.3.6 cli_3.3.0
#> [127] ParBayesianOptimization_1.2.4 R.methodsS3_1.8.2 parallel_4.2.1
#> [130] insight_0.18.0 forcats_0.5.1 pkgconfig_2.0.3
#> [133] km.ci_0.5-6 statsExpressions_1.3.2 foreign_0.8-82
#> [136] xml2_1.3.3 paletteer_1.4.0 foreach_1.5.2
#> [139] svglite_2.1.0 vipor_0.4.5 ggcorrplot_0.1.3
#> [142] webshot_0.5.3 estimability_1.4 rvest_1.0.2
#> [145] stringr_1.4.0 digest_0.6.29 parameters_0.18.1
#> [148] rmarkdown_2.14 survMisc_0.5.6 htmlTable_2.4.1
#> [151] gdtools_0.2.4 nloptr_2.0.3 lifecycle_1.0.1
#> [154] nlme_3.1-157 jsonlite_1.8.0 carData_3.0-5
#> [157] viridisLite_0.4.0 fansi_1.0.3 pillar_1.7.0
#> [160] shades_1.4.0 lattice_0.20-45 fastmap_1.1.0
#> [163] httr_1.4.3 survival_3.3-1 glue_1.6.2
#> [166] bayestestR_0.12.1 zip_2.2.0 png_0.1-7
#> [169] iterators_1.0.14 glmnet_4.1-4 stringi_1.7.8
#> [172] performance_0.9.1 ggfittext_0.9.1 rematch2_2.1.2
#> [175] latticeExtra_0.6-30 styler_1.7.0 dplyr_1.0.9
When diving into the code, the only thing I found was this: https://github.com/AnotherSamWilson/ParBayesianOptimization/blob/master/R/getNextParameters.R#L19: setorder(LocalOptims,-"gpUtility")
Could it maybe happen that the ordering of the data.table produces different results when ties occur? This should be more probable with lower values for acqThresh...
In fact, in my experiments, many of xgboost's parameter combinations produce very good results in terms of the evaluation metric, so it shouldn't be surprising that there are multiple local optima nearby, some of which could have equal values of gpUtility, in which case the ordering would no longer be unique.
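A small data.table example illustrates the concern (made-up numbers): setorder() keeps tied rows in their incoming order, so if upstream floating-point noise reshuffles the candidate rows, a different tied optimum can end up first:
library(data.table)

# identical tied gpUtility values, but the candidate rows arrive in a different order
a <- data.table(id = c(1, 2, 3), gpUtility = c(0.40, 0.40, 0.35))
b <- data.table(id = c(2, 1, 3), gpUtility = c(0.40, 0.40, 0.35))

setorder(a, -gpUtility)
setorder(b, -gpUtility)

a$id  # 1 2 3
b$id  # 2 1 3 -- the tied rows keep their incoming order, so the top pick differs

# making the pick deterministic would require additional tie-breaking sort columns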
That's a good point. I also notice that the R versions of the two systems above differ. Do you have access to these two systems, and can you force an equal environment?
Yes, I have access to the systems (both running in Docker containers) and updated the R version on system 1; however, the issue persists even with equal R versions. I then also re-ran the code with the latest version of ParBayesianOptimization, as I just realized that it is newer than the CRAN version I was using, and ran into the same issue again.
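(For reference, the development version can be installed from GitHub with, e.g., the remotes package:)
remotes::install_github("AnotherSamWilson/ParBayesianOptimization")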
I also just realized that in the results pasted above, the gpUtility of setting IDs 65 and 66 is indeed the same. So I think it would be necessary to use some additional criteria for sorting as well, e.g. "Score" and "inBounds".
Using RStudio's debug mode, I found that the table LocalOptims (https://github.com/AnotherSamWilson/ParBayesianOptimization/blob/master/R/getNextParameters.R#L17) contains 125 rows with only very few distinct combinations of gpUtility and relUtility:
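A quick tie check in the debugger could look like this (a sketch, assuming LocalOptims is the data.table at that line):
# count rows per distinct (gpUtility, relUtility) combination
LocalOptims[, .N, by = .(gpUtility, relUtility)]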
Hi, first of all thanks for that great package, which speeds up hyperparameter optimization enormously!!
I am currently stuck with getting reproducible results for xgboost on different machines when using parallelization. Actually, it seems as if set.seed() does not have any effect. When comparing the output from bayesOpt, approximately the first 60 rounds are identical in terms of the chosen hyperparameters and the Score, but then the results start to diverge slightly between the different machines (running the same code with the same data).
I already experimented with parallel::clusterSetRNGStream, however without success so far. The initialization of the parallel backend is as shown in the wrapper function pasted above.
In addition, what is very strange is that optimizing hyperparameters for glmnet and ranger, using the same wrapper function to initialize the parallel backend and to parameterize bayesOpt, indeed works and yields identical, reproducible results on both machines.
Is there anything that I might have forgotten in order to get fully reproducible results when calling bayesOpt? (It would be really great to have a tutorial on how to properly set up a parallel backend that produces reproducible results.)
Edit: Could the use of acqThresh with a value smaller than 1 be related to the issue / does acqThresh negatively affect reproducibility?