RMHogervorst / blog

posts about R
https://blog.rmhogervorst.nl

tune_grid failing in "how-to-use-lightgbm-with-tidymodels-framework" #19

Closed: marco-vene closed this issue 3 years ago

marco-vene commented 3 years ago

Hi Roel,

I saw your post on R-bloggers. Thanks for the great article!

I tried to reproduce the code. I think I installed lightgbm correctly, because the sample code works fine, but the code fails at the tune_grid step.

Thanks for your help.

lgbm_tuned <- tune::tune_grid(
  object = lgbm_wf,
  resamples = ames_cv_folds,
  grid = lgbm_grid,
  metrics = yardstick::metric_set(rmse, rsq, mae),
  control = tune::control_grid(verbose = FALSE)
)

I get this warning:

Warning message:
All models failed in tune_grid(). See the `.notes` column. 

In the .notes column I see:

internal: Error in pkg_list[[1]]: subscript out of bounds

My sessionInfo():

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] janitor_2.0.1 AmesHousing_0.0.4 doParallel_1.0.15
[4] iterators_1.0.12 foreach_1.5.0 treesnip_0.1.0
[7] yardstick_0.0.7 workflows_0.1.2 dials_0.0.8
[10] scales_1.1.1 tune_0.1.1 parsnip_0.1.2
[13] recipes_0.1.13 dplyr_1.0.0 rsample_0.0.7
[16] ggplot2_3.3.2 lightgbm_3.0.0-1 R6_2.4.1

loaded via a namespace (and not attached):
[1] httr_1.4.2 tidyr_1.1.0 jsonlite_1.7.0
[4] splines_4.0.2 prodlim_2019.11.13 assertthat_0.2.1
[7] GPfit_1.0-8 blob_1.2.1 remotes_2.2.0
[10] globals_0.12.5 ipred_0.9-9 pillar_1.4.6
[13] lattice_0.20-41 glue_1.4.1 pROC_1.16.2
[16] digest_0.6.25 snakecase_0.11.0 colorspace_1.4-1
[19] htmltools_0.5.0 Matrix_1.2-18 plyr_1.8.6
[22] timeDate_3043.102 pkgconfig_2.0.3 lhs_1.0.2
[25] DiceDesign_1.8-1 listenv_0.8.0 purrr_0.3.4
[28] processx_3.4.3 gower_0.2.2 lava_1.6.7
[31] tibble_3.0.3 generics_0.0.2 ellipsis_0.3.1
[34] DT_0.14 withr_2.2.0 furrr_0.1.0
[37] nnet_7.3-14 cli_2.0.2 survival_3.1-12
[40] magrittr_1.5 crayon_1.3.4 ps_1.3.3
[43] future_1.18.0 fansi_0.4.1 MASS_7.3-51.6
[46] forcats_0.5.0 class_7.3-17 tools_4.0.2
[49] data.table_1.12.8 lifecycle_0.2.0 stringr_1.4.0
[52] munsell_0.5.0 callr_3.4.3 compiler_4.0.2
[55] rlang_0.4.7 grid_4.0.2 rstudioapi_0.11
[58] htmlwidgets_1.5.1 igraph_1.2.5 gtable_0.3.0
[61] codetools_0.2-16 DBI_1.1.0 lubridate_1.7.9
[64] utf8_1.1.4 stringi_1.4.6 Rcpp_1.0.5
[67] vctrs_0.3.2 rpart_4.1-15 dbplyr_1.4.4
[70] tidyselect_1.1.0

RMHogervorst commented 3 years ago

Hi Marco, thanks for your question! Did you change the values in the grid? It could be that some hyperparameter combinations just don't work. I don't know the package versions off the top of my head, but could you try updating if you are not on the latest versions yet? It is an odd error and frankly I'm out of my depth here.

marco-vene commented 3 years ago

Thanks for the feedback. The problem is actually caused by the parallel processing part. My packages are all up to date, but the code still does not work. When I remove the parallel processing code:

all_cores <- parallel::detectCores(logical = FALSE)
registerDoParallel(cores = all_cores)

tune_grid runs smoothly.

Any idea why?

RMHogervorst commented 3 years ago

Could it be a Windows thing? I ran it on a Mac. Maybe the parallel package needs something different on Windows. Anyway, to speed things up you can activate parallelism in LightGBM itself, so that the training runs faster.
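
As a rough illustration of the LightGBM-side parallelism mentioned above, here is a minimal sketch that calls the lightgbm R package directly with its `num_threads` parameter. The toy data and the thread count of 2 are made up for illustration. Whether the same parameter can simply be passed through `set_engine("lightgbm", num_threads = ...)` in the workflow from the post is an assumption that was not checked in this thread.

```r
library(lightgbm)

# Toy regression data (hypothetical, only for illustration).
set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4)
y <- 2 * x[, 1] + rnorm(200)

dtrain <- lgb.Dataset(data = x, label = y)

# num_threads asks LightGBM to use this many threads inside one model fit.
params <- list(
  objective = "regression",
  num_threads = 2L,
  verbose = -1L
)

model <- lgb.train(params = params, data = dtrain, nrounds = 20L)
pred <- predict(model, x)
```

With threading handled inside LightGBM, the foreach backend can stay unregistered, which is the configuration that worked in this thread.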

marco-vene commented 3 years ago

Maybe! Thanks for your feedback!
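
On the open question of why removing the parallel backend helps: one plausible explanation, not confirmed in this thread, is that on Windows `registerDoParallel(cores = ...)` starts separate R worker sessions instead of forking the current one, so packages attached in the main session (such as treesnip, which provides the lightgbm engine) are not loaded on the workers. The sketch below shows that behaviour using only base R, with the `splines` package standing in for treesnip.

```r
library(parallel)

# Attach a package in the main session only (stand-in for treesnip).
library(splines)

# PSOCK workers are fresh R sessions; this is the kind of cluster used on Windows.
cl <- makePSOCKcluster(2)

# Check whether the package is loaded on each worker before doing anything.
before <- unlist(clusterEvalQ(cl, "splines" %in% loadedNamespaces()))

# Load the package explicitly on every worker, then check again.
invisible(clusterEvalQ(cl, library(splines)))
after <- unlist(clusterEvalQ(cl, "splines" %in% loadedNamespaces()))

stopCluster(cl)

print(before)
print(after)

# The analogous, untested step for the setup in this thread would be:
#   cl <- parallel::makePSOCKcluster(all_cores)
#   parallel::clusterEvalQ(cl, library(treesnip))
#   doParallel::registerDoParallel(cl)
```

If this is indeed the cause, another option that might work is the `pkgs` argument of `tune::control_grid()`, which names packages to load on the workers during parallel processing. Neither workaround was tried in this thread.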