brian-j-smith / MachineShop

MachineShop: R package of models and tools for machine learning
https://brian-j-smith.github.io/MachineShop/

BootOptimismControl not working on Linux #9

Closed lang-benjamin closed 1 year ago

lang-benjamin commented 1 year ago

The following code, which uses optimism-corrected bootstrap resampling, works on Windows and macOS but fails on Linux. Any idea what is going on? (For reference, using CVControl() instead works on Linux.)

library(MachineShop, warn.conflicts = FALSE)
tuned_model <- TunedModel(
  XGBTreeModel,
  grid = TuningGrid(),
  control = BootOptimismControl()
)
MachineShop::fit(mpg ~ ., data = mtcars, model = tuned_model)
#> Error in .fit_optim(object, ...): Resampling failed for all models.
#> XGBTreeModel.1: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.2: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.3: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.4: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.5: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.6: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.7: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.8: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")
#> XGBTreeModel.9: `...` must be empty.
#> ✖ Problematic argument:
#> • group = case_comp_name(df, "groups")

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: TUXEDO OS 2
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MachineShop_3.6.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyr_1.3.0         splines_4.1.2       foreach_1.5.2      
#>  [4] prodlim_2023.08.28  stats4_4.1.2        coin_1.4-2         
#>  [7] yaml_2.3.7          progress_1.2.2      globals_0.16.2     
#> [10] ipred_0.9-14        pillar_1.9.0        lattice_0.20-45    
#> [13] glue_1.6.2          digest_0.6.33       hardhat_1.3.0      
#> [16] sandwich_3.0-2      colorspace_2.1-0    recipes_1.0.8      
#> [19] htmltools_0.5.6     Matrix_1.6-1        timeDate_4022.108  
#> [22] pkgconfig_2.0.3     DiceDesign_1.9      listenv_0.9.0      
#> [25] purrr_1.0.2         mvtnorm_1.2-3       scales_1.2.1       
#> [28] gower_1.0.1         lava_1.7.2.1        timechange_0.2.0   
#> [31] tibble_3.2.1        generics_0.1.3      ggplot2_3.4.3      
#> [34] party_1.3-13        TH.data_1.1-2       withr_2.5.0        
#> [37] furrr_0.3.1         nnet_7.3-17         cli_3.6.1          
#> [40] strucchange_1.5-3   survival_3.2-13     magrittr_2.0.3     
#> [43] crayon_1.5.2        polspline_1.1.23    evaluate_0.21      
#> [46] fs_1.6.3            future_1.33.0       fansi_1.0.4        
#> [49] parallelly_1.36.0   MASS_7.3-55         dials_1.2.0        
#> [52] class_7.3-20        tools_4.1.2         data.table_1.14.8  
#> [55] prettyunits_1.1.1   hms_1.1.3           multcomp_1.4-25    
#> [58] lifecycle_1.0.3     matrixStats_1.0.0   kernlab_0.9-32     
#> [61] rsample_1.2.0       munsell_0.5.0       reprex_2.0.2       
#> [64] compiler_4.1.2      rlang_1.1.1         grid_4.1.2         
#> [67] iterators_1.0.14    rstudioapi_0.15.0   rmarkdown_2.24     
#> [70] gtable_0.3.4        codetools_0.2-18    abind_1.4-5        
#> [73] R6_2.5.1            zoo_1.8-12          lubridate_1.9.2    
#> [76] knitr_1.44          dplyr_1.1.3         fastmap_1.1.1      
#> [79] future.apply_1.11.0 utf8_1.2.3          libcoin_1.0-9      
#> [82] modeltools_0.2-23   parallel_4.1.2      Rcpp_1.0.11        
#> [85] vctrs_0.6.3         rpart_4.1.16        tidyselect_1.2.0   
#> [88] xfun_0.40

Created on 2023-09-15 with reprex v2.0.2

brian-j-smith commented 1 year ago

The errors might be due to recent changes in the rsample package, so whether they occur could depend on which version of that package is installed on a given machine. I have pushed some commits to the main branch, with the package version increased to 3.6.2.9000, that might address the issue. With those commits and all dependent packages updated to their latest versions, your example works for me on Windows. Would you mind installing the package from GitHub (code below) to check whether it fixes things for you? Otherwise, I can submit the changes to CRAN if you would rather wait and install it from there.

# Development version from GitHub
# install.packages("devtools")
devtools::install_github("brian-j-smith/MachineShop")
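
Since the failure may depend on the installed rsample version, it can help to report the versions of both packages before and after updating. The following is a minimal sketch using only base R utilities; the names `pkgs` and `versions` are illustrative and not part of MachineShop.

```r
# Packages whose versions are relevant to the error in this thread
pkgs <- c("MachineShop", "rsample")

# Look up each installed version; NA if the package is not installed
versions <- vapply(pkgs, function(p) {
  if (nzchar(system.file(package = p))) {
    as.character(packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

print(versions)
```

Including this output in a report makes it easier to tell whether two machines differ in their rsample version.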

lang-benjamin commented 1 year ago

Thanks, using the dev version of MachineShop works!

brian-j-smith commented 1 year ago

An updated version of the package (3.7.0) with the fix is on its way to CRAN.
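
To confirm that an installed copy is at least as new as the development version reported to work above (3.6.2.9000), one could compare version numbers. This is a small sketch under that assumption; `has_fix` is a hypothetical helper, not a MachineShop function.

```r
# Hypothetical helper: TRUE if a version string is at or past the
# development version reported in this thread to work (3.6.2.9000)
has_fix <- function(installed, fixed = "3.6.2.9000") {
  package_version(installed) >= package_version(fixed)
}

has_fix("3.6.2")  # version shown in the sessionInfo() above
#> [1] FALSE
has_fix("3.7.0")  # version submitted to CRAN
#> [1] TRUE

# Check the locally installed copy, if there is one
if (nzchar(system.file(package = "MachineShop"))) {
  print(has_fix(as.character(packageVersion("MachineShop"))))
}
```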