amices / mice

Multivariate Imputation by Chained Equations
https://amices.org/mice/
GNU General Public License v2.0

error with futuremice when n.core is not specified and exceeds m #533

Closed cdolladille closed 1 year ago

cdolladille commented 1 year ago

Hi all, I'm pretty new to GitHub issues. I apologize if this is a duplicate or already answered, but I could not find it fixed in another issue.

I have a 16-core processor. When I try to run imp <- futuremice(nhanes, maxit = 2, m = 2)

I get the following warning message: Number of cores not specified. Based on your machine a value of n.core = 15 is chosen; the imputations are distributed about equally over the cores.

Then this error:

Error in (function (.x, .f, ..., .progress = FALSE)  : 
  ℹ In index: 1.
Caused by error:
! Number of imputations (m) lower than 1.

When I explicitly set n.core to 15, I get the following warning and no error:

In futuremice(nhanes, maxit = 2, m = 2, n.core = 15) :
  The number of cores exceeds the number of imputations. The number of cores used is set equal to the number of imputations (m = 2 ).

My guess is that the automatic setting of n.core should check m first. Something like if (machine.cores > m) {n.core <- m} else {n.core <- machine.cores - 1}

Here is my sessionInfo():
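To see why the automatic choice can fail, here is a minimal sketch of splitting m imputations "about equally" over n.core workers, assuming integer division with the remainder handed to the first workers. This is an assumption for illustration: split_imputations() is a hypothetical helper, not part of mice, and the actual futuremice internals may differ.

```r
# Hypothetical helper (not part of mice): split m imputations as evenly as
# possible over n.core workers, giving the leftover ones to the first workers.
split_imputations <- function(m, n.core) {
  base  <- m %/% n.core  # imputations every worker receives
  extra <- m %% n.core   # leftover imputations to hand out one by one
  rep(base, n.core) + c(rep(1L, extra), rep(0L, n.core - extra))
}

# m = 2 over 15 cores: only two workers get an imputation, 13 get zero.
chunks <- split_imputations(m = 2, n.core = 15)
print(chunks)
print(sum(chunks == 0))

# Capping the cores at m, as suggested above, leaves no empty chunk.
print(split_imputations(m = 2, n.core = min(2, 15)))
```

Under this assumption, a worker asked to run zero imputations is consistent with the "Number of imputations (m) lower than 1." error reported above.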

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8    LC_MONETARY=French_France.utf8
[4] LC_NUMERIC=C                   LC_TIME=French_France.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mice_3.15.0     forcats_0.5.2   stringr_1.5.0   dplyr_1.0.10    purrr_1.0.0     readr_2.1.3    
 [7] tidyr_1.2.1     tibble_3.1.8    ggplot2_3.4.0   tidyverse_1.3.2

loaded via a namespace (and not attached):
  [1] TH.data_1.1-1       googledrive_2.0.0   colorspace_2.0-3    deldir_1.0-6       
  [5] ellipsis_0.3.2      class_7.3-20        rprojroot_2.0.3     htmlTable_2.4.1    
  [9] base64enc_0.1-3     fs_1.6.0            gld_2.6.6           rstudioapi_0.14    
 [13] proxy_0.4-27        listenv_0.9.0       furrr_0.3.1         MatrixModels_0.5-1 
 [17] fansi_1.0.3         mvtnorm_1.1-3       lubridate_1.9.0     xml2_1.3.3         
 [21] codetools_0.2-18    splines_4.2.2       rootSolve_1.8.2.3   knitr_1.41         
 [25] Formula_1.2-4       jsonlite_1.8.4      broom_1.0.2         cluster_2.1.4      
 [29] dbplyr_2.3.0        png_0.1-8           compiler_4.2.2      httr_1.4.4         
 [33] backports_1.4.1     assertthat_0.2.1    Matrix_1.5-3        fastmap_1.1.0      
 [37] gargle_1.2.1        cli_3.5.0           htmltools_0.5.4     quantreg_5.94      
 [41] tools_4.2.2         gtable_0.3.1        glue_1.6.2          lmom_2.9           
 [45] Rcpp_1.0.9          cellranger_1.1.0    vctrs_0.5.1         nlme_3.1-160       
 [49] xfun_0.36           globals_0.16.2      rvest_1.0.3         timechange_0.2.0   
 [53] lifecycle_1.0.3     googlesheets4_1.0.1 polspline_1.1.22    future_1.30.0      
 [57] zoo_1.8-11          MASS_7.3-58.1       scales_1.2.1        hms_1.1.2          
 [61] sandwich_3.0-2      parallel_4.2.2      expm_0.999-7        SparseM_1.81       
 [65] RColorBrewer_1.1-3  yaml_2.3.7          Exact_3.2           gridExtra_2.3      
 [69] rms_6.4-1           rpart_4.1.19        latticeExtra_0.6-30 stringi_1.7.8      
 [73] e1071_1.7-12        checkmate_2.1.0     boot_1.3-28         rlang_1.0.6        
 [77] pkgconfig_2.0.3     evaluate_0.20       lattice_0.20-45     htmlwidgets_1.6.1  
 [81] tidyselect_1.2.0    here_1.0.1          parallelly_1.34.0   magrittr_2.0.3     
 [85] R6_2.5.1            DescTools_0.99.47   generics_0.1.3      Hmisc_4.7-2        
 [89] multcomp_1.4-20     DBI_1.1.3           pillar_1.8.1        haven_2.5.1        
 [93] foreign_0.8-83      withr_2.5.0         survival_3.4-0      nnet_7.3-18        
 [97] modelr_0.1.10       crayon_1.5.2        interp_1.1-3        utf8_1.2.2         
[101] tzdb_0.3.0          rmarkdown_2.20      jpeg_0.1-10         grid_4.2.2         
[105] readxl_1.4.1        data.table_1.14.6   reprex_2.0.2        digest_0.6.31      
[109] munsell_0.5.0      

Thank you! Charles

gerkovink commented 1 year ago

Thanks,

By default, mice performs 5 imputations, and futuremice cannot distribute m imputations over more than m cores. Hence the error. Either restrict the number of cores or increase m.

I'd rather not adjust the user-specified call and override the defaults, as that may lull the user into a false sense of security. As it stands, the user is forced to make a choice.
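The two options from the reply above, restricting the cores or increasing m, can be sketched as follows. The calls use the nhanes data shipped with mice and the n.core argument shown in the report; the specific values of m and n.core are illustrative, and the guard simply skips the calls when mice is not installed.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # Option 1: keep m = 2 and restrict the cores so that n.core <= m.
  imp_few_cores <- futuremice(nhanes, maxit = 2, m = 2, n.core = 2)

  # Option 2: keep the cores and raise m so that every core gets at least
  # one imputation (here 4 imputations over 2 cores, 2 per core).
  imp_more_m <- futuremice(nhanes, maxit = 2, m = 4, n.core = 2)
}
```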


cdolladille commented 1 year ago

Thank you for your reply. In my setting, I have 3 different computers/servers with different numbers of cores, and I may have to run the same script on any of them. I appreciated the flexibility of not specifying the number of cores, so that I could just pass my script along to any of these machines.

I acknowledge this might not be a frequent use case.
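For this multi-machine use case, one way to keep a single script portable while still making the choice explicit is to derive n.core from the machine at run time and cap it at m. This sketch uses base R's parallel::detectCores(); parallelly::availableCores(), which appears in the session above and also respects scheduler and cgroup limits, is an alternative. The variable names are illustrative.

```r
m <- 2

# Detect the logical cores of whichever machine runs the script.
# detectCores() can return NA on some platforms; fall back to 1 in that case.
machine_cores <- parallel::detectCores()
if (is.na(machine_cores)) machine_cores <- 1L

# Leave one core free, never use more cores than imputations, use at least one.
n_core <- max(1L, min(m, machine_cores - 1L))
print(n_core)
```

The result can then be passed on as futuremice(nhanes, maxit = 2, m = m, n.core = n_core), so the same script runs unchanged on every machine.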

thomvolker commented 1 year ago

I agree with @gerkovink, but I also think we can easily tackle the issue by always falling back to the highest number of cores that makes sense given the number of imputed datasets, and informing the user of this with a message. That way, we can offer the user both flexibility and security.
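A minimal sketch of that fallback, written as a hypothetical helper resolve_n_core() (not necessarily how #550 implements it): the default is the available cores minus one, any value above m is reduced to m, and a message reports the change. The helper name and the use of parallel::detectCores() are assumptions for illustration.

```r
# Hypothetical helper (not mice's implementation): resolve the number of cores,
# falling back to m when more cores are requested or detected than imputations.
resolve_n_core <- function(m, n.core = NULL, available = parallel::detectCores()) {
  if (is.null(n.core)) {
    # detectCores() may return NA; treat that as a single-core machine.
    if (is.na(available)) available <- 1L
    # Default: leave one core free, but always use at least one.
    n.core <- max(1L, available - 1L)
  }
  if (n.core > m) {
    # Fall back to the largest useful number of cores and tell the user.
    message("n.core = ", n.core, " exceeds m = ", m,
            "; using n.core = ", m, " instead.")
    n.core <- m
  }
  n.core
}

resolve_n_core(m = 2, available = 16)   # unspecified on a 16-core machine: 2
resolve_n_core(m = 2, n.core = 15)      # explicitly too many cores: 2
resolve_n_core(m = 20, available = 16)  # enough imputations, no message: 15
```

Both the unspecified and the explicit case end up at the same value, and the message keeps the override visible, which addresses the concern about silently changing the user's call.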

stefvanbuuren commented 1 year ago

@thomvolker Thanks. Closing because this is solved by #550.