FarrellDay / miceRanger

miceRanger: Fast Imputation with Random Forests in R

Error with impute() when passing a dataset directly without amputeData() #13

Closed aosmith16 closed 3 years ago

aosmith16 commented 3 years ago

I'm getting an error if I use impute() with a "raw" dataset instead of passing a dataset after amputation with amputeData().

Working off of the example in the impute() documentation:

ampDat <- amputeData(iris)
miceObj <- miceRanger(ampDat, 1, 1, returnModels = TRUE, verbose = FALSE)

newDat <- amputeData(iris)
newImps <- impute(newDat, miceObj)
#> 
#> dataset 1 
#> iteration 1   | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species

impute(iris, miceObj)
#> Error in `[.data.frame`(data, , vara, with = FALSE): unused argument (with = FALSE)

Created on 2021-09-02 by the reprex package (v2.0.0)

I believe the error traces back to line 73 of impute.R, with the code data[,vara,with=FALSE] within apply(). However, I did not investigate amputeData() to see why amputated datasets allow this code to work.

A work-around is to use amputeData() with perc = 0. If this should be the standard approach maybe add to the documentation for impute()?

samFarrellDay commented 3 years ago

This is because the package deals entirely in data.table syntax, but doesn't actually cast the new data to a data.table. amputeData() returns a data.table, which is what allows it to work. I'll fix this today. Just to be sure, can you print the output of sessionInfo(), please?
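
To illustrate the explanation above, here is a minimal sketch of the same indexing call on a data.frame and on a data.table. It depends only on the data.table package, not on miceRanger; the contents of `vara` are a made-up column selection for illustration, not the value impute() actually uses.

```r
library(data.table)

# Hypothetical column selection, standing in for `vara` inside impute()
vara <- c("Sepal.Length", "Species")

# `[.data.frame` has no `with` argument, so this call raises an error.
# The error message is captured as a string instead of stopping the script.
df_result <- tryCatch(
  iris[, vara, with = FALSE],
  error = function(e) conditionMessage(e)
)

# On a data.table, `with = FALSE` means "treat `vara` as a vector of
# column names", so the same call returns those two columns.
dt <- as.data.table(iris)
dt_result <- dt[, vara, with = FALSE]

print(df_result)
print(names(dt_result))
```

This is presumably why a dataset that has passed through amputeData() works while a plain data.frame does not: only the former is a data.table when it reaches that line.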

samFarrellDay commented 3 years ago

As a short-term fix, you can run setDT(iris) before calling impute() and it should work.
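
A sketch of that short-term fix, assuming miceRanger 1.4.0 and a `miceObj` built as in the reprex above. The variable name `rawDat` is hypothetical. The data is copied into a local variable first so that the built-in `iris` dataset itself is not modified by reference; the impute() call is left as a comment because it needs the fitted model object.

```r
library(data.table)

# Work on a local copy rather than the built-in dataset
rawDat <- copy(iris)

# setDT() converts the object to a data.table in place (no reassignment needed)
setDT(rawDat)

print(class(rawDat))

# With miceObj created as in the reprex above, the call would then be:
# newImps <- impute(rawDat, miceObj)
```

`as.data.table(iris)` would be an alternative that returns a converted copy in one step.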

aosmith16 commented 3 years ago

Ah, that makes sense now. I wasn't sure where with was coming from. :)

Session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] miceRanger_1.4.0 ggplot2_3.3.5    tidyr_1.1.3      dplyr_1.0.7      knitr_1.33      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        here_1.0.1        mvtnorm_1.1-1     lattice_0.20-41   FNN_1.1.3        
 [6] class_7.3-18      assertthat_0.2.1  rprojroot_2.0.2   digest_0.6.27     foreach_1.5.0    
[11] utf8_1.1.4        ranger_0.13.1     cellranger_1.1.0  R6_2.5.0          backports_1.1.10 
[16] evaluate_0.14     rootSolve_1.8.2.1 e1071_1.7-4       pillar_1.5.1      rlang_0.4.11     
[21] readxl_1.3.1      Exact_2.1         curl_4.3          rstudioapi_0.13   data.table_1.13.0
[26] car_3.0-10        Matrix_1.3-4      rmarkdown_2.10    foreign_0.8-81    munsell_0.5.0    
[31] broom_0.7.9       compiler_4.0.5    xfun_0.24         pkgconfig_2.0.3   DescTools_0.99.40
[36] htmltools_0.5.1.1 tidyselect_1.1.0  tibble_3.1.0      lmom_2.8          expm_0.999-6     
[41] rio_0.5.16        codetools_0.2-18  fansi_0.4.1       crayon_1.4.1      withr_2.4.2      
[46] ggpubr_0.4.0      MASS_7.3-53.1     grid_4.0.5        gtable_0.3.0      lifecycle_1.0.0  
[51] DBI_1.1.0         magrittr_2.0.1    scales_1.1.1      gld_2.6.2         zip_2.1.1        
[56] carData_3.0-4     stringi_1.6.2     ggsignif_0.6.2    ellipsis_0.3.1    generics_0.1.0   
[61] vctrs_0.3.6       boot_1.3-27       openxlsx_4.2.2    iterators_1.0.12  tools_4.0.5      
[66] forcats_0.5.0     glue_1.4.2        purrr_0.3.4       hms_0.5.3         abind_1.4-5      
[71] yaml_2.2.1        colorspace_1.4-1  rstatix_0.7.0     corrplot_0.90     haven_2.3.1  

samFarrellDay commented 3 years ago

This is fixed in the GitHub version, and it's on its way to CRAN now.

samFarrellDay commented 3 years ago

Version 1.5.0 is on CRAN.