HealthCatalyst / healthcareai-r

R tools for healthcare machine learning
https://docs.healthcare.ai

Rex 1254 bugnumericknnimpute #1271

Closed glenrs closed 6 years ago

glenrs commented 6 years ago

@mmastand, this PR raises warnings and messages for issues #1254 and #998.

When an error was thrown, the accompanying warning was hard to spot, so I emit these as messages instead. R displays messages nicely. Let me know what you think.
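To illustrate the reasoning, here is a minimal base-R sketch, not the code in this PR. With the default `warn = 0`, a warning raised inside a call is deferred and only reported after the error, while a message is delivered at the moment it is signalled. `notify_then_fail()` is a hypothetical helper used only for this demonstration.

```r
# Hypothetical helper (not part of healthcareai): signals a note, then fails.
notify_then_fail <- function(use_message) {
  note <- "character columns are not supported"
  if (use_message) message(note) else warning(note)
  stop("downstream failure")
}

# Record the message at the moment it is signalled, then let the error through.
seen <- character()
outcome <- tryCatch(
  withCallingHandlers(
    notify_then_fail(use_message = TRUE),
    message = function(m) {
      seen <<- c(seen, trimws(conditionMessage(m)))
      invokeRestart("muffleMessage")
    }
  ),
  error = function(e) conditionMessage(e)
)

print(seen)     # the note was delivered before the error
print(outcome)  # the error that followed
```

Calling `notify_then_fail(use_message = FALSE)` at the console should instead print the error first, with the warning appended afterwards under "In addition: Warning message".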

Summary of changes:

I also changed a test because I didn't agree with its comment, which said that imputation shouldn't work with random variables. I don't think that would be an issue: after I changed the introduced column to factor type, the imputation worked. You might not agree with what I have done when you look at the tests, so let me know what you think. Thank you!

library(healthcareai)
#> healthcareai version 2.2.0
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com

prep_data(pima_diabetes, outcome = age, impute = list(nominal_method = "knnimpute"))
#> Training new data prep recipe...
#> `knnimpute` depends on another library that does not support character columns yet. If `knnimpute` fails please convert all character columns to factors for bag imputation.
#> Error in gower_work(x = x, y = y, pair_x = pair_x, pair_y = pair_y, n = n, : STRING_ELT() can only be applied to a 'character vector', not a 'integer'
prep_data(pima_diabetes, outcome = age, impute = list(numeric_method = "knnimpute"))
#> Training new data prep recipe...
#> `knnimpute` depends on another library that does not support character columns yet. If `knnimpute` fails please convert all character columns to factors for bag imputation.
#> Error in gower_work(x = x, y = y, pair_x = pair_x, pair_y = pair_y, n = n, : STRING_ELT() can only be applied to a 'character vector', not a 'integer'
prep_data(pima_diabetes, outcome = age, impute = list(nominal_method = "bagimpute"))
#> Training new data prep recipe...
#> Warning in hcai_impute(., numeric_method = ip$numeric_method,
#> nominal_method = ip$nominal_method, : `bagimpute` depends on another
#> library that does not support character columns yet. If `bagimpute` does
#> not impute missing values, please convert all character columns to factors.
#> If `collapse_rare_factors` is TRUE, the values that are not imputed might
#> be contained in "other".
#> healthcareai-prepped data. Recipe used to prepare data:
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          9
#> 
#> Training data contained 768 data points and 376 incomplete rows. 
#> 
#> Operations:
#> 
#> Sparse, unbalanced variable filter removed no terms [trained]
#> Mean Imputation for patient_id, pregnancies, ... [trained]
#> Bagged tree imputation for weight_class, diabetes [trained]
#> Adding levels to: other, missing [trained]
#> Collapsing factor levels for weight_class, diabetes [trained]
#> Adding levels to: other, missing [trained]
#> Dummy variables from weight_class and diabetes [trained]
#> Current data:
#> # A tibble: 768 x 16
#>    patient_id pregnancies plasma_glucose diastolic_bp skinfold insulin
#>         <int>       <int>          <dbl>        <dbl>    <dbl>   <dbl>
#>  1          1           6            148         72       35      156.
#>  2          2           1             85         66       29      156.
#>  3          3           8            183         64       29.2    156.
#>  4          4           1             89         66       23       94 
#>  5          5           0            137         40       35      168 
#>  6          6           5            116         74       29.2    156.
#>  7          7           3             78         50       32       88 
#>  8          8          10            115         72.4     29.2    156.
#>  9          9           2            197         70       45      543 
#> 10         10           8            125         96       29.2    156.
#> # ... with 758 more rows, and 10 more variables: pedigree <dbl>,
#> #   age <int>, weight_class_morbidly.obese <dbl>,
#> #   weight_class_normal <dbl>, weight_class_overweight <dbl>,
#> #   weight_class_other <dbl>, weight_class_missing <dbl>,
#> #   diabetes_Y <dbl>, diabetes_other <dbl>, diabetes_missing <dbl>

Created on 2018-10-04 by the reprex package (v0.2.0).
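The messages above ask the user to convert character columns to factors. A minimal base-R sketch of that conversion follows; the toy data frame and its column names are made up for illustration and are not taken from `pima_diabetes`.

```r
# Toy data frame; the columns are illustrative only.
d <- data.frame(
  weight_class = c("normal", "obese", NA, "overweight"),
  age = c(31L, 45L, 52L, 28L),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor and leave other columns alone.
char_cols <- vapply(d, is.character, logical(1))
d[char_cols] <- lapply(d[char_cols], factor)

str(d)
```

Missing values stay `NA` after the conversion, so they remain available for imputation.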

codecov[bot] commented 6 years ago

Codecov Report

Merging #1271 into master will increase coverage by <.1%. The diff coverage is 100%.

@@           Coverage Diff            @@
##           master   #1271     +/-   ##
========================================
+ Coverage    95.2%   95.3%   +<.1%     
========================================
  Files          40      40             
  Lines        3162    3183     +21     
========================================
+ Hits         3013    3034     +21     
  Misses        149     149