HealthCatalyst / healthcareai-r

R tools for healthcare machine learning
https://docs.healthcare.ai

Add metric argument to machine_learn #1230

Closed · glenrs closed 6 years ago

glenrs commented 6 years ago

@mmastand, Levy asked me to "confirm the metric is utilized from machine_learn". Since tune_models and flash_models are both tested independently, I decided that mocking those two functions would be the best way to verify that machine_learn passes the metric through to them. I also added documentation for the new parameter, along with an example showing users how to pick their metric; I gave the example its own section.
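
A minimal sketch of what that mock-based check could look like, assuming the testthat and mockery packages; the spy setup and assertions here are illustrative, not the actual test code from this PR:

```r
library(testthat)
library(mockery)
library(healthcareai)

test_that("machine_learn passes metric through to tune_models", {
  # Replace tune_models inside machine_learn with a spy so that no real
  # training happens and the arguments it receives can be captured.
  tune_spy <- mock(NULL)
  stub(machine_learn, "tune_models", tune_spy)

  # Wrapped in try() because machine_learn may post-process the (stubbed)
  # NULL return value; we only care what arguments the spy saw.
  try(machine_learn(pima_diabetes[1:50, ], patient_id,
                    outcome = diabetes, metric = "PR"),
      silent = TRUE)

  # The spy should have been called once, with metric = "PR" forwarded.
  expect_called(tune_spy, 1)
  expect_equal(mock_args(tune_spy)[[1]]$metric, "PR")
})
```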

glenrs commented 6 years ago

@mmastand I made these corrections. Do you also want me to change the documentation in flash_models and tune_models, per your suggestions in this PR? If so, just let me know; I can fix it quickly.

glenrs commented 6 years ago

@mmastand I think codecov is failing because none of the functions in training_setup.R are tested directly; they are only exercised through tune_models.R. I am happy to add more tests, but that might take some time, and it may be out of scope for this issue. What do you think?

glenrs commented 6 years ago

Ha, I guess the tests I added were enough after all. I am still worried, though, that training_setup.R is not tested thoroughly. What are your thoughts?

Maybe as part of this issue we could include more tests for flash_models, along the lines of the sketch below.
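
For example, a direct test might look something like this; it assumes flash_models accepts prepped data, a models selection, and the new metric argument, and returns a model_list, so the details should be checked against the actual API:

```r
library(testthat)
library(healthcareai)

test_that("flash_models trains under a non-default metric", {
  # Prep a small slice of the bundled pima_diabetes data.
  d <- prep_data(pima_diabetes[1:100, ], patient_id, outcome = diabetes)
  # Training should succeed with metric = "PR"; a stronger assertion could
  # inspect wherever the package records the chosen metric on the result.
  m <- flash_models(d, outcome = diabetes, models = "rf", metric = "PR")
  expect_s3_class(m, "model_list")
})
```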

mmastand commented 6 years ago

Most of the setup_training functions are covered within test-tune_models. At some point it might be worth moving those tests into their own file (test small bits, right?). Your test coverage looks good, though, and your code and functionality look good as well. One small thing: tune_models, flash_models, and machine_learn should have consistent documentation, and the first sentence of the current metric param doesn't make sense. Please change all 3 files to use the following metric param:

@param metric Which metric should be used to assess model performance? Options
  for classification: "ROC" (default) (area under the receiver operating
  characteristic curve) or "PR" (area under the precision-recall curve).
  Options for regression: "RMSE" (default) (root-mean-squared error),
  "MAE" (mean-absolute error), or "Rsquared". For multiclass:
  "Accuracy" (default) or "Kappa" (accuracy, adjusted for class imbalance).

I'll merge when that's in. Good work!

codecov[bot] commented 6 years ago

Codecov Report

Merging #1230 into master will increase coverage by <.1%. The diff coverage is 100%.

@@           Coverage Diff            @@
##           master   #1230     +/-   ##
========================================
+ Coverage    94.7%   94.8%   +<.1%     
========================================
  Files          39      39             
  Lines        2912    2933     +21     
========================================
+ Hits         2759    2781     +22     
+ Misses        153     152      -1

glenrs commented 6 years ago

@mmastand great addition. That reads much better. I added that paragraph to all three files and merged master. Thank you!