HealthCatalyst / healthcareai-r

R tools for healthcare machine learning
https://docs.healthcare.ai
Other
245 stars 106 forks source link

Create summary.missingness #1215

Closed glenrs closed 6 years ago

glenrs commented 6 years ago

@michaellevy, I made a lot of revisions hopefully this is already close to merging.

glenrs commented 6 years ago

@michaellevy I made those changes besides on lines 174-176. top_n will take all the rows that tie. I don't think that is the functionality that we want there. Correct me if I am wrong.

michaellevy commented 6 years ago

@glenrs In addition to the two replies above...

I think I missed this in a previous PR, but I'd like you to fix it here: When missingness finds possible_na values, you added to the warning a bit of code that the user can run to replace the values with NA. The idea there was to let the user copy and paste that code as is, which you did with the possible_na values, but it always uses my_df for the name of the data.frame. I'd like you to change that to print the name of the argument the user passed to the d parameter. I think I mentioned to look for places where we use match.call for how to do this. This is relatively advanced R stuff, so feel free to ask for help.

Also, look through the codecov report and see if there are aspects of the functionality you're not testing that are reasonable to test.

michaellevy commented 6 years ago

Hey @glenrs Sorry, one more thing: Figure out a better way to handle when there's no missingness:

library(healthcareai)
#> healthcareai version 2.1.1
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com
data.frame(x = letters, y = 1:26) %>% 
  missingness() %>% 
  summary()
#> `object` has no variables with missingness.NULL

Created on 2018-08-08 by the reprex package (v0.2.0).

glenrs commented 6 years ago

@michaellevy Just showed you my change. This is ready for you to look at it again.

michaellevy commented 6 years ago

@glenrs It looks like the output comes out in the wrong order. This is the kind of thing I mean when I say I want you to double and triple check your work.

library(healthcareai)
#> healthcareai version 2.1.1
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com
data.frame(x = c(letters, NA), y = 1:27) %>% 
  missingness() %>% 
  summary()
#> # A tibble: 2 x 2
#>   percent_missing n_variables
#>             <dbl>       <int>
#> 1             3.7           1
#> 2             0             1
#> Missingness summary:
#> 50% of data variables contain missingness.
#> `x` contains the most missingness with 3.7% of observations containing missing values.
#> 
#> Number of variables with levels of missingness:
glenrs commented 6 years ago

@michaellevy Thanks for giving me an example of when I am not being as careful as I should be. Keep letting me know. I read over this several more times. I found a couple more things. I will try harder.

Question: Currently the output includes 0 in the table if it is one of the variables with the most amount of missingness. I can easily filter it out if we don't want 0. What are your thoughts?

michaellevy commented 6 years ago

@glenrs Good question. I don't feel strongly about it either way... it's your function, so why don't you choose whichever you think the user would want.

glenrs commented 6 years ago

@michaellevy I decided to add the filter. We don't care about how many columns have 0 percent missingness.

Side note: Since this function is designed for wide datasets, this probably won't matter a lot anyways.

When you have a second, this is ready for you to look at. Thank you.

michaellevy commented 6 years ago

Looks good @glenrs Thanks for all the rounds of refinement. Merge in master and then I'll merge this.

glenrs commented 6 years ago

Thank you.

From: Michael Levy notifications@github.com Reply-To: HealthCatalyst/healthcareai-r reply@reply.github.com Date: Thursday, August 9, 2018 at 4:33 PM To: HealthCatalyst/healthcareai-r healthcareai-r@noreply.github.com Cc: Rex Sumsion rex.sumsion@healthcatalyst.com, Mention mention@noreply.github.com Subject: Re: [HealthCatalyst/healthcareai-r] Create summary.missingness (#1215)

Looks good @glenrshttps://github.com/glenrs Thanks for all the rounds of refinement. Merge in master and then I'll merge this.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/HealthCatalyst/healthcareai-r/pull/1215#issuecomment-411919168, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AckgSpimmvyZMn07yvL_AajjGTTAK5tYks5uPLifgaJpZM4Vydh6.

glenrs commented 6 years ago

@michaellevy sorry, I misread that message earlier. I thought you were saying that you would merge master. haha well, master is merged now.

codecov[bot] commented 6 years ago

Codecov Report

Merging #1215 into master will increase coverage by <.1%. The diff coverage is 100%.

@@           Coverage Diff            @@
##           master   #1215     +/-   ##
========================================
+ Coverage    94.3%   94.3%   +<.1%     
========================================
  Files          37      37             
  Lines        2434    2462     +28     
========================================
+ Hits         2296    2324     +28     
  Misses        138     138