Closed glenrs closed 6 years ago
@michaellevy I made those changes besides on lines 174-176. top_n
will take all the rows that tie. I don't think that is the functionality that we want there. Correct me if I am wrong.
@glenrs In addition to the two replies above...
I think I missed this in a previous PR, but I'd like you to fix it here: When missingness
finds possible_na
values, you added to the warning a bit of code that the user can run to replace the values with NA. The idea there was to let the user copy and paste that code as is, which you did with the possible_na
values, but it always uses my_df
for the name of the data.frame. I'd like you to change that to print the name of the argument the user passed to the d
parameter. I think I mentioned to look for places where we use match.call
for how to do this. This is relatively advanced R stuff, so feel free to ask for help.
Also, look through the codecov report and see if there are aspects of the functionality you're not testing that are reasonable to test.
Hey @glenrs Sorry, one more thing: Figure out a better way to handle when there's no missingness:
library(healthcareai)
#> healthcareai version 2.1.1
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com
data.frame(x = letters, y = 1:26) %>%
missingness() %>%
summary()
#> `object` has no variables with missingness.NULL
Created on 2018-08-08 by the reprex package (v0.2.0).
@michaellevy Just showed you my change. This is ready for you to look at it again.
@glenrs It looks like the output comes out in the wrong order. This is the kind of thing I mean when I say I want you to double and triple check your work.
library(healthcareai)
#> healthcareai version 2.1.1
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com
data.frame(x = c(letters, NA), y = 1:27) %>%
missingness() %>%
summary()
#> # A tibble: 2 x 2
#> percent_missing n_variables
#> <dbl> <int>
#> 1 3.7 1
#> 2 0 1
#> Missingness summary:
#> 50% of data variables contain missingness.
#> `x` contains the most missingness with 3.7% of observations containing missing values.
#>
#> Number of variables with levels of missingness:
@michaellevy Thanks for giving me an example of when I am not being as careful as I should be. Keep letting me know. I read over this several more times. I found a couple more things. I will try harder.
Question: Currently the output includes 0 in the table if it is one of the variables with the most amount of missingness. I can easily filter it out if we don't want 0. What are your thoughts?
@glenrs Good question. I don't feel strongly about it either way... it's your function, so why don't you choose whichever you think the user would want.
@michaellevy I decided to add the filter. We don't care about how many columns have 0 percent missingness.
Side note: Since this function is designed for wide datasets, this probably won't matter a lot anyways.
When you have a second, this is ready for you to look at. Thank you.
Looks good @glenrs Thanks for all the rounds of refinement. Merge in master and then I'll merge this.
Thank you.
From: Michael Levy notifications@github.com Reply-To: HealthCatalyst/healthcareai-r reply@reply.github.com Date: Thursday, August 9, 2018 at 4:33 PM To: HealthCatalyst/healthcareai-r healthcareai-r@noreply.github.com Cc: Rex Sumsion rex.sumsion@healthcatalyst.com, Mention mention@noreply.github.com Subject: Re: [HealthCatalyst/healthcareai-r] Create summary.missingness (#1215)
Looks good @glenrshttps://github.com/glenrs Thanks for all the rounds of refinement. Merge in master and then I'll merge this.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/HealthCatalyst/healthcareai-r/pull/1215#issuecomment-411919168, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AckgSpimmvyZMn07yvL_AajjGTTAK5tYks5uPLifgaJpZM4Vydh6.
@michaellevy sorry, I misread that message earlier. I thought you were saying that you would merge master. haha well, master is merged now.
Merging #1215 into master will increase coverage by
<.1%
. The diff coverage is100%
.
@@ Coverage Diff @@
## master #1215 +/- ##
========================================
+ Coverage 94.3% 94.3% +<.1%
========================================
Files 37 37
Lines 2434 2462 +28
========================================
+ Hits 2296 2324 +28
Misses 138 138
@michaellevy, I made a lot of revisions hopefully this is already close to merging.