TysonStanley / furniture

The furniture R package contains table1 for publication-ready simple and stratified descriptive statistics, tableC for publication-ready correlation matrixes, and other tables #rstats

consider adding option to include missing report for all variables #23

Open ericpgreen opened 4 years ago

ericpgreen commented 4 years ago

The nhanes_2010 dataset has 1,417 observations, but the summary table reports a total of only 940 (the complete cases) unless we specify na.rm = FALSE. The documentation says this about the na.rm parameter:

when set to FALSE it also shows how many missing values are in the data for each categorical variable being summarized

And that's what this does...

library(furniture)
library(tidyverse)  # provides the %>% pipe
data("nhanes_2010")

nhanes_2010 %>%
  furniture::table1(
    "Age Mean (SD)" = age,
    "Health" = gen_health,
    "Sex" = gender,
    "Cancer" = cancer,
    "Asthma" = asthma,
    test = TRUE,
    output = "html",
    na.rm = FALSE,  # also report missing counts for categorical variables
    total = TRUE,
    type = "condense"
  )

We see that 155 observations are missing on gen_health. But I'm wondering if there could be an option to show the missing count for every variable. This table gives me the impression that we're only missing data for the gen_health variable, but that's not the case.

table(nhanes_2010$cancer, useNA = "always") returns 344 missing

table(nhanes_2010$asthma, useNA = "always") returns 2 missing

There are no missing values on the numeric variable age, but that would be useful to know too.
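Until such an option exists, the per-variable counts above can be gathered in one call with base R. This is only a minimal sketch and not part of furniture: `missing_report()` is a hypothetical helper name, and the toy data frame stands in for nhanes_2010.

```r
# Hypothetical helper (not part of furniture): count and percentage of
# missing values for each selected column of a data frame.
missing_report <- function(data, vars = names(data)) {
  n_missing <- colSums(is.na(data[, vars, drop = FALSE]))
  data.frame(
    variable    = vars,
    n_missing   = unname(n_missing),
    pct_missing = round(100 * unname(n_missing) / nrow(data), 1),
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for nhanes_2010 (values are made up)
toy <- data.frame(
  age    = c(21, 35, 40, 58),
  asthma = c("Yes", NA, "No", "No"),
  cancer = c(NA, NA, "No", "Yes")
)

report <- missing_report(toy)
print(report)
```

With furniture loaded, the same helper could presumably be called as `missing_report(nhanes_2010, c("age", "gen_health", "gender", "cancer", "asthma"))` to cover the variables used in the table.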

TysonStanley commented 4 years ago

Yes, this is something I've been wanting to work on but haven't had the time yet. Your post will help push it toward the top of the priority list. For a short-term fix (because it would be more straightforward with formatting), I was thinking of adding the missing count for the continuous variables next to the name of the variable. Something like:

----------------------------------
                      Mean/SD
Var1 (missing = 34)   2.5 (3.2)
----------------------------------

What do you think?
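The label format in the mock-up could be produced along these lines. This is a sketch under assumptions, not how furniture implements or will implement it: `label_with_missing()` is a hypothetical helper, and the toy data are made up.

```r
# Hypothetical helper (not part of furniture): append a missing-value
# count to each variable's display label, as in the mock-up above.
label_with_missing <- function(data, vars, labels = vars) {
  n_missing <- vapply(vars, function(v) sum(is.na(data[[v]])), integer(1))
  sprintf("%s (missing = %d)", labels, n_missing)
}

# Toy data (values are made up)
toy <- data.frame(
  var1 = c(2.5, NA, 3.1, NA),
  var2 = c(1, 2, 3, 4)
)

labels <- label_with_missing(toy, c("var1", "var2"), c("Var1", "Var2"))
print(labels)
```

Keeping the count in the label avoids adding a column to the table, at the cost of longer row names.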

ericpgreen commented 4 years ago

Thanks! That seems reasonable to me. It would make for a really long label if Var1 is long, but certainly good in the short term.

(Sent by email in reply to Tyson Barrett's comment of Mon, Jan 6, 2020, 1:31 PM.)