'summary statistics' and 'sensitivity to sampling' capacities (JG data; informing Rethink analysis)

daaronr commented 3 years ago

Summary statistics on 'JG fundraising pages' and 'sensitivity/representativeness tests and reporting' as an example for Rethink.

Rows = variables (discrete and continuous)
Columns = stats (N, means, quantiles, trimmed mean)

Also consider a graphical representation.

See https://daaronr.github.io/metrics_discussion/surveys.html#jazz-case for the deeper issue

daaronr commented 3 years ago

Suppose we were trying to profile the 'typical JG fundraising page' (look at 'plausibly completed pages')

Some variables are binary (or categorical/factor, but for this exercise dummies may be easier). We may need to define these variables: effective-charity, sporting-event, started-on-weekend, post-covid

Others are continuous: sum_don, count_don, fundraising_target
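
The dummies listed above could be defined roughly as follows. This is only a sketch: the raw column names (`date_created`, `event_type`, `charity`), the lookup lists, and the post-covid cutoff date are all assumptions for illustration, not the actual JG variable definitions.

```r
# Hypothetical raw columns; the real JG data may name and code these differently
pages <- data.frame(
  date_created = as.Date(c("2020-02-01", "2020-06-15", "2021-01-03")),
  event_type   = c("marathon", "birthday", "cycling"),
  charity      = c("Charity A", "Charity B", "Charity A")
)

# Assumed lookup lists and cutoff, for illustration only
effective_charities <- c("Charity A")
sporting_types      <- c("marathon", "cycling")
covid_cutoff        <- as.Date("2020-03-11")  # assumed cutoff choice

# wday is 0 for Sunday through 6 for Saturday, independent of locale
weekday <- as.POSIXlt(pages$date_created)$wday

pages$effective_charity  <- as.integer(pages$charity %in% effective_charities)
pages$sporting_event     <- as.integer(pages$event_type %in% sporting_types)
pages$started_on_weekend <- as.integer(weekday %in% c(0, 6))
pages$post_covid         <- as.integer(pages$date_created >= covid_cutoff)

print(pages)
```

Coding the dummies as 0/1 integers means their mean is directly the share of pages with that attribute.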

We want a table of summary statistics, one row per variable

Count of nonmissing observations and mean for all variables
Also, for continuous variables: p25, median, p75 and 'mean removing top/bottom 10% tails'
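
A minimal base-R sketch of the table described above, with one row per variable. The data are made up, and the rule used to detect dummies (all nonmissing values are 0 or 1) is an assumption. Note that `mean(x, trim = 0.10)` drops 10% of observations from each tail, which is one reading of 'mean removing top/bottom 10% tails'.

```r
# Hypothetical example data standing in for the JG fundraising pages
pages <- data.frame(
  sporting_event     = c(1, 0, 1, NA, 0, 1, 0, 0, 1, 0),
  sum_don            = c(10, 25, 40, 55, 70, 85, 100, 150, 300, 5000),
  fundraising_target = c(100, 200, NA, 500, 500, 750, 1000, 1000, 2000, 10000)
)

# Assumed rule: a variable is a dummy if its nonmissing values are all 0 or 1
is_dummy <- function(x) all(x[!is.na(x)] %in% c(0, 1))

# N and mean for every variable; quantiles and trimmed mean only if continuous
summarise_var <- function(x) {
  x_obs <- x[!is.na(x)]
  continuous <- !is_dummy(x)
  c(
    n         = length(x_obs),
    mean      = mean(x_obs),
    p25       = if (continuous) unname(quantile(x_obs, 0.25)) else NA,
    median    = if (continuous) median(x_obs) else NA,
    p75       = if (continuous) unname(quantile(x_obs, 0.75)) else NA,
    mean_trim = if (continuous) mean(x_obs, trim = 0.10) else NA
  )
}

# sapply() gives one column per variable, so transpose to one row per variable
summary_table <- as.data.frame(t(sapply(pages, summarise_var)))
summary_table <- cbind(variable = rownames(summary_table), summary_table)
rownames(summary_table) <- NULL
print(summary_table)
```

Leaving the quantile cells as `NA` for dummies keeps the table rectangular while avoiding uninformative quantiles of 0/1 variables.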

Then we want some way to report (visually and numerically), and 'test' the sensitivity of these to our sampling choices (over time, inclusion/exclusion/weighting of certain categories), considering 'representativeness' of our sample.
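
One possible way to report sensitivity numerically is to express each sampling choice as a vector of row weights and recompute the same statistic under each. The sketch below does this in base R; the data, the scenario names, and the assumed 25% 'population' share of sporting-event pages are all hypothetical.

```r
# Hypothetical data: column names and values are illustrative only
pages <- data.frame(
  year           = c(2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020),
  sporting_event = c(1, 0, 1, 1, 0, 0, 1, 0),
  sum_don        = c(50, 20, 80, 120, 30, 10, 200, 40)
)

target_share_sport <- 0.25                       # assumed population share
sample_share_sport <- mean(pages$sporting_event) # share in this sample

# Each scenario is a weight per row: 0 = excluded, 1 = included as is,
# other values = reweighted towards the assumed population shares
scenarios <- list(
  all_pages      = rep(1, nrow(pages)),
  from_2019      = as.numeric(pages$year >= 2019),
  no_sporting    = as.numeric(pages$sporting_event == 0),
  reweight_sport = ifelse(pages$sporting_event == 1,
                          target_share_sport / sample_share_sport,
                          (1 - target_share_sport) / (1 - sample_share_sport))
)

sensitivity <- data.frame(
  scenario     = names(scenarios),
  n_used       = vapply(scenarios, function(w) sum(w > 0), numeric(1),
                        USE.NAMES = FALSE),
  mean_sum_don = vapply(scenarios, function(w) weighted.mean(pages$sum_don, w),
                        numeric(1), USE.NAMES = FALSE)
)

# Difference from the baseline scenario (first row, all pages)
sensitivity$diff_from_all <- sensitivity$mean_sum_don - sensitivity$mean_sum_don[1]
print(sensitivity)
```

The same table could feed a simple bar or dot plot of `mean_sum_don` by scenario for the visual report.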

Gerhard suggests gtsummary is good for this.
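
As a rough illustration of that suggestion, the sketch below builds a table with `gtsummary::tbl_summary()` on made-up data. The column names are hypothetical, the block only runs if gtsummary is installed, and the trimmed mean is not covered here.

```r
# Runs only if gtsummary is installed; the data are made up for illustration
if (requireNamespace("gtsummary", quietly = TRUE)) {
  library(gtsummary)

  pages <- data.frame(
    sporting_event     = c(1, 0, 1, NA, 0, 1, 0, 0, 1, 0),
    sum_don            = c(10, 25, 40, 55, 70, 85, 100, 150, 300, 5000),
    fundraising_target = c(100, 200, NA, 500, 500, 750, 1000, 1000, 2000, 10000)
  )

  summary_tbl <- tbl_summary(
    pages,
    # Set types explicitly so small samples are not guessed as categorical
    type = list(
      sporting_event ~ "dichotomous",
      c(sum_don, fundraising_target) ~ "continuous"
    ),
    # Quantiles only for continuous variables; counts and shares for dummies
    statistic = list(
      all_continuous()  ~ "{N_nonmiss}; {mean} ({p25}, {median}, {p75})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    missing = "ifany"
  )
}
```

Printing `summary_tbl` renders the table. Because statistics are chosen per variable type, the dummies do not get quantiles.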

daaronr commented 3 years ago

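library(dplyr)
library(tidyr)

# funs(), gather() and spread() are deprecated or superseded;
# across(), pivot_longer() and pivot_wider() are the current equivalents
eas_20 %>%
  select(all_of(key_demog)) %>%  # assumes key_demog is a character vector of column names
  summarise(across(
    everything(),
    list(
      q25    = ~ quantile(.x, 0.25, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE),
      q75    = ~ quantile(.x, 0.75, na.rm = TRUE),
      mean   = ~ mean(.x, na.rm = TRUE)
    ),
    .names = "{.col}_X{.fn}"  # "_X" marks where the variable name ends
  )) %>%
  pivot_longer(everything(), names_to = c("var", "stat"),
               names_sep = "_X", values_to = "val") %>%
  pivot_wider(names_from = stat, values_from = val) %>%
  select(var, q25, mean, median, q75)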

This gets me most of the way there, but:

I cannot seem to include the 'number of nonmissing obs' counter (nor the 'trimmed means' yet)
I am still thinking about how to deal with sensitivity checks
I don't want to see 'dumb statistics', i.e. the quantiles for the dummy variables

daaronr commented 3 years ago

Update: @oskasf gtsummary::tbl_summary() is so great ... hold off on this until I exploit that fully :)

daaronr / metrics_discussion #2