Open daaronr opened 3 years ago
Suppose we want to profile the 'typical JG fundraising page' (looking at 'plausibly completed pages').
Some statistics are binary (or categorical/factor, but for this exercise dummies may be easier); we may need to define these variables: effective-charity, sporting-event, started-on-weekend, post-covid
Others are continuous: sum_don, count_don, fundraising_target
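A minimal sketch of how the dummies listed above could be defined, in base R. The raw column names (charity_type, event_type, created_date), the toy values, and the 2020-03-01 cutoff are all hypothetical stand-ins, not the real data; the dummy names use underscores because hyphens are awkward in R names.

```r
# Toy data standing in for the fundraising pages; all raw column names are hypothetical
pages <- data.frame(
  charity_type = c("effective", "other", "other", "effective"),
  event_type   = c("marathon", "birthday", "cycling", "none"),
  created_date = as.Date(c("2019-06-01", "2020-04-05", "2021-01-13", "2018-12-30")),
  sum_don      = c(250, 40, 1200, 0)
)

covid_cutoff <- as.Date("2020-03-01")  # assumed cutoff date, adjust as needed

pages$effective_charity <- as.integer(pages$charity_type == "effective")
pages$sporting_event    <- as.integer(pages$event_type %in% c("marathon", "cycling"))
# "%u" is the weekday number (1 = Monday, ..., 7 = Sunday), independent of locale
pages$started_on_weekend <- as.integer(format(pages$created_date, "%u") %in% c("6", "7"))
pages$post_covid         <- as.integer(pages$created_date >= covid_cutoff)
```

With 0/1 dummies, the mean of each column is just the share of pages in that category, so they can sit in the same table as the continuous variables.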
We want a table of summary statistics, one row per variable
Then we want some way to report (visually and numerically) and 'test' the sensitivity of these statistics to our sampling choices (time period; inclusion, exclusion, or weighting of certain categories), considering the 'representativeness' of our sample.
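One rough way to look at this sensitivity numerically: recompute the same statistics under each sampling choice and stack the results. This is only a sketch in base R on toy data; the helper summarise_var(), the weight column w, and the scenario names are hypothetical.

```r
# Toy data; w is a hypothetical sampling/representativeness weight
pages <- data.frame(
  sum_don        = c(10, 20, 30, 40, 500, 60),
  sporting_event = c(1, 0, 1, 0, 1, 0),
  post_covid     = c(0, 0, 0, 1, 1, 1),
  w              = c(1, 1, 2, 2, 0.5, 1)
)

# Hypothetical helper: N, (optionally weighted) mean, and unweighted median of one variable
summarise_var <- function(d, var, weights = NULL) {
  x <- d[[var]]
  w <- if (is.null(weights)) rep(1, length(x)) else d[[weights]]
  c(
    n      = length(x),
    mean   = weighted.mean(x, w, na.rm = TRUE),
    median = median(x, na.rm = TRUE)  # left unweighted in this sketch
  )
}

# One row per sampling choice
scenarios <- list(
  all_pages   = summarise_var(pages, "sum_don"),
  pre_covid   = summarise_var(pages[pages$post_covid == 0, ], "sum_don"),
  no_sporting = summarise_var(pages[pages$sporting_event == 0, ], "sum_don"),
  weighted    = summarise_var(pages, "sum_don", weights = "w")
)
sensitivity <- do.call(rbind, scenarios)
sensitivity
```

If the rows differ a lot (as the mean does here because of the single 500 page), that is a flag that the 'typical page' depends heavily on the inclusion rule.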
Gerhard suggests gtsummary is good for this.
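library(dplyr)
library(tidyr)

# key_demog is assumed to be a character vector of numeric column names in eas_20
eas_20 %>%
  select(all_of(key_demog)) %>%
  summarise(across(everything(), list(
    Xq25 = ~ quantile(.x, 0.25, na.rm = TRUE),
    Xmedian = ~ median(.x, na.rm = TRUE),
    Xq75 = ~ quantile(.x, 0.75, na.rm = TRUE),
    Xmean = ~ mean(.x, na.rm = TRUE)
  ))) %>%
  # one row per variable-statistic pair, named like "<var>_Xq25"
  pivot_longer(everything(), names_to = "stat", values_to = "val") %>%
  # the "_X" marker keeps variable names that contain "_" intact
  separate(stat, into = c("var", "stat"), sep = "_X") %>%
  pivot_wider(names_from = stat, values_from = val) %>%
  select(var, q25, mean, median, q75)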
gets me most of the way there but
Update: @oskasf gtsummary::tbl_summary() is so great ... hold off on this until I exploit that fully :)
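For reference, a small sketch of what a tbl_summary() call for this kind of table might look like. The toy data and the choice to split by post_covid are assumptions for illustration; the type argument is set explicitly because gtsummary may otherwise treat numeric columns with few distinct values as categorical.

```r
library(gtsummary)

# Toy data; column names follow the variables discussed above
pages <- data.frame(
  sum_don        = c(10, 20, 30, 40, 500, 60),
  count_don      = c(1, 2, 3, 4, 25, 6),
  sporting_event = c(1, 0, 1, 0, 1, 0),
  post_covid     = c(0, 0, 0, 1, 1, 1)
)

tbl <- tbl_summary(
  pages,
  by = post_covid,  # one column of statistics per sampling split
  type = list(c(sum_don, count_don) ~ "continuous"),
  statistic = list(
    all_continuous()  ~ "{mean} ({p25}, {median}, {p75})",
    all_categorical() ~ "{n} ({p}%)"
  ),
  missing = "no"
)
tbl
```

Piping the result into add_overall() should add a pooled column next to the split columns, which makes the comparison across sampling choices easier to read.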
Summary statistics on 'JG fundraising pages' and 'sensitivity/representativeness tests and reporting' as an example for Rethink.
Rows = variables (discrete and continuous)
Columns = stats (N, means, quantiles, trimmed mean)
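The row/column layout described here could be sketched in base R as follows. The helper one_row() and the 20% trim are hypothetical choices, and the data are toy values.

```r
# Toy data; sporting_event is a 0/1 dummy, so its mean is a share
pages <- data.frame(
  sum_don        = c(10, 20, 30, 40, 500, NA),
  count_don      = c(1, 2, 3, 4, 25, 6),
  sporting_event = c(1, 0, 1, 0, 1, 0)
)

# Hypothetical helper: one row of statistics for a single variable
one_row <- function(x, trim = 0.2) {
  x <- x[!is.na(x)]  # N counts non-missing values only
  c(
    N            = length(x),
    mean         = mean(x),
    trimmed_mean = mean(x, trim = trim),  # drops the top and bottom 20% of values
    q25          = unname(quantile(x, 0.25)),
    median       = median(x),
    q75          = unname(quantile(x, 0.75))
  )
}

# sapply() returns statistics in rows, so transpose to get one row per variable
summary_tbl <- t(sapply(pages, one_row))
summary_tbl
```

The trimmed mean is included because donation totals tend to have a few very large pages; comparing it with the plain mean shows how much those pages drive the average.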
Also consider a graphical representation.
See https://daaronr.github.io/metrics_discussion/surveys.html#jazz-case for the deeper issue