daaronr / metrics_discussion

Notes and writings on econometrics to weave into content. Formatted 'bookdown'.
https://daaronr.github.io/metrics_discussion/introduction.html

'summary statistics' and 'sensitivity to sampling' capacities (JG data; informing Rethink analysis) #2

Open daaronr opened 3 years ago

daaronr commented 3 years ago

Summary statistics on 'JG fundraising pages' and 'sensitivity/representativeness tests and reporting' as example for Rethink.

Rows = variables (discrete and continuous)
Columns = stats (N, means, quantiles, trimmed mean)

Also consider a graphical representation.

See https://daaronr.github.io/metrics_discussion/surveys.html#jazz-case for the deeper issue

daaronr commented 3 years ago

Suppose we were trying to profile the 'typical JG fundraising page' (look at 'plausibly completed pages')

Some variables are binary (or categorical/factor, but for this exercise dummies may be easier). We may need to define these variables: effective-charity, sporting-event, started-on-weekend, post-covid

Others are continuous: sum_don, count_don, fundraising_target

We want a table of summary statistics, one row per variable
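
A minimal base-R sketch of such a table, to make the target layout concrete. The `pages` data frame and all its values are made up for illustration, `summarise_var` is a hypothetical helper, and the 10% trim is an arbitrary choice; the dummy is written `sporting_event` because R names cannot contain hyphens.

```r
# Toy data standing in for the JG fundraising pages (values are invented).
pages <- data.frame(
  sum_don            = c(50, 120, 0, 300, 45, NA),
  count_don          = c(3, 8, 0, 15, 2, 1),
  fundraising_target = c(100, 200, 500, 250, NA, 50),
  sporting_event     = c(1, 0, 0, 1, 1, 0)  # dummy: 1 = yes
)

# Hypothetical helper: one named vector of statistics per variable.
summarise_var <- function(x) {
  c(
    N       = sum(!is.na(x)),
    mean    = mean(x, na.rm = TRUE),
    trimmed = mean(x, trim = 0.1, na.rm = TRUE),  # 10% trimmed mean (arbitrary)
    q25     = unname(quantile(x, 0.25, na.rm = TRUE)),
    median  = median(x, na.rm = TRUE),
    q75     = unname(quantile(x, 0.75, na.rm = TRUE))
  )
}

# sapply() puts statistics in rows; transpose so each variable is a row.
summary_tab <- t(sapply(pages, summarise_var))
round(summary_tab, 2)
```

For a dummy, the mean is simply the share of pages with the attribute, so the same row layout works for binary and continuous variables.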

Then we want some way to report (visually and numerically), and 'test' the sensitivity of these to our sampling choices (over time, inclusion/exclusion/weighting of certain categories), considering 'representativeness' of our sample.
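
One simple way to start on the numerical side is to recompute the same statistic under several sampling choices and put the results side by side. The sketch below does this for the mean of `sum_don`. The data, the scenarios, and the weights `w` are all hypothetical; the weights in particular just illustrate what reweighting would look like if sporting-event pages were over-represented in the sample.

```r
# Toy data (values are invented for illustration).
pages <- data.frame(
  sum_don        = c(50, 120, 0, 300, 45, 80),
  sporting_event = c(1, 0, 0, 1, 1, 0),
  post_covid     = c(0, 0, 1, 1, 1, 1)
)

# Hypothetical weights: down-weight sporting-event pages, up-weight the rest.
w <- ifelse(pages$sporting_event == 1, 0.5, 1.5)

# Same statistic, different inclusion/weighting choices, one row per scenario.
sensitivity <- data.frame(
  scenario = c("all pages", "excluding sporting events",
               "post-covid only", "reweighted"),
  mean_sum_don = c(
    mean(pages$sum_don),
    mean(pages$sum_don[pages$sporting_event == 0]),
    mean(pages$sum_don[pages$post_covid == 1]),
    weighted.mean(pages$sum_don, w)
  )
)
sensitivity
```

A large spread across the rows would suggest the 'typical page' profile depends heavily on who is in the sample; the same table could feed a dot plot for the visual report.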

Gerhard suggests gtsummary is good for this.
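
A rough sketch of what that might look like with `gtsummary::tbl_summary()`, splitting by one sampling dimension. The data are invented, and the choice of `post_covid` as the `by` variable and of the statistics shown is only an assumption for illustration. The `type` argument is set explicitly because, with so few distinct values in a toy data set, `tbl_summary()` would otherwise treat the numeric columns as categorical.

```r
library(gtsummary)

# Toy data (values are invented for illustration).
pages <- data.frame(
  sum_don        = c(50, 120, 0, 300, 45, 80),
  count_don      = c(3, 8, 0, 15, 2, 4),
  sporting_event = c(1, 0, 0, 1, 1, 0),
  post_covid     = c(0, 0, 1, 1, 1, 1)
)

tbl <- tbl_summary(
  pages,
  by = post_covid,  # one column of statistics per period
  type = list(sum_don ~ "continuous", count_don ~ "continuous"),
  statistic = list(
    all_continuous()  ~ "{mean} ({p25}, {median}, {p75})",
    all_categorical() ~ "{n} ({p}%)"
  ),
  missing = "no"
)
tbl
```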

daaronr commented 3 years ago
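library(dplyr)
library(tidyr)

# Assumes key_demog is a character vector naming numeric columns of eas_20.
eas_20 %>%
  select(all_of(key_demog)) %>%
  summarise(across(
    everything(),
    list(
      Xq25    = ~ quantile(.x, 0.25, na.rm = TRUE),
      Xmedian = ~ median(.x, na.rm = TRUE),
      Xq75    = ~ quantile(.x, 0.75, na.rm = TRUE),
      Xmean   = ~ mean(.x, na.rm = TRUE)
    )
  )) %>%
  # Columns are now named "<var>_Xq25" etc.; split on "_X" into var and stat.
  pivot_longer(everything(), names_to = c("var", "stat"),
               names_sep = "_X", values_to = "val") %>%
  pivot_wider(names_from = stat, values_from = val) %>%
  select(var, q25, mean, median, q75)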

gets me most of the way there but

daaronr commented 3 years ago

update: @oskasf gtsummary::tbl_summary() is so great ... hold off on this until i exploit that fully :)