Open briatte opened 3 years ago
    library(tidyverse)

    d <- haven::read_dta('/Users/fr/Documents/Teaching/SRQM/data/qog2019.dta')

    tibble(
      var = names(d),
      # data sources (variable name prefix up to the first underscore)
      src = str_extract(names(d), ".*?_"),
      # number of non-missing observations per variable
      n = apply(d, 2, function(x) sum(!is.na(x)))
    ) %>%
      group_by(src) %>%
      # med_N added to match the columns shown in the output
      summarise(n_vars = n(), min_N = min(n), med_N = median(n), max_N = max(n)) %>%
      arrange(min_N) %>%
      # arbitrary threshold at N = 50
      filter(!is.na(src), min_N < 50) %>%
      print(n = 100)
PSI, EU, OECD, WWBI and a few others are particularly at fault:
    # A tibble: 28 x 5
       src     n_vars min_N med_N max_N
       <chr>    <int> <int> <dbl> <int>
     1 psi_         6     1  10.5    20
     2 mad_         4    15  29     163
     3 eu_        277    16  34      48
     4 une_        47    16 146     193
     5 wwbi_       38    17  41      62
     6 oecd_      281    19  37      44
     7 wdi_       278    19 156     192
     8 dev_         4    20  20      20
     9 dpi_        70    26 160.    175
    10 bs_          8    28  28      28
    11 ess_         9    28  28      28
    12 ideavt_      6    28 107     180
    13 wel_        36    29  32     189
    14 wvs_        42    29  34      34
    15 aid_         6    31 139     139
    16 cses_        2    31  31.5    32
    17 gol_        20    33 127     129
    18 wiid_       18    34  35      35
    19 ucdp_        2    35  70     105
    20 cpds_       49    36  36      36
    21 h_          11    37 165     185
    22 lis_        23    37  37      37
    23 r_           5    40  98     144
    24 sgi_        29    41  41      41
    25 top_         2    41  41      41
    26 nelda_      10    44  45      45
    27 vi_         13    45  48      50
    28 qs_          9    47 112     115
Not a bug, but it leads students to build research designs with low sample sizes.
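To illustrate how quickly the usable sample shrinks, here is a minimal sketch on toy data. The data frame and the variable names (`wdi_gdp`, `psi_x`, `eu_y`) are hypothetical stand-ins, not actual QoG variables. It compares per-variable N with the N left after listwise deletion:

```r
# Toy data standing in for the QoG cross-section (all values are made up)
toy <- data.frame(
  cname   = c("A", "B", "C", "D", "E"),
  wdi_gdp = c(1.2, 3.4, NA, 2.2, 5.1),
  psi_x   = c(NA, NA, 0.3, NA, 0.7),
  eu_y    = c(10, NA, 12, NA, 14)
)

# Hypothetical variables a student might pick for a design
vars <- c("wdi_gdp", "psi_x", "eu_y")

# Non-missing observations for each variable taken separately
n_obs <- colSums(!is.na(toy[vars]))

# Observations remaining once all variables must be non-missing together
n_complete <- sum(complete.cases(toy[vars]))

print(n_obs)
print(n_complete)
```

Running the same two checks on the variables actually selected from the dataset would show the effective N of a design before any model is fitted.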