briatte commented 1 year ago

This one is complex enough to be its own issue…

Weighting guide

https://www.europeansocialsurvey.org/methodology/ess_methodology/data_processing_archiving/weighting.html
https://www.europeansocialsurvey.org/docs/methodology/ESS_weighting_data_1_1.pdf

From the weighting guide, v1.1 (2020), page 7:

From round 9 onwards, all the necessary sample design indicators and weights are already included in the integrated (second release) data file, but if you are working with data from earlier rounds you will first need to merge the sample design indicators on to the main data file. For rounds 7 and 8, the sample design indicators are in the integrated SDDF (sample design data file), so you need to merge this file with the main integrated (questionnaire data) file. For rounds 1 to 6, sample design indicators are stored in a separate file for each country (and files are missing for some countries in some rounds), so you would need to merge several files. Furthermore, for these rounds the indicators psu and stratify have not been recoded in a manner suitable for cross-country analysis, so you will need to do this if you are analysing data from more than one country. Follow the guidance in section 2 of Kaminska & Lynn (2017) and ensure that each value is exclusive to one country.
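A minimal sketch of the merge and recode described in that passage, on toy data rather than real ESS files. The data frames `main` and `sddf` and the new columns `psu_x` and `stratum_x` are made-up names for illustration; the only assumptions are that both files share `cntry` and `idno`, and that prefixing codes with the country is an acceptable way to make them exclusive to one country.

```r
# Toy stand-ins for the main (questionnaire) file and the integrated SDDF.
# Real ESS files have many more columns; only the merge keys and the
# design variables matter here.
main <- data.frame(
  cntry   = c("BE", "BE", "FR", "FR"),
  idno    = c(1, 2, 1, 2),
  pspwght = c(0.9, 1.1, 1.2, 0.8)
)
sddf <- data.frame(
  cntry   = c("BE", "BE", "FR", "FR"),
  idno    = c(1, 2, 1, 2),
  psu     = c(10, 11, 10, 11), # same PSU codes reused in both countries
  stratum = c(1, 1, 1, 2)
)

# idno is only unique within a country, so merge on both keys
merged <- merge(main, sddf, by = c("cntry", "idno"), all.x = TRUE)

# make PSU and stratum codes exclusive to one country (hypothetical names)
merged$psu_x     <- paste(merged$cntry, merged$psu, sep = "_")
merged$stratum_x <- paste(merged$cntry, merged$stratum, sep = "_")

merged
```

Merging on `idno` alone would silently cross-match respondents from different countries, which is why both keys are used.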

The guide asks for the creation of anweight ('analytical weights') from the following variables:

# R, data.table syntax
data1[, anweight := pspwght * pweight * 10e3]
# Stata
# gen anweight=pspwght*pweight

Once anweight exists, the weighting guide instructs the following design:

# R
svydesign(ids = ~psu, strata = ~stratum, weights = ~anweight, data = data1)
# Stata
# svyset psu [pweight=anweight], strata(stratum)

Details on analytical weights (ESS9+)

Quoting again from the weighting guide:

It is constructed by first deriving the design weight, then applying a post-stratification adjustment, and then a population size adjustment. Further details of how the weights are derived are documented in the round-specific report on the production of weights. Starting from Round 9, anweight is provided for you in the integrated data file. If you are using data from earlier ESS rounds, you can derive anweight yourself.

Full range of weighting variables, quoted from ESS9 codebook:

idno - Respondent's identification number
cntry - Country
dweight - Design weight
pspwght - Post-stratification weight including design weight
pweight - Population size weight (must be combined with dweight or pspwght)
anweight - Analysis weight
prob - Sampling probability
stratum - Sampling stratum
psu - Primary sampling unit

Notes:

pspwght includes dweight
anweight is just the product of pspwght and pweight (times a constant 10e3 in the R line above)
no obvious use for prob
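The R and Stata lines for anweight above differ by a factor of 10e3. A quick check on toy numbers (not real ESS values) suggests this should not matter for means or proportions, since a constant multiplier on the weights cancels out; it would matter for weighted totals.

```r
# Toy values, not real ESS data
y       <- c(1, 0, 1, 1, 0)
pspwght <- c(0.8, 1.2, 1.0, 0.9, 1.1)
pweight <- c(0.5, 0.5, 2.0, 2.0, 2.0)

anweight_plain  <- pspwght * pweight        # as in the Stata line
anweight_scaled <- pspwght * pweight * 10e3 # as in the R line (10e3 = 10000)

m_plain  <- weighted.mean(y, anweight_plain)
m_scaled <- weighted.mean(y, anweight_scaled)

# the constant cancels out of the weighted mean
all.equal(m_plain, m_scaled)

# ... but not out of a weighted total
sum(y * anweight_scaled) / sum(y * anweight_plain)
```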

Discussions

https://github.com/InductiveStep/R-notes/issues/1
https://github.com/ropensci/essurvey/issues/39
https://github.com/ropensci/essurvey/issues/9#issuecomment-502459202

The second link above recommends the following for ESS4:

svydesign(
  ids = ~ psu + idno, # further comment at the link: specifying just `psu` would be enough
  strata = ~ stratify,
  weights = ~ dweight,
  nest = TRUE,
  data = ess4gb
)

Example: Andi Fugard, ESS9

Intermediate Quantitative Social Research, Birkbeck, University of London (2017-2020) https://inductivestep.github.io/R-notes/complex-surveys.html

Working on a multi-country example:

# using srvyr
as_survey_design(
  ids = idno, # instead of `psu` or `psu + idno` because `psu` is not in ESS9?
  strata = cntry,
  nest = TRUE,
  weights = pspwght
)

From the text:

The nest option takes account of the ids being nested within strata: in other words the same ID is used more than once across the dataset but only once in a country.
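To see what nest does in practice, here is a small sketch with the survey package on toy data (not ESS data), where the same cluster codes are reused in two countries. The data frame `d` and its columns `w` and `y` are made up for illustration.

```r
library(survey)

# Toy data: PSU codes 1 and 2 are reused in both countries
d <- data.frame(
  cntry = rep(c("GB", "FR"), each = 4),
  psu   = rep(c(1, 1, 2, 2), times = 2),
  w     = 1,
  y     = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# without nest = TRUE, svydesign() refuses clusters that span several strata
no_nest <- tryCatch(
  svydesign(ids = ~psu, strata = ~cntry, weights = ~w, data = d),
  error = function(e) e
)
inherits(no_nest, "error")

# with nest = TRUE, cluster codes are relabelled within each stratum
nested <- svydesign(ids = ~psu, strata = ~cntry, weights = ~w,
                    nest = TRUE, data = d)
svymean(~y, nested)
```

If the ids are already unique across the whole file, nest = TRUE should change nothing, which would be consistent with it looking optional in some setups.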

Example: Federico Vegetti, ESS7

Introduction to Survey Statistics, University of Heidelberg, 2018 https://federicovegetti.github.io/teaching/heidelberg_2018/lab/sst_lab_day2.html

When working on countries separately:

# using srvyr
as_survey_design(weights = c(dweight, pspwght)) %>%
  group_by(cntry) %>%
  # etc.

# ... doesn't pspwght include dweight?
# ... what about stratum? psu?

When working on all countries together:

# using srvyr
as_survey(weights = c(dweight, pspwght, pweight))

Example: Daniel Oberski, ESS7

http://asdfree.com/european-social-survey-ess.html

Working on a single country (Belgium) after merging the data to the SDDF file:

svydesign(
  ids = ~psu ,
  strata = ~stratify,
  probs = ~prob,
  data = ess_df
)
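Note that this design passes `probs`, not `weights`. As a sanity check on toy data (not ESS values), the sketch below compares a design built from sampling probabilities with one built from their inverse; the data frame `d` and the column `w` are made up for illustration.

```r
library(survey)

# Toy data, not real ESS values
d <- data.frame(
  psu      = c(1, 1, 2, 2, 3, 3, 4, 4),
  stratify = c(1, 1, 1, 1, 2, 2, 2, 2),
  prob     = c(0.1, 0.1, 0.2, 0.2, 0.5, 0.5, 0.25, 0.25),
  y        = c(1, 0, 1, 1, 0, 0, 1, 0)
)
d$w <- 1 / d$prob # a sampling weight is the inverse selection probability

by_prob   <- svydesign(ids = ~psu, strata = ~stratify, probs = ~prob, data = d)
by_weight <- svydesign(ids = ~psu, strata = ~stratify, weights = ~w, data = d)

# both designs carry the same sampling weights and give the same estimate
all.equal(unname(weights(by_prob)), unname(weights(by_weight)))
all.equal(coef(svymean(~y, by_prob)), coef(svymean(~y, by_weight)))
```

Passing `prob` itself as `weights` is a different design, since it would give the largest weights to the respondents most likely to be sampled.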

briatte commented 1 year ago

ESS now featured in Session 12 via a spatial viz example.

briatte commented 1 year ago

[ ] Try this, using lmer to get predicted probabilities: https://github.com/halhen/viz-pub/tree/master/ess-political-expression

briatte commented 1 year ago

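library(tidyverse) # tibble(), str_subset(), bind_rows(), pivot_wider()

# list all zipped ESS downloads below the working directory
z <- fs::dir_ls(glob = "*.zip", recurse = TRUE)
# one row per (file, weighting variable) pair
v <- tibble()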
for (i in z) {

  cat(fs::path_file(i))
  d <- unzip(i, exdir = tempdir())
  f <- str_subset(d, "dta$")
  cat(" ->", fs::path_file(f), "...\n")
  d <- haven::read_dta(f)
  n <- names(d)
  n <- n[ n %in% c("essround", "cntry", "psu", "idno", "stratify", "stratum",
                   "dweight", "pspwght", "pweight", "prob", "anweight") ]
  v <- bind_rows(v, tibble(file = f, n))

}

v %>% 
  mutate(file = fs::path_file(file)) %>% 
  pivot_wider(values_from = n, names_from = n) %>% 
  mutate(essround = as.integer(str_extract(file, "\\d+"))) %>% 
  arrange(essround)

# A tibble: 14 × 11
   file          essround idno  cntry dweight pspwght pweight anweight prob  stratum psu  
   <chr>            <int> <chr> <chr> <chr>   <chr>   <chr>   <chr>    <chr> <chr>   <chr>
 1 ESS1e06_6.dta        1 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 2 ESS4AT.dta           4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 3 ESS4LT.dta           4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 4 ESS4e04_5.dta        4 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 5 ESS5ATe1_1.d…        5 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 6 ESS5e03_4.dta        5 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
 7 ESS6e02_5.dta        6 idno  cntry dweight pspwght pweight anweight NA    NA      NA   
 8 ESS7SDDFe1_2…        7 idno  cntry NA      NA      NA      NA       prob  stratum psu  
 9 ESS7e02_2.dta        7 idno  cntry dweight pspwght pweight NA       NA    NA      NA   
10 ESS8SDDFe01_…        8 idno  cntry NA      NA      NA      NA       prob  stratum psu  
11 ESS8e02_2.dta        8 idno  cntry dweight pspwght pweight anweight NA    NA      NA   
12 ESS9ROe01.dta        9 idno  cntry dweight pspwght pweight anweight prob  stratum psu  
13 ESS9e03_1.dta        9 idno  cntry dweight pspwght pweight anweight prob  stratum psu  
14 ESS10.dta           10 idno  cntry dweight pspwght pweight anweight prob  stratum psu

briatte commented 1 year ago

Did some more tests, found weird things: https://github.com/gergness/srvyr/issues/157

Best guess, based on weighting guide:

as_survey_design(ids = psu,
                 strata = c(cntry, stratum),
                 nest = TRUE,
                 weights = anweight)

briatte commented 1 year ago

More tests with other designs. Conclusions:

Use psu and stratum for more accurate sampling error estimation
Use anweight for same reason
Using psu + idno is redundant with above
Using nest = TRUE seems optional, but use it just in case

library(srvyr)
library(tidyverse)

ess9_extract <- readr::read_rds("https://f.briatte.org/temp/ess9_extract.rds")

# Andi Fugard's design
ess9_af1 <- ess9_extract %>%
  as_survey_design(ids = idno, strata = cntry, nest = TRUE,
                   weights = pspwght)
# Fugard, using PSU
ess9_af2 <- ess9_extract %>%
  as_survey_design(ids = psu, strata = cntry, nest = TRUE,
                   weights = pspwght)

# weighting guide + cntry
ess9_wg1 <- ess9_extract %>%
  as_survey_design(ids = psu,
                   strata = c(cntry, stratum), # adding cntry
                   nest = TRUE,
                   weights = anweight)

# weighting guide, no cntry
ess9_wg2 <- ess9_extract %>%
  as_survey_design(ids = psu,
                   strata = stratum, # as recommended
                   nest = TRUE,
                   weights = anweight)

# Vegetti's design -- implicit `ids = idno`
ess9_mv1 <- ess9_extract %>%
  as_survey_design(weights = c(dweight, pspwght))
# Vegetti, using PSU
ess9_mv2 <- ess9_extract %>%
  as_survey_design(ids = psu, weights = c(dweight, pspwght))

# Oberski's design -- implicit `nest = TRUE`
# note: the original call (above) passes `probs = ~prob`; here `prob` is passed as `weights`
ess9_do <- ess9_extract %>%
  as_survey_design(ids = psu, strata = stratum, weights = prob)

# Stefan Zins' design
# https://github.com/ropensci/essurvey/issues/39#issuecomment-507855290
ess9_sz <- ess9_extract %>%
  as_survey_design(ids = psu, strata = stratum, weights = dweight)

# results -----------------------------------------------------------------

list("AF_idno" = ess9_af1, "AF_psu" = ess9_af2,
     "WG_cntry" = ess9_wg1, "WG_stratum" = ess9_wg2,
     "MV_idno" = ess9_mv1, "MV_psu" = ess9_mv2, "DO_psu" = ess9_do,
     "SZ_psu" = ess9_sz) %>%
  map_dfr(
    ~ .x %>%
      filter(cntry == "GB") %>%
      group_by(wltdffr_group) %>%
      summarise(prop = srvyr::survey_mean(vartype = "se")),
    .id = "design"
  ) %>%
  filter(wltdffr_group == "Fair") %>%
  arrange(-prop_se)

# A tibble: 8 × 4
  design     wltdffr_group  prop prop_se
  <chr>      <fct>         <dbl>   <dbl>
1 MV_psu     Fair          0.200  0.0204
2 MV_idno    Fair          0.200  0.0166
3 WG_cntry   Fair          0.196  0.0128
4 AF_psu     Fair          0.196  0.0128
5 WG_stratum Fair          0.196  0.0125
6 SZ_psu     Fair          0.190  0.0116
7 DO_psu     Fair          0.191  0.0104
8 AF_idno    Fair          0.196  0.0102

briatte commented 1 year ago

Availability of weighting vars:

ESS 9 and 10 have all required vars
ESS 7 and 8 require merging with the SDDF (and the ESS 7 main file lacks anweight)
ESS 6 has anweight, but psu and stratum have to be retrieved from individual SDDFs
ESS 5 and below do not have anweight, so even more work required

… so, use ESS 9 or 10 in examples, or use 7 or 8 for one more example of a merge.

briatte / dsr

Surveys - ESS #33
