Add a cross-sectional analysis weight variable from Understanding Society (or generate our own, see Q14/Q15 in weighting faq)

ld-archer commented 2 years ago

Making this an issue in and of itself just in case we need to generate our own analysis weights. See weighting faqs for more information, or a video on weighting guidance.

ld-archer commented 2 years ago

Taken from the weighting FAQs document:

6. Which weight should I use for my analysis?

There are a number of weights reflecting the complex structure of the data. The weight name has the following structure: w_xxxyyzz_aa. To select a weight please answer the following questions:

_aa part: Is your analysis longitudinal or cross-sectional?
- Longitudinal _lw
- Cross-sectional _xw
w_ part:
- if your analysis is cross-sectional – which wave are you using? e.g. wave 8: h_
- if your analysis is longitudinal – which is the last wave in your analysis? e.g. if you are looking at waves 1-9: i_

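| Wave | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Prefix | a | b | c | d | e | f | g | h | i | j | k |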

xxxyy part: Is your analysis household level or individual level?
- If it is household level: _hhden
- If it is individual level see below
xxxyy part: Is your analysis for all persons aged 0+, for youth (10-15) or for adults (16+)?
- 0+ population: _psnen
- Youth (10-15): _ythsc
- Adults (16+): see below
xxxyy part: you are studying adults aged 16+. Where does your data come from?
- Just one survey instrument (e.g. individual questionnaire): use the weight indicated on the appropriate row of the table below
- A combination of instruments: use the weights from the lowest level in the table below.

[Image: sample_weights_suffixes (table of instrument levels and weight suffixes)]

For example, if you are using information from the household grid and self-completion questionnaire, the levels are respectively 5 and 2, with 2 being lower, hence the weight will be for self-completion data (_indsc). Similarly, if you are combining information from household grid, adult main interview and nurse visit, your lowest level is 2 so the weight will be _indns.

There will be situations when you combine information from different instruments at the same level: an example would be adult self-completion interview and nurse visit. In this situation we do not have an optimal weight for you and you could use either a suboptimal weight (see Q14) or you can create a weight adjustment tailored to your analysis (see Q15).
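The "lowest level wins" rule, including the tie case, can be sketched in a few lines of Python. This is only an illustration: `pick_weight_suffix` is a hypothetical helper, the levels 5 and 2 are the ones quoted in the FAQ example, and the household grid suffix is a placeholder because the table image is not reproduced in this issue.

```python
def pick_weight_suffix(instruments):
    """Return the weight suffix of the lowest-level instrument.

    `instruments` maps an instrument name to a (level, suffix) pair read
    off the FAQ table. Raises ValueError when different suffixes share the
    lowest level, which is the case the FAQ says has no optimal weight.
    """
    if not instruments:
        raise ValueError("no instruments given")
    lowest = min(level for level, _ in instruments.values())
    candidates = sorted(
        {suffix for level, suffix in instruments.values() if level == lowest}
    )
    if len(candidates) > 1:
        raise ValueError(
            f"instruments tie at level {lowest}: {candidates}; see Q14/Q15"
        )
    return candidates[0]


# Levels as quoted in the FAQ example; the grid suffix is a placeholder.
combo = {
    "household grid": (5, "<grid suffix>"),
    "self-completion": (2, "_indsc"),
}
print(pick_weight_suffix(combo))  # _indsc
```

Passing self-completion and nurse visit together (both level 2 in the FAQ text) raises the error, which is the prompt to fall back to Q14 or Q15.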

zz_ part: what is the timeline of your research?
- Starting at wave 6 (2014-15) onwards: ui_
- Starting between wave 2 (2010-11) and wave 5 (2013-14): ub_
- Starting at wave 1 (2009-10): us_
- Starting at any point between 2001 and 2008: 01_
- Starting at any point between 1991 and 2000: 91_
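Putting the parts together, the naming scheme above can be expressed as a small Python helper. This is a sketch under stated assumptions: `build_weight_name` is a hypothetical function, not part of Minos or of any Understanding Society tooling, and it only checks the timeline and analysis codes listed in the FAQ text, since the full suffix table is not reproduced here.

```python
import string

TIMELINE_CODES = {"ui", "ub", "us", "01", "91"}  # zz part, from the FAQ list
ANALYSIS_CODES = {"xw", "lw"}                    # aa part: cross-sectional / longitudinal


def build_weight_name(wave, level, timeline, analysis):
    """Assemble a weight name of the form w_xxxyyzz_aa.

    wave     -- wave number 1..11, mapped to the prefix letters a..k
    level    -- xxxyy part, e.g. "indsc" (leading underscore is tolerated)
    timeline -- zz part, e.g. "ui"
    analysis -- aa part, "xw" or "lw"
    """
    if not 1 <= wave <= 11:
        raise ValueError("wave must be between 1 and 11")
    level = level.strip("_")
    timeline = timeline.strip("_")
    analysis = analysis.strip("_")
    if timeline not in TIMELINE_CODES:
        raise ValueError(f"unknown timeline code: {timeline}")
    if analysis not in ANALYSIS_CODES:
        raise ValueError(f"unknown analysis code: {analysis}")
    prefix = string.ascii_lowercase[wave - 1]
    return f"{prefix}_{level}{timeline}_{analysis}"


# Cross-sectional, wave 8, adult self-completion, wave 6 onwards timeline.
print(build_weight_name(8, "indsc", "ui", "xw"))  # h_indscui_xw
```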

Translated

Because we are combining data from

We have 2 variables from the self-completion sample at present (scghqi: depression_change, and sclonely: loneliness). These are not key variables but will most likely be used, so for a first attempt I'm using the self-completion weight variable from the table in question 5.

After going through the weighting FAQs doc, I think we need to use the cross-sectional weights, as we only ever estimate models with a single wave of data (somebody correct me if that's wrong!). Also, using the longitudinal weight would reduce our effective sample to 30-50% of the original size (including 36% of people with at least 1 wave of positive weight), whereas the cross-sectional weight retains 70-80% (including 66% of people).
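For reference, a coverage figure of this kind is just the share of records with a strictly positive weight. The Python sketch below shows that calculation on made-up values; it is not how the percentages above were produced, and `positive_weight_share` is a hypothetical helper.

```python
def positive_weight_share(weights):
    """Fraction of records whose weight is strictly positive.

    Missing weights (None) and zero weights both count as unusable.
    """
    if not weights:
        raise ValueError("no weights given")
    usable = sum(1 for w in weights if w is not None and w > 0)
    return usable / len(weights)


# Toy values for illustration only, not Understanding Society data.
toy_xw = [1.2, 0.0, 0.8, 1.5, None]
print(positive_weight_share(toy_xw))  # 0.6
```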

The weight variable is indscub_xw.

Using the Weight

Need to account for the weights in all the transition estimation functions.

- [x] income_ols
- [x] SF-12_ols
- [x] labour_nnet
- [x] housing_clm
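The transition functions themselves are not shown in this thread, so the Python sketch below only illustrates what "accounting for the weights" means for an OLS-style model: each record contributes to the fit in proportion to its analysis weight, and a zero-weight record drops out entirely. `weighted_linear_fit` is a hypothetical helper for a single predictor, not the Minos implementation.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x.

    Minimises sum(w_i * (y_i - intercept - slope * x_i) ** 2).
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of predictor and outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted sums of squares and cross-products around those means.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x has no weighted variation")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data: the last point is an outlier but carries zero weight.
x = [0, 1, 2, 3]
y = [1, 3, 5, 100]
w = [1, 1, 1, 0]
print(weighted_linear_fit(x, y, w))  # (1.0, 2.0)
```

With all weights equal the function reduces to ordinary least squares, which makes it easy to compare weighted and unweighted estimates on the same data.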

ld-archer commented 2 years ago

Analysis weight added and included in the transition models now in main from this commit.

Leaving this issue open in case the analysis weight is not suitable, and we have to generate a custom one (see Q15 in FAQ).

Leeds-MRG / Minos