Closed ld-archer closed 1 year ago
This is the distribution of the new composite. Much more balanced, where the most common group is level 2 (all core some bonus), and both other groups have a decent chunk of people.
Now worth testing a transition model for the relationship between hh_income and the new composite:
> summary(housing)
formula: housing_quality ~ scale(hh_income)
data: data
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 44407783 -38980356.07 77960718.14 6(0) 1.75e-08 3.2e+00
Coefficients:
Estimate Std. Error z value Pr(>|z|)
scale(hh_income) 0.5195491 0.0004357 1193 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -2.4353675 0.0005439 -4477
2|3 0.6576626 0.0003217 2044
(255 observations deleted due to missingness)
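As a rough aid to reading the income-only fit above, the coefficients can be turned into predicted category probabilities. This sketch assumes the model is a cumulative logit of the kind fitted by `clm` in R's ordinal package, where logit P(Y <= j) = theta_j - beta * x. The output format suggests this, but the thread does not state it. The function names are illustrative only.

```python
import math

# Estimates copied from the income-only summary above.
THRESHOLDS = [-2.4353675, 0.6576626]  # 1|2 and 2|3
BETA_INCOME = 0.5195491               # scale(hh_income)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(z_income, thresholds=THRESHOLDS, beta=BETA_INCOME):
    """Return [P(Y=1), P(Y=2), P(Y=3)] for a standardised income value."""
    # Cumulative probabilities P(Y <= j) = logistic(theta_j - beta * x),
    # with P(Y <= 3) = 1 appended for the top category.
    cumulative = [logistic(t - beta * z_income) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


at_mean = category_probs(0.0)     # income at the sample mean
one_sd_up = category_probs(1.0)   # income one SD above the mean
odds_ratio = math.exp(BETA_INCOME)

print([round(p, 3) for p in at_mean])
print([round(p, 3) for p in one_sd_up])
print(round(odds_ratio, 2))
```

Under that assumption, someone at mean income has roughly an 8%, 58% and 34% chance of being in levels 1, 2 and 3, and each SD of income multiplies the odds of being in a higher level by about 1.68.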
> summary(housing)
formula: housing_quality ~ scale(age) + factor(sex) + scale(SF_12) + relevel(factor(ethnicity), ref = "WBI") + scale(hh_income)
data: data
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 44407783 -38385771.83 76771577.67 6(0) 9.21e-08 1.8e+03
Coefficients:
Estimate Std. Error z value Pr(>|z|)
scale(age) -0.2072656 0.0003146 -658.81 <2e-16 ***
factor(sex)Male 0.0399841 0.0006028 66.33 <2e-16 ***
scale(SF_12) 0.1902093 0.0004009 474.44 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BAN -1.3238875 0.0040016 -330.84 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BLA -1.1783021 0.0025514 -461.82 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BLC -1.0926503 0.0036256 -301.37 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")CHI -0.9654861 0.0047377 -203.79 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")IND -0.4802789 0.0019294 -248.93 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")MIX -0.2987210 0.0023905 -124.96 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")OAS -1.1041091 0.0028327 -389.78 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")OBL -1.1764196 0.0115136 -102.18 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")OTH -0.2691478 0.0048285 -55.74 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")PAK -0.8678519 0.0024983 -347.38 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")WHO -0.1683221 0.0012785 -131.65 <2e-16 ***
scale(hh_income) 0.4999786 0.0004391 1138.72 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -2.5228946 0.0006476 -3896
2|3 0.6460611 0.0004562 1416
(255 observations deleted due to missingness)
Smoking and drinking intensity have been removed from this model.
Marital status and household composition (see below) have been added as predictors.
We were also asked to justify ethnicity as a predictor of loneliness, which I believe this paper does: Franssen et al. (2020).
The association between ethnicity and loneliness was stronger among young and early middle-aged adults, compared to late middle-aged adults.
This wasn't the only paper I found that mentioned a link; we can get more if necessary.
A representation of household composition has been added to use as a predictor of loneliness. To generate this variable, we reduced and simplified a household composition variable from Understanding Society (hhtype_dv). The original variable had 18 levels, which we reduced to 4:
Counts by group:
3 15119
4 9589
1 4959
2 785
The marstat variable from Understanding Society has 9 levels, some of which cover less than 1% of the sample. We have recoded the variable into 4 levels:
Partnered 19597
Single 6420
Separated 2497
Widowed 1859
-9 79
-9 is missing
formula:
loneliness ~ scale(age) + factor(sex) + scale(SF_12) + relevel(factor(education_state), ref = "3") + relevel(factor(job_sec), ref = "3") + scale(hh_income) + relevel(factor(hh_comp), ref = "3") + relevel(factor(marital_status), ref = "Partnered")
data: data
Coefficients:
Estimate Std. Error z value Pr(>|z|)
scale(age) -9.262e-02 7.455e-04 -124.239 < 2e-16 ***
factor(sex)Male -2.076e-01 8.781e-04 -236.396 < 2e-16 ***
scale(SF_12) -8.742e-01 4.711e-04 -1855.889 < 2e-16 ***
relevel(factor(education_state), ref = "3")0 -2.334e-02 1.564e-03 -14.921 < 2e-16 ***
relevel(factor(education_state), ref = "3")1 -5.810e-02 5.157e-03 -11.267 < 2e-16 ***
relevel(factor(education_state), ref = "3")2 -1.098e-01 1.539e-03 -71.307 < 2e-16 ***
relevel(factor(education_state), ref = "3")5 -1.463e-02 1.856e-03 -7.882 3.21e-15 ***
relevel(factor(education_state), ref = "3")6 -1.198e-02 1.579e-03 -7.585 3.33e-14 ***
relevel(factor(education_state), ref = "3")7 -1.334e-02 1.733e-03 -7.700 1.36e-14 ***
relevel(factor(job_sec), ref = "3")1 -1.642e-01 2.215e-03 -74.131 < 2e-16 ***
relevel(factor(job_sec), ref = "3")2 -7.349e-06 1.634e-03 -0.004 0.996
relevel(factor(job_sec), ref = "3")4 1.170e-01 1.356e-03 86.289 < 2e-16 ***
relevel(factor(job_sec), ref = "3")5 1.994e-01 1.636e-03 121.906 < 2e-16 ***
relevel(factor(job_sec), ref = "3")6 1.000e-01 1.823e-03 54.863 < 2e-16 ***
relevel(factor(job_sec), ref = "3")7 2.575e-01 1.323e-03 194.686 < 2e-16 ***
relevel(factor(job_sec), ref = "3")8 3.983e-01 1.658e-03 240.209 < 2e-16 ***
scale(hh_income) -3.572e-02 4.422e-04 -80.782 < 2e-16 ***
relevel(factor(hh_comp), ref = "3")1 4.517e-01 1.385e-03 326.003 < 2e-16 ***
relevel(factor(hh_comp), ref = "3")2 4.652e-01 2.507e-03 185.550 < 2e-16 ***
relevel(factor(hh_comp), ref = "3")4 -5.316e-02 1.011e-03 -52.608 < 2e-16 ***
relevel(factor(marital_status), ref = "Partnered")Separated 5.609e-01 1.665e-03 336.957 < 2e-16 ***
relevel(factor(marital_status), ref = "Partnered")Single 4.877e-01 1.210e-03 403.041 < 2e-16 ***
relevel(factor(marital_status), ref = "Partnered")Widowed 7.780e-01 3.912e-03 198.876 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 0.793914 0.001685 471.1
2|3 3.233672 0.001842 1755.6
(10207 observations deleted due to missingness)
Reference factors:
education_state -> 3 - A-level or equivalent
job_sec -> 3 - Lower management and professional
hh_comp -> 3 - Multiple adults no kids
marital_status -> Partnered
Looks about right. Widowed and Separated show more loneliness than Single, and Partnered (the reference) shows the lowest. Single adult households are more lonely than multiple adults, but adults with kids show the lowest. Moving up the list of job_sec (will paste breakdown below) shows increased loneliness. Age and gender also show the relationships we would expect.
Values for job_sec:
1 - Large employers and higher management
2 - Higher professional
3 - Lower management and professional
4 - Intermediate
5 - Small employers and own account
6 - Lower supervisory and technical
7 - Semi-routine
8 - Routine
formula: loneliness ~ scale(hh_income)
data: data
Coefficients:
Estimate Std. Error z value Pr(>|z|)
scale(hh_income) -0.2025969 0.0003893 -520.4 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 0.3863648 0.0003101 1246
2|3 2.3916726 0.0005460 4380
(972 observations deleted due to missingness)
Justification for using ethnicity to predict smoking: Bhopal et al. This review found differences in cigarette consumption between many ethnic minority groups.
Justification for job_sec (NSSEC) to predict smoking: Hiscock et al. (2014). Paper links socioeconomic status and smoking in England. NS-SEC was 1 of 7 measures of SES used to classify individuals (routine or manual occupation was +1 in an SES score from 0-7).
Also Aspinall and Mitton (2014)
There was a clear social class (NS-SEC) gradient in smoking prevalence for ‘White British’ and ‘Other White’ males and females (Fig. 1). There was also a gradient, less regular partly because of small numbers, in the ‘White and Black Caribbean’ and ‘White and Black African’ groups. This was much less perceptible in the ‘White and Asian’ group. In the Indian, Pakistani and Bangladeshi groups, the gradient was either much more muted or entirely absent. There was some evidence of a gradient in the black groups but a much stronger gradient in the Chinese group, commensurate with that seen in the White groups.
To generate a measure of physical health, we will use questions from the SF-12 questionnaire that only relate to physical health. These are:
- scsf2a: physical health limits moderate activities
- scsf2b: physical health limits several flights of stairs
- scsf3a: physical health limits amount of work
- scsf3b: physical health limits kind of work
- scsf5: pain interfered with work
Some of these variables are scored in the same way - a 5 level scale where 1 is worst and 5 is best. e.g. Physical health limits amount of work:
Some are scored slightly differently but in the same 'direction' for want of a better word. e.g. Health limits several flights of stairs:
However, one question is the opposite direction (1 - best, 5 - worst). Pain interfered with work:
These answers will need to be flipped.
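We can create a continuous variable for physical health from these answers, where higher values indicate better physical health (once the pain question is flipped, every item runs from worst at 1 to best at the top of its scale). This will then be included in the SF-12 MCS model.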
The physical health score phealth is a mean summary score of these 5 questions, which ranges from 1-5 (including -9 as missing).
There are 1146/~32000 missing in wave 11.
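A minimal sketch of how the phealth score described above could be computed for one respondent. Several things here are assumptions, not taken from the thread: the flipped item is treated as a 1-5 scale, any negative code counts as missing, and the score is set to missing if any of the five items is missing. The function name is illustrative only.

```python
from statistics import mean

MISSING = -9  # missing-value code used in the thread
PHYS_ITEMS = ["scsf2a", "scsf2b", "scsf3a", "scsf3b", "scsf5"]
REVERSED_ITEMS = {"scsf5"}  # scored 1 = best, 5 = worst, so it is flipped
SCALE_MAX = 5               # assumption: the flipped item uses a 1-5 scale


def phealth_score(row):
    """Mean of the five physical-health items, or MISSING if any is missing."""
    values = []
    for item in PHYS_ITEMS:
        value = row.get(item, MISSING)
        # Assumption: None or any negative code means the answer is missing.
        if value is None or value < 0:
            return MISSING
        if item in REVERSED_ITEMS:
            value = SCALE_MAX + 1 - value  # 1 <-> 5, 2 <-> 4, 3 stays 3
        values.append(value)
    return mean(values)


example = {"scsf2a": 3, "scsf2b": 2, "scsf3a": 4, "scsf3b": 5, "scsf5": 1}
print(phealth_score(example))  # scsf5 = 1 (best) is flipped to 5 before averaging
```

After the flip, all five items share the same direction, so the mean can be read on a single scale.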
Call:
lm(formula = formula, data = data, weights = weight)
Weighted Residuals:
Min 1Q Median 3Q Max
-3.974e-11 -1.500e-13 0.000e+00 9.000e-14 9.306e-10
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.771e+01 4.542e-15 1.050e+16 < 2e-16 ***
scale(age) 1.311e-14 2.645e-15 4.957e+00 7.28e-07 ***
factor(sex)Male -6.846e-15 3.797e-15 -1.803e+00 0.07144 .
relevel(factor(ethnicity), ref = "WBI")BAN -9.422e-15 2.471e-14 -3.810e-01 0.70295
relevel(factor(ethnicity), ref = "WBI")BLA 9.292e-15 1.470e-14 6.320e-01 0.52725
relevel(factor(ethnicity), ref = "WBI")BLC 8.376e-15 2.088e-14 4.010e-01 0.68836
relevel(factor(ethnicity), ref = "WBI")CHI -1.856e-14 2.515e-14 -7.380e-01 0.46065
relevel(factor(ethnicity), ref = "WBI")IND -1.062e-15 1.114e-14 -9.500e-02 0.92410
relevel(factor(ethnicity), ref = "WBI")MIX -7.133e-15 1.354e-14 -5.270e-01 0.59826
relevel(factor(ethnicity), ref = "WBI")OAS 1.916e-14 1.596e-14 1.200e+00 0.22998
relevel(factor(ethnicity), ref = "WBI")OBL 7.465e-15 6.621e-14 1.130e-01 0.91022
relevel(factor(ethnicity), ref = "WBI")OTH -1.479e-14 2.809e-14 -5.270e-01 0.59849
relevel(factor(ethnicity), ref = "WBI")PAK -2.121e-15 1.672e-14 -1.270e-01 0.89903
relevel(factor(ethnicity), ref = "WBI")WHO 9.181e-16 7.194e-15 1.280e-01 0.89845
scale(hh_income) 4.243e-15 1.809e-15 2.346e+00 0.01900 *
scale(SF_12) 1.067e+01 1.983e-15 5.381e+15 < 2e-16 ***
relevel(factor(housing_quality), ref = "1")0 -4.214e-15 5.780e-14 -7.300e-02 0.94189
relevel(factor(housing_quality), ref = "1")2 -2.673e-15 3.937e-15 -6.790e-01 0.49724
relevel(factor(housing_quality), ref = "1")3 -3.277e-15 7.031e-15 -4.660e-01 0.64119
relevel(factor(job_sec), ref = "3")1 2.672e-14 9.182e-15 2.910e+00 0.00363 **
relevel(factor(job_sec), ref = "3")2 1.245e-16 6.940e-15 1.800e-02 0.98569
relevel(factor(job_sec), ref = "3")4 1.950e-16 5.902e-15 3.300e-02 0.97364
relevel(factor(job_sec), ref = "3")5 7.644e-16 6.994e-15 1.090e-01 0.91296
relevel(factor(job_sec), ref = "3")6 6.960e-16 7.696e-15 9.000e-02 0.92794
relevel(factor(job_sec), ref = "3")7 -7.968e-16 5.638e-15 -1.410e-01 0.88761
relevel(factor(job_sec), ref = "3")8 7.466e-16 7.054e-15 1.060e-01 0.91571
scale(phealth) -2.990e-15 2.572e-15 -1.162e+00 0.24514
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.407e-12 on 9901 degrees of freedom
(10118 observations deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.264e+30 on 26 and 9901 DF, p-value: < 2.2e-16
Without lagged SF-12
Call:
lm(formula = formula, data = data, weights = weight)
Weighted Residuals:
Min 1Q Median 3Q Max
-4859.3 -108.0 0.0 238.7 2620.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46.84309 0.24545 190.847 < 2e-16 ***
scale(age) 3.25027 0.13927 23.337 < 2e-16 ***
factor(sex)Male 1.87405 0.20448 9.165 < 2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BAN -1.61794 1.33596 -1.211 0.225896
relevel(factor(ethnicity), ref = "WBI")BLA 2.18292 0.79442 2.748 0.006010 **
relevel(factor(ethnicity), ref = "WBI")BLC 1.16233 1.12920 1.029 0.303342
relevel(factor(ethnicity), ref = "WBI")CHI -1.83940 1.36013 -1.352 0.176287
relevel(factor(ethnicity), ref = "WBI")IND 2.22519 0.60225 3.695 0.000221 ***
relevel(factor(ethnicity), ref = "WBI")MIX 0.23853 0.73206 0.326 0.744558
relevel(factor(ethnicity), ref = "WBI")OAS 3.57307 0.86244 4.143 3.46e-05 ***
relevel(factor(ethnicity), ref = "WBI")OBL 8.23099 3.57929 2.300 0.021491 *
relevel(factor(ethnicity), ref = "WBI")OTH 0.28568 1.51909 0.188 0.850831
relevel(factor(ethnicity), ref = "WBI")PAK 2.22223 0.90370 2.459 0.013948 *
relevel(factor(ethnicity), ref = "WBI")WHO 2.14112 0.38842 5.512 3.63e-08 ***
scale(hh_income) 0.32148 0.09776 3.288 0.001011 **
relevel(factor(housing_quality), ref = "1")0 4.97712 3.12529 1.593 0.111298
relevel(factor(housing_quality), ref = "1")2 -0.60156 0.21281 -2.827 0.004711 **
relevel(factor(housing_quality), ref = "1")3 -3.26182 0.37880 -8.611 < 2e-16 ***
relevel(factor(job_sec), ref = "3")1 0.53102 0.49652 1.069 0.284872
relevel(factor(job_sec), ref = "3")2 -1.30004 0.37505 -3.466 0.000530 ***
relevel(factor(job_sec), ref = "3")4 -0.37125 0.31917 -1.163 0.244781
relevel(factor(job_sec), ref = "3")5 -0.16691 0.37818 -0.441 0.658974
relevel(factor(job_sec), ref = "3")6 1.36673 0.41596 3.286 0.001021 **
relevel(factor(job_sec), ref = "3")7 0.63178 0.30480 2.073 0.038220 *
relevel(factor(job_sec), ref = "3")8 0.51866 0.38140 1.360 0.173901
scale(phealth) 3.06351 0.13564 22.585 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 508.7 on 9902 degrees of freedom
(10118 observations deleted due to missingness)
Multiple R-squared: 0.1192, Adjusted R-squared: 0.117
F-statistic: 53.62 on 25 and 9902 DF, p-value: < 2.2e-16
When lagged SF-12 is included in the prediction of next SF-12, we see that it dominates the model and is pretty much the sole predictor of next state. When we remove it however, many more variables become significant and even the signs change in some cases. This is something to talk about.
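One thing that may be worth checking before that discussion: an R-squared of exactly 1 with residuals around 1e-12 is what a regression produces when the outcome is an exact linear function of one predictor. That would happen, for example, if the outcome column and the SF_12 predictor came from the same wave instead of next wave and lag. This is only a guess from the output, not something the thread confirms. The sketch below uses made-up numbers to show that regressing a variable on its own standardised copy gives an intercept equal to its mean, a slope equal to its SD and an R-squared of 1, which is the pattern in the (Intercept) and scale(SF_12) rows.

```python
from statistics import mean, stdev

# Made-up scores, purely illustrative (not taken from the data).
y = [38.2, 51.7, 44.9, 57.3, 49.0, 41.6, 55.1, 46.8]

# Equivalent of R's scale(): centre on the mean, divide by the sample SD.
y_mean, y_sd = mean(y), stdev(y)
z = [(v - y_mean) / y_sd for v in y]


def simple_ols(x, target):
    """Return (intercept, slope, r_squared) for the regression target ~ x."""
    x_bar, t_bar = mean(x), mean(target)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - t_bar) for a, b in zip(x, target))
    slope = sxy / sxx
    intercept = t_bar - slope * x_bar
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, target))
    ss_tot = sum((b - t_bar) ** 2 for b in target)
    return intercept, slope, 1.0 - ss_res / ss_tot


intercept, slope, r_squared = simple_ols(z, y)
print(intercept, y_mean)  # intercept matches the mean of y
print(slope, y_sd)        # slope matches the SD of y
print(r_squared)
```

Because the fit is exact, adding weights or further covariates would not change the picture: every other coefficient would sit at numerical zero, as in the first output.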
Refactored the composite to be more balanced; see below for details:
For each of the seven crime variables, combine ‘very common’ and ‘fairly common’ to create a composite ‘fairly or very common’ (these are the small number categories). Then bin responses like this:
- Response to all crime questions is “not at all common” (very safe neighbourhood). Justification is that if you perceive no threat at all this is the best possible state.
- Responds to 1+ question as “not very common” but no responses to ‘fairly or very common’ (safe neighbourhood). Justification is that on the whole these people probably feel safe but not all is perfect, so probably not quite as desirable as group 1.
- Responds to 1+ question as ‘fairly or very common’ (not safe). Justification is that if perception of crime is very or fairly common, no matter what category, you are likely to feel that your neighbourhood safety is compromised.
crime_var_list = ['burglaries', 'car_crime', 'drunks', 'muggings', 'racial_abuse','teenagers', 'vandalism']
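The binning rules above could be sketched roughly as follows. The response labels are assumed to be the text strings used in the description (the real data may use numeric codes), the handling of missing answers is an assumption, and the function returns descriptive labels because the thread does not say which integer each group maps to.

```python
CRIME_VARS = ['burglaries', 'car_crime', 'drunks', 'muggings',
              'racial_abuse', 'teenagers', 'vandalism']

# Assumed response labels; the real data may use numeric codes instead.
NOT_AT_ALL = "not at all common"
NOT_VERY = "not very common"
FAIRLY = "fairly common"
VERY = "very common"
VALID = {NOT_AT_ALL, NOT_VERY, FAIRLY, VERY}


def neighbourhood_safety(responses):
    """Bin one person's answers to the seven crime questions."""
    answers = [responses.get(var) for var in CRIME_VARS]
    # Assumption: a missing or unrecognised answer leaves the composite missing.
    if any(a not in VALID for a in answers):
        return None
    # 'fairly common' and 'very common' are combined into one category.
    if any(a in (FAIRLY, VERY) for a in answers):
        return "not safe"
    if any(a == NOT_VERY for a in answers):
        return "safe"
    return "very safe"  # every answer is 'not at all common'


all_clear = {var: NOT_AT_ALL for var in CRIME_VARS}
print(neighbourhood_safety(all_clear))
print(neighbourhood_safety({**all_clear, "drunks": NOT_VERY}))
print(neighbourhood_safety({**all_clear, "drunks": NOT_VERY, "muggings": VERY}))
```

Checking the "fairly or very common" rule first means a single high-crime answer decides the group regardless of the other six answers, which matches the justification given for the third group.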
formula:
neighbourhood_safety ~ scale(age) + factor(sex) + relevel(factor(job_sec), ref = "3") + relevel(factor(ethnicity), ref = "WBI") + scale(hh_income) + relevel(factor(housing_quality), ref = "3") + relevel(factor(region), ref = "South East")
data: data
Coefficients:
Estimate Std. Error z value Pr(>|z|)
scale(age) 0.1911170 0.0005415 352.918 <2e-16 ***
factor(sex)Male 0.0992345 0.0008171 121.443 <2e-16 ***
relevel(factor(job_sec), ref = "3")1 0.0907836 0.0019143 47.425 <2e-16 ***
relevel(factor(job_sec), ref = "3")2 -0.0183678 0.0014909 -12.320 <2e-16 ***
relevel(factor(job_sec), ref = "3")4 -0.1669641 0.0013137 -127.095 <2e-16 ***
relevel(factor(job_sec), ref = "3")5 -0.0022353 0.0014955 -1.495 0.135
relevel(factor(job_sec), ref = "3")6 -0.2333180 0.0017010 -137.169 <2e-16 ***
relevel(factor(job_sec), ref = "3")7 -0.1715113 0.0012251 -139.996 <2e-16 ***
relevel(factor(job_sec), ref = "3")8 -0.3576329 0.0015141 -236.198 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BAN -0.2870159 0.0058240 -49.281 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BLA 0.4235247 0.0038038 111.343 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")BLC 0.1018040 0.0046766 21.769 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")CHI -0.1630809 0.0051622 -31.591 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")IND 0.1195040 0.0026874 44.468 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")MIX -0.0850860 0.0034566 -24.616 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")OAS 0.2912039 0.0037542 77.568 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")OBL -0.8949582 0.0154625 -57.879 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")OTH 0.4904773 0.0053127 92.322 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")PAK -0.2786344 0.0038326 -72.701 <2e-16 ***
relevel(factor(ethnicity), ref = "WBI")WHO 0.1278501 0.0017688 72.282 <2e-16 ***
scale(hh_income) 0.0574468 0.0004580 125.424 <2e-16 ***
relevel(factor(housing_quality), ref = "3")0 -1.3560440 0.0139548 -97.174 <2e-16 ***
relevel(factor(housing_quality), ref = "3")1 0.3235484 0.0016804 192.543 <2e-16 ***
relevel(factor(housing_quality), ref = "3")2 0.1504321 0.0016288 92.360 <2e-16 ***
relevel(factor(region), ref = "South East")East Midlands -0.1127611 0.0017424 -64.715 <2e-16 ***
relevel(factor(region), ref = "South East")East of England -0.0833717 0.0016068 -51.887 <2e-16 ***
relevel(factor(region), ref = "South East")London -0.9388216 0.0015856 -592.098 <2e-16 ***
relevel(factor(region), ref = "South East")North East -0.2331798 0.0022402 -104.087 <2e-16 ***
relevel(factor(region), ref = "South East")North West -0.3220310 0.0015884 -202.739 <2e-16 ***
relevel(factor(region), ref = "South East")Northern Ireland 1.2223204 0.0031364 389.717 <2e-16 ***
relevel(factor(region), ref = "South East")Scotland 0.1814973 0.0018498 98.119 <2e-16 ***
relevel(factor(region), ref = "South East")South West 0.3125546 0.0016802 186.023 <2e-16 ***
relevel(factor(region), ref = "South East")Wales 0.3207328 0.0023706 135.295 <2e-16 ***
relevel(factor(region), ref = "South East")West Midlands -0.2283518 0.0017088 -133.630 <2e-16 ***
relevel(factor(region), ref = "South East")Yorkshire and The Humber -0.2783216 0.0016944 -164.264 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -0.836559 0.001999 -418.5
2|3 1.437975 0.002016 713.2
(9429 observations deleted due to missingness)
Counts of new composite.
looks good. happy to integrate this into new docs. can import Rmd directly with refs/figures. want to discuss?
Closing as completed on branch 84, then brought into branch 113 with lots of other changes. Development branch was then created directly from branch 113, so it has been in development from the start.
Opening this issue to track the decisions and changes made in improving transition models.
hh_income
Following a meeting with SIPHER members (9/12/21), we had a couple of ways to improve the hh_income model. Following these suggestions, we have made some changes to the model that are pragmatic but could be improved upon in the future.
Lag hh_income: The lag of hh_income has been added to the model. There is debate as to whether we should use the lag of something to predict its next state, but for now it is fine. This could be replaced however with something like the Random Effects (2.2) or Fixed Effects (2.3) from this document.
Static job_sec and job_sector: These 2 variables are held static, and not transitioned over time. This decision was made to reduce complexity, as it means we do not need to predict the next state of each of these things before predicting the next state of hh_income. If we could predict these variables effectively however, we would most likely get a more robust/'accurate' predictive model (accurate is probably the wrong word here).
Labour state: Labour state is no longer transitioned, and therefore is not included in the hh_income model. Again, including it would most likely improve the hh_income model, but it also adds complexity. If necessary in the future, we can either transition this variable separately from job_sec or we could combine the two, adding additional levels to the 8 level NSSEC (retired, student, unemployed etc.).
Household information: At present, no information on other members of the household is included in the prediction of hh_income. This could improve things, but again adds a lot of complexity (more than the previous 3 points). We would have to figure out how to transition households into the future, as well as predict all the things we want to include in these models. Let's avoid that for now.
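For the lagged hh_income term mentioned in the first point above, a minimal sketch of how the lag could be built from long-format records. The field names `pidp` and `wave` and the output name `hh_income_lag` are assumptions for illustration, as is the choice to leave the lag empty when the previous wave is absent.

```python
def add_lagged_income(records):
    """Attach each person's previous-wave hh_income as 'hh_income_lag'."""
    # Index income by (person, wave) so the previous wave can be looked up.
    income_by_key = {(r["pidp"], r["wave"]): r["hh_income"] for r in records}
    lagged = []
    for r in records:
        # None when the person has no record in the previous wave.
        previous = income_by_key.get((r["pidp"], r["wave"] - 1))
        lagged.append({**r, "hh_income_lag": previous})
    return lagged


rows = [
    {"pidp": 1, "wave": 10, "hh_income": 1800.0},
    {"pidp": 1, "wave": 11, "hh_income": 1950.0},
    {"pidp": 2, "wave": 11, "hh_income": 2400.0},
]
result = add_lagged_income(rows)
for row in result:
    print(row)
```

Rows without a previous wave keep a missing lag here, so they would drop out of a model that includes the lag unless they are handled some other way.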
alcohol_spending
This model has been removed, as the impact of alcohol spending (or consumption for that matter) on mental health wasn't clear enough.
housing_quality
The previous version of the composite had three levels:
This was a problem as the composite was heavily skewed - almost nobody had no access to the components, and movement between the levels was very unbalanced (i.e. moving from 3 to 2 had a much bigger impact than moving from 2 to 1). Instead we have identified a core set of components that are important for housing quality, and a 'bonus set' for want of a better term. The core set are:
Which leaves for the bonus set:
The new composite will have 5 levels depending on access to core/bonus variables:
Will test the distribution of this composite after generating and post results here.