Leeds-MRG / Minos

SIPHER Microsimulation for estimating the effect of income policy on mental health.
MIT License

SF_12 PCS Pathways #239

Closed ld-archer closed 9 months ago

ld-archer commented 1 year ago

Need some additional modules and some modifications to others:

ok my takeaway for the income -> PCS pathways:

Steps:

NOTE: We no longer need variables that span the entire length of the data, as we are currently only using the 2017-2018 model for SF_12_MCS. The only thing we absolutely need is 2017 data for any variable that makes up a pathway, as we need to be able to fit a one-year model for the pathway variables (i.e. 2016-2017) and a one-year model for SF_12_PCS (2017-2018).

WORKING GOOGLE DOC

ld-archer commented 1 year ago

Some work to finish off in #231 before this can be completely finished (cross-validation for VALIDATION point, and some fixes for the outcome visualisation of PCS), but the vast majority of this can be started at any point.

ld-archer commented 1 year ago

Discussion Points

Early indications that there will be lots to discuss from this round of data discovery, so I'm listing them all here and will try to arrange a meeting most likely next week.

ld-archer commented 1 year ago

Modules

After discussion on 8/6/23, these are the modules we have settled on as a good mix of important and achievable.

Material Deprivation

Data Discovery

This should be a simple proxy to create: we have a couple of 4-level ordinal variables that are asked of every household with either no pensioners or pensioners and children. We can just take the mean of these variables as a material deprivation composite. This should be related to hh_income, as the questions are all of the form 'do you have enough money to, e.g., keep your house in a decent state of repair?'.
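
A minimal sketch of the composite described above, assuming the items are coded 1-4 and that unanswered items are skipped rather than imputed. The function name and the coding are assumptions for illustration, not what is in the repo.

```python
from statistics import mean
from typing import Optional, Sequence


def material_deprivation_composite(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean of the answered 4-level ordinal items; None if nothing was answered.

    Assumption: items are coded 1-4 and missing answers are passed as None.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    if any(r not in (1, 2, 3, 4) for r in answered):
        raise ValueError("items are assumed to be coded 1-4")
    return float(mean(answered))


# Made-up household answers, for illustration only.
example = material_deprivation_composite([1, 2, 4, None])
print(round(example, 3))
```

Skipping missing items keeps households that answered only some of the questions, at the cost of averaging over different item sets. Whether that is acceptable is a modelling decision.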

Exercise / Fitness

Data Discovery

We can use government guidelines to determine what is a healthy vs unhealthy level of exercise, then create a binary variable. The guidelines state that a healthy activity level is at least 150 minutes of moderate intensity exercise, or 75 minutes of vigorous intensity exercise, per week.
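
A rough sketch of the binary variable, assuming we can derive weekly minutes of moderate and vigorous activity for each person. The function and constant names are hypothetical, and mixed moderate/vigorous combinations are deliberately not handled here; this only encodes the either/or rule quoted above.

```python
MODERATE_TARGET_MINUTES = 150  # threshold quoted above, assumed to be a weekly total
VIGOROUS_TARGET_MINUTES = 75   # threshold quoted above, assumed to be a weekly total


def meets_activity_guidelines(moderate_minutes: float, vigorous_minutes: float) -> bool:
    """True if either threshold is met on its own."""
    if moderate_minutes < 0 or vigorous_minutes < 0:
        raise ValueError("minutes cannot be negative")
    return (moderate_minutes >= MODERATE_TARGET_MINUTES
            or vigorous_minutes >= VIGOROUS_TARGET_MINUTES)


# Made-up values, for illustration only.
print(meets_activity_guidelines(160, 0))
print(meets_activity_guidelines(100, 30))
```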

Job Satisfaction

Data Discovery

We can just use the jbsat variable for this, although the data discovery turned up a large amount of related information that could be used to expand it: hours worked, working arrangements (part-time, on-call, work from home etc.), autonomy of work, and psychological job stress (feels uneasy about job, feels depressed, miserable etc.).

Alcohol Use

We have AUDIT-C scores that we can use to create a 3-level ordinal:

Which should make quite a simple module.

Chronic Disease

Some work to do here before this can go into a module.

  1. Fit a regression model for all chronic diseases to SF-12 (MCS & PCS)?
    • See if there is any obvious way of binning diseases, or if some can be safely ignored (i.e. small coefficient and insignificant)
  2. Fit regression model for number of chronic diseases to SF-12.
    • i.e. 0 vs 1 vs 2 vs 3 vs 3+
    • Is a number of chronic diseases more useful than tiers? Is it easier to predict?
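
As a first look before fitting anything, something like the sketch below could compare mean SF-12 by number of chronic diseases. It is not a regression, the scores are made up, and the names are hypothetical. The thread lists both "3" and "3+", so the point at which counts are pooled is left as a parameter.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def count_label(n_conditions: int, cap: int = 3) -> str:
    """Label a disease count, pooling everything at or above `cap` into one top group."""
    return f"{cap}+" if n_conditions >= cap else str(n_conditions)


def mean_score_by_count(people: Iterable[Tuple[Set[str], float]], cap: int = 3) -> Dict[str, float]:
    """Group (conditions, SF-12 score) pairs by binned disease count and average the scores."""
    groups = defaultdict(list)
    for conditions, score in people:
        groups[count_label(len(conditions), cap)].append(score)
    return {label: mean(scores) for label, scores in groups.items()}


# Made-up people and scores, for illustration only.
people = [
    (set(), 52.0),
    ({"asthma"}, 48.0),
    ({"asthma", "emphysema"}, 44.0),
    ({"asthma", "emphysema", "coronary heart disease"}, 40.0),
    ({"asthma", "emphysema", "coronary heart disease"}, 38.0),
]
print(mean_score_by_count(people))
```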

paddy-r commented 1 year ago

May be way too complicated an approach, but for quantitative measure of severity of chronic disease, how about years of life lost (YLL)? Quick search found this for Germany, see Figure 2, it's also age-specific. So for a given age group, (relative) severity of disease is proportion of YLL.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8212398/
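
A sketch of the "proportion of YLL" idea for a single age group. The function name is hypothetical and the YLL numbers are placeholders, not figures from the linked paper.

```python
from typing import Dict


def relative_severity(yll_by_disease: Dict[str, float]) -> Dict[str, float]:
    """Each disease's share of total years of life lost within one age group."""
    total = sum(yll_by_disease.values())
    if total <= 0:
        raise ValueError("total YLL must be positive")
    return {disease: yll / total for disease, yll in yll_by_disease.items()}


# Placeholder YLL values for one age group, for illustration only.
age_group_yll = {
    "asthma": 10.0,
    "coronary heart disease": 70.0,
    "bronchitis and emphysema": 20.0,
}
print(relative_severity(age_group_yll))
```

The shares sum to 1 within an age group, so they are only comparable across age groups as relative rankings, not as absolute severity.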

ld-archer commented 1 year ago

Looks like a good place to start and could be used to justify any decisions we make, cheers!

paddy-r commented 1 year ago

Looks like data for England and Wales are available in massive detail...

https://digital.nhs.uk/data-and-information/publications/statistical/compendium-mortality/current/years-of-life-lost

Of the US variables you found, it looks like most are present one-to-one (e.g. asthma, coronary heart disease), while others are present but not one-to-one (e.g. emphysema comes under "bronchitis and emphysema").

ld-archer commented 1 year ago

Alcohol

Docstring from generate_composite_vars.calculate_auditcscore():

Alcohol use disorders can be assessed via the AUDITC score. This score is derived from 3 questions that form part of the full 10 question AUDIT screening test, where AUDITC specifically focuses on consumption. The 3 questions are:

  1. How often do you have a drink containing alcohol?
  2. How many units of alcohol do you drink on a typical day when you are drinking?
  3. How often have you had 6 or more units if female, or 8 or more if male, on a single occasion in the last year?

Each question is ordinal with 5 levels, depending on the 'severity' of the answer. We then score each question from 0-4, with higher scores meaning higher 'severity'. The total across the 3 questions then creates a score from 0-12, with 0-4 meaning sensible drinking, 5-7 meaning hazardous drinking, and 8+ meaning harmful drinking. See following link for information on scoring: https://www.drinktalkingportal.co.uk/clinical-guidance/alcohol-abuse-screening/alcohol-audit-audit-c

To calculate this score, we rely on 4 variables in Understanding Society shown at the following link: https://www.understandingsociety.ac.uk/documentation/mainstage/dataset-documentation?search_api_views_fulltext=auditc

Question 1 above relies on auditc1 & auditc3, question 2 relies on auditc4, and question 3 uses auditc5.

NOTE: The final variable used (auditc5) specifically mentions the frequency of having 6 or more drinks, rather than 6+/8+ units. This could be a mistake in the description, or the question actually asked could be incorrect (not the true AUDITC question). There's no information about which one it is, so I'm treating it the same as the AUDITC3 question for our purposes. An added benefit is that this is simpler to code, as there is no need to check gender.
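
For reference, the scoring and banding described in the docstring can be sketched as below. This assumes each question has already been recoded to a 0-4 score; it does not reproduce how auditc1, auditc3, auditc4 and auditc5 are recoded, and it is not the actual calculate_auditcscore() code. The function names are hypothetical.

```python
def auditc_total(q1_score: int, q2_score: int, q3_score: int) -> int:
    """Sum three per-question scores (each 0-4) into a 0-12 total."""
    for score in (q1_score, q2_score, q3_score):
        if not 0 <= score <= 4:
            raise ValueError("each question score must be between 0 and 4")
    return q1_score + q2_score + q3_score


def auditc_band(total: int) -> str:
    """Map a 0-12 total onto the three bands given in the docstring."""
    if not 0 <= total <= 12:
        raise ValueError("total must be between 0 and 12")
    if total <= 4:
        return "sensible"
    if total <= 7:
        return "hazardous"
    return "harmful"


# Made-up answers, for illustration only.
total = auditc_total(2, 1, 3)
print(total, auditc_band(total))
```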

Sample Check

image image

Interesting here that the number of missing values jumps in 2020. Assuming this has something to do with COVID? Maybe people were less happy to talk about their consumption during COVID lockdowns? Unfortunately due to the lack of information on the website we don't have any idea why... There is some literature using these variables though so I'll have a look through that also.

Handovers

image image

Cross-Validation

[EDIT] Forgot to copy 2015 data onto 2014, as there is no alcohol data in 2014.

image
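
One reading of the 2015-onto-2014 copy mentioned in the edit note, sketched on plain records. The record layout, the pidp person-identifier key and the auditc_band variable name are all assumptions for illustration; the real pipeline may do this differently.

```python
from typing import Any, Dict, List


def copy_year_onto(records: List[Dict[str, Any]], source_year: int,
                   target_year: int, variable: str) -> List[Dict[str, Any]]:
    """Fill a missing variable in target_year from the same person's source_year record.

    Returns new records; the input list and its dicts are left unchanged.
    """
    source_values = {
        r["pidp"]: r.get(variable) for r in records if r["year"] == source_year
    }
    filled = []
    for record in records:
        record = dict(record)  # copy so the caller's data is not mutated
        if record["year"] == target_year and record.get(variable) is None:
            record[variable] = source_values.get(record["pidp"])
        filled.append(record)
    return filled


# Made-up records, for illustration only.
records = [
    {"pidp": 1, "year": 2014, "auditc_band": None},
    {"pidp": 1, "year": 2015, "auditc_band": "sensible"},
    {"pidp": 2, "year": 2014, "auditc_band": None},
]
filled = copy_year_onto(records, 2015, 2014, "auditc_band")
print(filled[0]["auditc_band"])
```

People with no 2015 record stay missing in 2014 under this approach.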

ld-archer commented 1 year ago

Current PCS Plots

This is with the following variables as predictors of PCS:

I think there are arguments to be made over loneliness and financial situation. It will take some time before any of this work is merged, as we need literature backing for any decisions we make.

Handovers

image image

Cross-Validation

image image image image image

ld-archer commented 9 months ago

Happy with pathways at the mo (few issues to resolve still but we have a working model) so closing this now.