Leeds-MRG / Minos

SIPHER Microsimulation for estimating the effect on Income policy on mental health.
MIT License

Education state transitions #271

Closed ld-archer closed 1 year ago

ld-archer commented 1 year ago

Currently, a respondent's maximum education state is predicted before the pipeline begins, during production of the stock and replenishing populations (after estimating the education state model). Individuals are then deterministically transitioned to higher education states at specific ages. This is a big assumption in and of itself, but more work to get this closer to reality would most likely have a negligible effect on our population and an even smaller one on the outcomes.

The issue is how to reconcile these deterministic transitions with the model-based predictions for the S7_labour_state variable. The labour state variable has a factor level for full-time education, which needs to work in partnership with the education_state variable and the education module. What we can do is force the labour state variable to FT education until a respondent reaches their max education level, and then move to transitioning their labour state based on the transition model. This will probably mean some people are transitioned back into FT education throughout the model, but hopefully this number will be small enough to be negligible (fewer people remain in education as age increases in the raw data).
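A minimal sketch of the rule described above, not the Minos implementation. The function name, its arguments, and the `"FT Education"` label are hypothetical; only the logic (hold in FT education until the max level is reached, then defer to the transition model) comes from the comment.

```python
FT_EDUCATION = "FT Education"  # hypothetical label for the full-time education factor level


def next_labour_state(education_state, max_educ, predicted_state):
    """Return the labour state to assign for the next wave.

    education_state: the respondent's current education level
    max_educ:        the max level predicted before the pipeline begins
    predicted_state: the labour state proposed by the transition model
    """
    if education_state < max_educ:
        # Still working towards the predicted maximum: hold in full-time education.
        return FT_EDUCATION
    # Maximum reached: defer to the transition model, whose prediction may
    # itself be FT education (the "transitioned back" case mentioned above).
    return predicted_state
```

For example, `next_labour_state(3, 6, "FT Employed")` keeps the respondent in FT education, while `next_labour_state(6, 6, "FT Employed")` returns the model's prediction unchanged.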

Tasks:

ld-archer commented 1 year ago

As usual, Understanding Society is not making this easy...

It appears that it is not unusual for a respondent to bounce around various education levels, and usually from a defined level to 0 and back. e.g.

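| index | pidp | time | age | education_state | increased_educ? |
|---|---|---|---|---|---|
| 88 | 68010887 | 2009 | 45 | 6 | NA |
| 89 | 68010887 | 2010 | 46 | 0 | TRUE |
| 90 | 68010887 | 2011 | 47 | 0 | FALSE |
| 91 | 68010887 | 2012 | 48 | 6 | TRUE |
| 92 | 68010887 | 2014 | 50 | 0 | TRUE |
| 93 | 68010887 | 2015 | 51 | 0 | FALSE |
| 94 | 68010887 | 2016 | 52 | 5 | TRUE |
| 95 | 68010887 | 2017 | 53 | 6 | TRUE |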

This example is particularly frustrating as it also drops to level 5 and back up to 6 as well as bouncing to 0 and back.

Checking through the variable search, I think there are a few potential reasons:

  1. People obtaining qualifications through their job that are not on the US list of qualifications (being assigned 0)
  2. The qfhigh_dv variable is derived from all the qualnew variables, which are in the format of: 'did you get a new qual? If yes, which qual was mentioned?' Therefore if someone says yes and then mentions a qual from a lower level, that new qual is registered as their new highest. Basically error prone if there is any ambiguity about whether new qual is higher level than previous highest.
  3. Ambiguity about which level a qualification should be assigned to (i.e. undefined nursing qualification vs nursing degree vs nursing apprenticeship, which could be level 5 (undefined qual) or level 6 (degree)), which could explain some of the up and down?

Whatever the reason, one solution is to do some kind of 'maximum interpolation' (™ Rob), where we carry the maximum qualification forward, i.e. in the example above, the respondent would be assigned a value of 6 throughout. If we had the following series instead:

3,0,0,5,0,6

We would change it to:

3,3,3,5,5,6
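The carry-forward is a running maximum over each respondent's waves. A minimal sketch using only the standard library; the function name is hypothetical, and it assumes the input holds one respondent's `education_state` values already sorted by time:

```python
from itertools import accumulate


def carry_max_forward(states):
    """Replace each value with the highest education state seen so far."""
    # accumulate(..., max) yields the running maximum of the sequence.
    return list(accumulate(states, max))


print(carry_max_forward([3, 0, 0, 5, 0, 6]))  # [3, 3, 3, 5, 5, 6]
```

Applied to the pidp 68010887 rows above (6, 0, 0, 6, 0, 0, 5, 6), this gives 6 throughout. On the full data it would need to be applied separately within each pidp.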

ld-archer commented 1 year ago

Education Level Age Check

Sticking some plots in here just for reference.

Age Achieve Education Levels

[Plots: education_state_1, education_state_2, education_state_3, education_state_5, education_state_6, education_state_7]

Mean Age Achieved

[Plot: mean_age_achieved_educ]

ld-archer commented 1 year ago

Here is a note I wrote in the education module about when education transitions happen. The TL;DR is that some decisions are based on knowledge of the education system (GCSEs and A-levels are somewhat predictable), and others use the average age at which levels are achieved in Understanding Society data. There is also a bit of a trade-off with level 7 qualifications (masters/PhD and equivalent). The average age these are achieved in the data is 32, but this would mean individuals are in FT education from 16-32. This is a bit much, as they are probably in employment for a chunk of that time, so I've pulled this back to age 30.

NOTE

    # The age that these qualifications are achieved varies quite a lot in Understanding Society data, but trying
    # to include that variance here would add a level of complexity that is probably not worth the hassle (we're
    # assuming that the long term effects of education on health are more important than if an individual achieved
    # e.g. a level 7 qualification later in life).
    # Therefore, we are making a couple of assumptions on when to transition based on either knowledge of the
    # UK education system, or the average age at which these levels are achieved in the underlying data. Note that
    # the age achieved is 1 year after the qualifications are held to account for the range of dates that
    # individuals answer the survey. These are:
    # Level 1 and 2 - Age 17
    #   1 & 2 are equivalent to GCSEs at different grades, and because young people are required to be in full-time
    #   education until 16 it seems reasonable to assume that everyone will achieve this level.
    # Level 3 - Age 19
    #   Level 3 is equivalent to A-levels, advanced apprenticeships, and baccalaureates. Government guidelines do
    #   require young people to be in FT education, apprenticeship, or combination of volunteering and PT education,
    #   but there is no guarantee that they will achieve these levels so this is not guaranteed like level 1 or 2.
    # Level 5 - Age 30
    #   Nursing or med quals are a weird one, so this is based on the average age achieved in US as there is large
    #   variance in the data.
    # Level 6 - Age 27
    #   Again basing this on US data as it covers first class degrees but also a range of other qualifications.
    # Level 7 - Age 30
    #   This is an unfortunately difficult decision, and is a bit of a balance between the mean age in US data and 
    #   a 'want' to not have respondents in full time education for too long in the model. The mean age this level 
    #   is achieved is 32 in the data, but that would mean these people are in FT education from 16-32 which is 
    #   not realistic. I think these individuals would most likely have some time in employment between these ages
    #   but that would be very complex to implement. 
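
The ages in the note can be read as a lookup from education level to transition age. The sketch below is one possible reading, not the code in the education module: the ages are copied from the note, while the function name and the assumption that an individual passes through every lower level on the way to their maximum are mine.

```python
# Transition ages copied from the note above; the lookup logic is an assumption.
TRANSITION_AGE = {1: 17, 2: 17, 3: 19, 5: 30, 6: 27, 7: 30}


def education_state_at(age, max_educ):
    """Return the education level held at `age` by someone whose predicted max is `max_educ`."""
    reached = [
        level
        for level, min_age in TRANSITION_AGE.items()
        if level <= max_educ and age >= min_age
    ]
    # Nobody holds a qualification before the first transition age (17).
    return max(reached, default=0)
```

Under these assumptions, someone with a predicted max of 7 holds level 2 at 17, level 3 at 19, level 6 at 27 and level 7 at 30, while someone with a predicted max of 5 stays at level 3 until age 30.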

ld-archer commented 1 year ago

Struggling with something now that might require a bit of a change in approach.

A problem from the start has been that the proportion of the sample in full-time education creeps up throughout the model, replacing those in the Not Working category (long-term disabled, retired, probably other categories). This plot illustrates this: [plot: non-deterministic_retirement]

Initially I wondered if this was due to retirement not being well predicted, so I made retirement deterministic as well, with 65 as the retirement age (maybe I should make it 67 now?). After changing this setting so that everyone aged 65 and older moved into the Not Working category, this was the result: [plot: deterministic_retirement]
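For reference, the deterministic retirement rule amounts to an override applied after the labour state prediction. A minimal sketch; the function name and the `"Not Working"` label are hypothetical, and the default of 65 is the age used above:

```python
NOT_WORKING = "Not Working"  # hypothetical label for the Not Working category


def apply_retirement(age, labour_state, retirement_age=65):
    """Move everyone at or above the retirement age into Not Working."""
    if age >= retirement_age:
        return NOT_WORKING
    # Below retirement age the predicted labour state is left untouched.
    return labour_state
```

Keeping `retirement_age` as a parameter makes it easy to try 67 instead.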

This has resulted in the Not Working group remaining fairly steady, so not handling retirement was the issue for this group, but now the FT Education group is still trending up in the same way but to the detriment of the FT Employed. I don't have a fully baked solution in mind for this yet, but am going to try a few things and see what happens. One thing I can cross off the list is changing the ages at which people reach certain levels of education. I initially reduced the age to achieve levels 5-7 hoping that would change the outcomes, but it had very little impact on these plots.

The only thing I can think of doing right now that doesn't require scrapping what we have and starting again is to allow people to join the workforce between reaching education levels, and maybe increase the ages of level 5 and 7. Level 5 is nursing / medical qualifications and the mean age of achieving these qualifications is around 31. Perhaps we could assume these people do not stay in full-time education for that long, but instead get these qualifications whilst in the workforce.

ld-archer commented 1 year ago

Going to close this as I'm trying something different to address these issues, outlined in #293