Removing parity from fertility model + refactoring

paddy-r commented 9 months ago

Required for fertility model write-up, i.e. to see effect of including parity in model. Want to have option of removing parity completely so we can compare runs with and without parity. Coding jobs from my email, 25/08/23:

[x] Correct range of years in fertility rate table construction to exclude 2061, as data are all "NA". -> https://github.com/Leeds-MRG/Minos/pull/439/commits/fcd527fcb78ade1fa5c9748dfcc8dd0ff9e72aa1
[x] Remove mortality and fertility rate table spec in config file entirely, as pointless. Define within those modules instead. -> done in 05f4a4b
[x] Allow rate table computation/caching outside runtime, for testing; method can then also be used at runtime when necessary, but gives more flexibility. Probably best done by moving everything into convert_rate_data. -> done in c08fc97 but could be simplified even further
[x] Following on from the above point, automate year selection (i.e. intersection of years in ONS data and NewEthPop data, happens to be 2011-2020). To clarify purpose of this: cache intermediate (NewEthPop-based) data for full range of years (2011-2061), then compute rate table with/without parity (at runtime, if necessary) consisting of only years required for simulation. Means different rate tables for different run years, but rate tables may be much smaller. -> done in 642b75b
[x] ~~Think about adding try-except block to BaseHandler.cache for case of rate_table_path not being defined.~~ -> marking as done as unnecessary
[x] Add functionality to Minos to allow runs with or without parity in fertility model. Mainly a question of tracking back through old code and adding optional argument somewhere. -> done in cc1c5bc, but think about making entire rate table computation and caching more elegant.
[x] Speed up and streamline rate table calculation, stage 1: overhaul transform_rate_table to be more Pandas-based, as quicker. -> done in f06d495
[x] ~~Stage 2 of above: overhaul parity calculation, i.e. FertilityRateTable.add_parity, as could be less Python and more Pandas; currently takes a minute or two, should be faster. (Mentioned by Rob in comments below.)~~ -> crossing this off as not a priority; rate table production should only ever be done once, then cached
[x] ~~Enhance existing R/Jupyter notebooks to visualise effects of the above (already exists for various variables, e.g. nkids_ind) with specific comparison of with/without parity.~~ -> wiping as vague, purpose not clear
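
For anyone picking this up later, a minimal sketch of the year-selection and with/without-parity idea from the list above. This is not the Minos code: the function names, the column names (`year`, `age`, `parity`, `births`, `women`) and the toy numbers are all made up for illustration. It assumes the cached intermediate table holds counts, so dropping parity is just a coarser groupby.

```python
import pandas as pd


def common_years(ons_years, nep_years):
    """Years present in both sources (hypothetical helper)."""
    return sorted(set(ons_years) & set(nep_years))


def build_rate_table(cached, run_years, use_parity=True):
    """Subset a cached full-range table to run_years and compute rates.

    Assumes `cached` has count columns `births` and `women` (illustrative
    names); without parity the counts are pooled across parity first.
    """
    table = cached[cached["year"].isin(run_years)]
    keys = ["year", "age", "parity"] if use_parity else ["year", "age"]
    out = table.groupby(keys, as_index=False)[["births", "women"]].sum()
    out["rate"] = out["births"] / out["women"]
    return out


# Toy cached data: made-up counts, including a year outside the overlap.
cached = pd.DataFrame({
    "year":   [2011, 2011, 2011, 2011, 2030, 2030],
    "age":    [30, 30, 31, 31, 30, 30],
    "parity": [0, 1, 0, 1, 0, 1],
    "births": [10, 5, 8, 4, 9, 3],
    "women":  [100, 50, 100, 40, 100, 50],
})

# ONS covers 1934-2020 and NewEthPop 2011-2061, so the overlap is 2011-2020.
years = common_years(range(1934, 2021), range(2011, 2062))
with_parity = build_rate_table(cached, years, use_parity=True)
without_parity = build_rate_table(cached, years, use_parity=False)
```

Keeping counts in the cache (rather than rates) is the design choice that makes the parity switch cheap here; if the real cache only stores rates, the pooling step would need population weights instead.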

~~Also would ideally like to get resolved #275 and #291 along the way.~~ -> both done.

Update June 2024.

[x] ~~Transition from ONS data (1934-2020, England and Wales only) to Human Fertility Database (all UK, various year ranges depending on specific data but generally up to 2020; see here).~~ -> ignoring for now as would have negligible effect on results
[x] ...or update to 1934-2022 ONS data (available here since March 1934). -> done in feb9bef
[x] ~~Reinstate metrics module to include fertility metrics (originally contained child poverty metrics)~~ -> nope, just run fertility metrics in post-processing

Update 09/10/24.

Consolidating outstanding jobs here, and crossing off some non-priority jobs.

[ ] Switch to nnewborn for new births (currently preg / nkids_ind_new) as (a) imputed and very low missingness, (b) more flexible as includes n > 1 new children, and (c) already used within metrics elsewhere, to be pulled into this branch.
[ ] Replace has_newborn with nnewborn everywhere by way of harmonisation (i.e. same variable in DG and at runtime).
[ ] Add basic fertility metrics (and mortality metric): mortality, total fertility rate (TFR), general fertility rate (GFR), crude birth rate (CBR) and reference data; all in notebooks ready to be copied over.
[ ] Add runtime and post-processing functionality to analyse particular groups (e.g. age brackets, ethnicity).
[ ] Add functionality for (e.g.) cohort-based fertility rates and average birth spacing for validation with (e.g.) Frejka and Sardon (2007).
[ ] Allow for particular fertility metric to be matched to reference data at first year of simulation; this avoids initial-value problems. Could use general fertility rate (GFR), crude birth rate (CBR) or total fertility rate (TFR), as in figure below from pared-down microsim (not Minos).
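
A rough sketch of the three fertility metrics and the first-year matching idea from the list above, using the standard definitions: CBR as births per 1,000 total population, GFR as births per 1,000 women aged 15-44, and TFR as the sum of single-year age-specific fertility rates. Function names, column names and all numbers are illustrative only (not reference data, not the notebook code), and the single scaling factor is just one possible way to do the matching.

```python
import pandas as pd


def crude_birth_rate(births, population):
    """Births per 1,000 total population."""
    return 1000.0 * births / population


def general_fertility_rate(births, women_15_44):
    """Births per 1,000 women aged 15-44."""
    return 1000.0 * births / women_15_44


def total_fertility_rate(asfr):
    """Sum of single-year age-specific fertility rates (births per woman)."""
    return float(asfr.sum())


def calibration_factor(simulated, reference):
    """Multiplier that makes the simulated first-year metric match reference."""
    return reference / simulated


# Toy first-year output: 1,000 women and 50 births at each age 15-44.
by_age = pd.DataFrame({"age": range(15, 45), "women": 1000, "births": 50})
by_age["asfr"] = by_age["births"] / by_age["women"]

total_births = by_age["births"].sum()
cbr = crude_birth_rate(total_births, population=100_000)  # made-up population
gfr = general_fertility_rate(total_births, by_age["women"].sum())
tfr = total_fertility_rate(by_age["asfr"])

# Made-up reference TFR; scaling every age-specific rate by the same factor
# scales the TFR by that factor too.
factor = calibration_factor(tfr, reference=1.8)
scaled_tfr = total_fertility_rate(by_age["asfr"] * factor)
```

Because TFR is linear in the age-specific rates, one multiplicative factor is enough to hit a reference TFR exactly; matching on GFR or CBR instead also depends on the simulated age structure, so the factor would differ.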

RobertClay commented 9 months ago

Just to butt in: this should be doable from the config file by selecting the choice of input rate table data for fertility. I've done this before, coarsely, for switching between rate tables for mortality at the LSOA/LA level.

Another side point may be to check the caching works, as it's been slow for me lately (may just be that parity adds a lot of rows, though).

Another possible speed improvement is to drop the interpolation and stick with integer age rates, since we have a yearly cross-section model anyway.
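
To illustrate the integer-age point, a minimal sketch assuming a rate table keyed on whole-year ages and a plain pandas merge for the lookup. `lookup_rates`, the column names and the numbers are hypothetical, not how Minos currently does it.

```python
import pandas as pd


def lookup_rates(population, rate_table, year):
    """Attach a rate to each person by integer age, with no interpolation."""
    rates = rate_table.loc[rate_table["year"] == year, ["age", "rate"]]
    pop = population.copy()
    # Truncate to whole years; assumes non-negative, non-missing ages.
    pop["age"] = pop["age"].astype(int)
    # Left merge keeps one row per person, in the original order.
    return pop.merge(rates, on="age", how="left")


# Toy data: made-up rates for two integer ages in one year.
rate_table = pd.DataFrame({
    "year": [2011, 2011, 2012],
    "age":  [30, 31, 30],
    "rate": [0.10, 0.09, 0.11],
})
population = pd.DataFrame({"pid": [1, 2, 3], "age": [30.0, 30.7, 31.2]})

result = lookup_rates(population, rate_table, year=2011)
```

With one row per integer age the table is also much smaller than an interpolated one, which may help with the caching slowness mentioned above.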

paddy-r commented 9 months ago

@RobertClay all good, thanks, and will discuss more when I've made some progress. Also should roll in some points from #275, meant to get most of that done by now.

Leeds-MRG / Minos

Removing parity from fertility model + refactoring #369