Leeds-MRG / Minos

SIPHER microsimulation for estimating the effect of income policy on mental health.
MIT License

Add more years to fertility from NewEthPop #211

Closed paddy-r closed 1 year ago

paddy-r commented 1 year ago

Issues to discuss/clarify with Rob and Luke during development, to be converted to jobs if agreed:

(1) Clarify difference b/t key_columns and parameter_columns in add_new_birth_cohorts.setup -> DONE (see Rob's comment below)

(2) Which is current, FertilityAgeSpecificRates or nkidsFertilityAgeSpecificRates? (Presumably the latter.) -> DONE (the latter)

(3) Currently data_generation.convert_rate_data, which generates rate table file, not called anywhere. How about calling during installation/setup to ensure regional output files present? Or during fertility module initialisation? -> DONE as moved to job below

(4) Should mean be weighted (by population? NewEthPop data exist) in collapse_LAD_to_region (also, collapse_location)? -> DONE, as converted to job below

(5) Are LA and region definitions current? If not, can use code from Inclusive Economy to generate lookups -> DONE as moved to new issue (#219), could be useful but not a priority for now as currently aggregated into region anyway

(6) Why LAs used in BaseHandler.compute_migration_rates (presumably for migration modules?) but regions used in fertility module? -> DONE, as answered by Rob below

(7) Think about how to generalise output/logging functionality in RunPipeline (and already a comment about it there) as very useful for me during fertility module development (cf. #167) -> already partly addressed in job below -> create new issue if good idea to add more detailed functionality -> marking as DONE as vague and not priority; also at least partly covered by job below

(8) How/where to generate/view specific variables during simulation, in the first instance fertility rate and year for which data is sought and year for which data are available? -> marking as DONE as (1) vague, (2) will become clearer over time and (3) at least partly covered by jobs below

(9) How to visualise effects on SF-12? Will only be tiny numerical differences for now (as only changing range of NewEthPop data used here), but would be good to understand how to do it for later in fertility development process. E.g. need new make target somewhere (outcomes/Makefile) -> marking as DONE because vague and covered by jobs below

(99) Once everything here done, discuss duplicating functionality to mortality module, as very similar (e.g. rate table generation, as format of NewEthPop fertility and mortality input data is almost identical) -> DONE as podded off into another issue (#213)

Rough to-do list:

RobertClay commented 1 year ago

Will come up with some answers for Thursday.

paddy-r commented 1 year ago

Will come up with some answers for Thursday.

Thanks, I'll try and get a load of the jobs done in the meantime.

RobertClay commented 1 year ago

(1) Clarify difference b/t key_columns and parameter_columns in add_new_birth_cohorts.setup

Another very undocumented part of vivarium. It's an interpolated lookup table. Make sure you understand lookup tables and linear interpolation before you read this. key_columns are the lookup variables. E.g. for key_columns = [region, sex, ethnicity] it will find the rows in the lookup table with those values, like [East Midlands, F, BAN]. There can be more than one row here.

parameter_columns = [age, time] is more complicated and uses a linearly interpolated lookup (order 0 I think?). For an observation in the population you can have a continuous age and year timestamp, e.g. [age, year] = [51.1245, 2012.12412]. The problem is how to estimate the fertility rate given we only have discrete values in the lookup table at ages 51/52 and years 2012/2013. In the lookup table age_specific_fertility_rate we provide vivarium with 4 columns: age_start, age_end, year_start, year_end. Specifying parameter_columns age and time tells vivarium that observations of these values will be continuous, and which columns to use as the start and end points for the interpolation. This is probably better demonstrated with a diagram. Happy to discuss more.
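To make the key/parameter split concrete, here is a rough stdlib-only sketch of the idea. It is not vivarium's implementation: the helper name lookup_rate, the use of "year" rather than "time" as the parameter name, and all rate values are made up for illustration. Key columns are matched exactly; each continuous parameter is matched to the [start, end) bin that contains it, which is what "order 0" would mean if that guess is right (no blending between neighbouring bins).

```python
# Hypothetical rate table rows in the shape described above: exact-match key
# columns plus a start/end pair for each continuous parameter. Rates are made up.
RATE_TABLE = [
    {"region": "East Midlands", "sex": "F", "ethnicity": "BAN",
     "age_start": 51, "age_end": 52, "year_start": 2012, "year_end": 2013, "rate": 0.010},
    {"region": "East Midlands", "sex": "F", "ethnicity": "BAN",
     "age_start": 51, "age_end": 52, "year_start": 2013, "year_end": 2014, "rate": 0.011},
    {"region": "East Midlands", "sex": "F", "ethnicity": "BAN",
     "age_start": 52, "age_end": 53, "year_start": 2012, "year_end": 2013, "rate": 0.008},
    {"region": "East Midlands", "sex": "F", "ethnicity": "BAN",
     "age_start": 52, "age_end": 53, "year_start": 2013, "year_end": 2014, "rate": 0.009},
]

KEY_COLUMNS = ["region", "sex", "ethnicity"]
PARAMETER_COLUMNS = ["age", "year"]  # "year" stands in for vivarium's "time"


def lookup_rate(table, person):
    """Return the rate for one person (illustrative helper, not vivarium API)."""
    # Step 1: key columns are matched exactly; several rows can survive this.
    candidates = [
        row for row in table
        if all(row[key] == person[key] for key in KEY_COLUMNS)
    ]
    # Step 2: each continuous parameter selects the bin that contains it.
    matches = [
        row for row in candidates
        if all(
            row[f"{name}_start"] <= person[name] < row[f"{name}_end"]
            for name in PARAMETER_COLUMNS
        )
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one matching row, found {len(matches)}")
    return matches[0]["rate"]


person = {"region": "East Midlands", "sex": "F", "ethnicity": "BAN",
          "age": 51.1245, "year": 2012.12412}
rate = lookup_rate(RATE_TABLE, person)
print(rate)
```

Here the person falls in the age 51 to 52 and year 2012 to 2013 bin, so the first row's rate is returned.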

(2) Which is current, FertilityAgeSpecificRates or nkidsFertilityAgeSpecificRates? (Presumably the latter.)

The latter.

(3) Currently data_generation.convert_rate_table, which generates rate table file, not called anywhere. How about calling during installation/setup to ensure regional output files present? Or during fertility module initialisation?

It should be called somewhere, yes. Are you sure it's not in the fertility pre_setup function .set_rate_table()? I believe they're cached, as they can be quite expensive to generate, particularly if you're adding more data in.
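The caching idea can be sketched as "generate once, then only read". This is a generic stdlib illustration, not the MINOS code: get_rate_table, build_rate_table, the file name and the rows are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def build_rate_table():
    """Stand-in for the expensive generation step (hypothetical, made-up rows)."""
    return [
        {"region": "East Midlands", "year_start": "2012", "rate": "0.010"},
        {"region": "East Midlands", "year_start": "2013", "rate": "0.011"},
    ]


def get_rate_table(cache_path, builder=build_rate_table):
    """Load the rate table from cache_path, generating the file first if missing."""
    cache_path = Path(cache_path)
    if not cache_path.exists():
        rows = builder()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with cache_path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    # Always return what is on disk, so cached and fresh calls look the same.
    with cache_path.open(newline="") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "regional_fertility_rates.csv"
    first = get_rate_table(path)   # file is missing, so it is generated
    second = get_rate_table(path)  # file exists, so it is only read
    cache_hit_matches = first == second
```

One consequence of this pattern is that adding more input years means the cached file has to be deleted or versioned, otherwise the stale table keeps being read.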

(4) Should mean be weighted (by population? NewEthPop data exist) in collapse_LAD_to_region (also, collapse_location)?

Not sure. I did this very roughly and not sure if there are suitable weights available. One to discuss on video I think.
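For the discussion, the difference between a plain and a population-weighted mean can be sketched as follows. The function collapse_to_region, the LAD codes, rates and populations are all invented for illustration; this is not the existing collapse_LAD_to_region code, and whether suitable weights exist is still the open question.

```python
from collections import defaultdict

# Made-up LAD-level rows: (region, LAD code, fertility rate, population).
lad_rows = [
    ("Region A", "LAD1", 0.010, 1000),
    ("Region A", "LAD2", 0.020, 3000),
    ("Region B", "LAD3", 0.015, 500),
]


def collapse_to_region(rows, weighted=True):
    """Average LAD rates per region, optionally weighting by population."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for region, _lad, rate, population in rows:
        weight = population if weighted else 1.0
        totals[region] += rate * weight
        weights[region] += weight
    return {region: totals[region] / weights[region] for region in totals}


unweighted = collapse_to_region(lad_rows, weighted=False)
weighted = collapse_to_region(lad_rows, weighted=True)
print(unweighted)
print(weighted)
```

With these made-up numbers, Region A moves from roughly 0.015 (plain mean) to roughly 0.0175 (weighted), because the larger LAD has the higher rate. A region with a single LAD is unaffected.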

(5) Are LA and region definitions current? If not, can use code from Inclusive Economy to generate lookups

I believe they're 2019? I had to manually adjust some areas (Northamptonshire/Gloucestershire?) that changed their boundaries recently. Your IE code will be better.

(6) Why LAs used in BaseHandler.compute_migration_rates (presumably for migration modules?) but regions used in fertility module?

We don't use migration in MINOS. It's from the old model Daedalus, which does use LA-level data. I'd say ignore it for now, but Nik would probably love you if you did some maintenance on Daedalus too.

(7) Think about how to generalise output/logging functionality in RunPipeline (and already a comment about it there) as very useful for me during fertility module development (cf. https://github.com/Leeds-MRG/Minos/issues/167) -> already partly addressed in job below -> create new issue if good idea to add more detailed functionality

Luke's done a lot of logging. I'd suggest talking to him, but the Python logging module is usually pretty clear and easy to add to. The more the merrier.
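As a starting point, a minimal sketch of module-level logging with a verbose switch is below. The logger name "minos.fertility", the helper pick_data_year and its clamping rule are assumptions for illustration only, not how MINOS currently chooses a data year.

```python
import logging

# Logger name is an assumption; any dotted name works.
logger = logging.getLogger("minos.fertility")


def configure_logging(verbose=False):
    """Send log records to stderr; DEBUG detail only when verbose is set."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )


def pick_data_year(requested_year, available_years):
    """Clamp the requested year to the available range (hypothetical rule)."""
    used_year = min(max(requested_year, min(available_years)), max(available_years))
    logger.debug("requested year %s, using data year %s", requested_year, used_year)
    return used_year


configure_logging(verbose=True)
year_used = pick_data_year(2020, [2011, 2012, 2013])
```

Running with verbose=False keeps the same code path but hides the DEBUG line, which avoids scattering print statements through the module.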

(8) How/where to generate/view specific variables during simulation, in the first instance fertility rate and year for which data is sought and year for which data are available?

PyCharm debug flags may be useful here? Or some kind of verbose mode.

(9) How to visualise effects on SF-12? Will only be tiny numerical differences for now (as only changing range of NewEthPop data used here), but would be good to understand how to do it for later in fertility development process. E.g. need new make target somewhere (outcomes/Makefile)

Are the current lineplots we have sufficient? This is a larger problem we're having at the moment for how to visualise the csv outputs. Discuss.

(99) Once everything here done, discuss duplicating functionality to mortality module, as very similar (e.g. rate table generation, as format of NewEthPop fertility and mortality input data is almost identical)

100% do this next. They're very similar with slight differences (e.g. men can die but not give birth).

RobertClay commented 1 year ago

[Image: interpolated lookup diagram]

paddy-r commented 1 year ago

Another question, very trivial...

(10) Which have higher priority, interventions or mortality/fertility modules, based on text in default.yaml and RunPipeline?

paddy-r commented 1 year ago

Another question, very trivial...

(10) Which have higher priority, interventions or mortality/fertility modules, based on text in default.yaml and RunPipeline?

From discussion, 20/04/23, priority is:

  1. Replenishment
  2. Fertility, then mortality
  3. Intervention (if present)
  4. All pathways
  5. SF-12

Added to list of jobs.

paddy-r commented 1 year ago

Closed with #259.