Development of cohort model for maternal/newborn health analysis

joehcollins commented 2 months ago

(@tamuri starting new issue for discussions. @tbhallett FYI)

To improve the speed and scope of analyses that rely on a sufficient number of pregnancies/births within the simulation, we want to develop an approach to run the model with a cohort of women who are pregnant at initialisation. We could then run the model for only 1 year (or 10 months) when analysing outcomes, allowing a greater number of scenarios to be explored and more comprehensive sensitivity analyses.

To generate the properties of the pregnant population, including the prevalence of modelled health conditions (e.g. HIV), I ran a full model simulation with 250K people from 2010-2025 and logged the population.props row at pregnancy initiation. This generated a data frame of around 16K newly pregnant women in 2024.

When I tried to force this data frame over the population props data frame at initialisation I came across a number of issues that I couldn't get around (see PR #1320).

joehcollins commented 2 months ago

Hi @tamuri - hoping to return to this when back from Malawi next week. Would it be useful to discuss further?

tamuri commented 2 months ago

Discussing with others at the next softeng meeting (this Friday 27th) to see how best to proceed.

joehcollins commented 1 month ago

Great, thanks Asif!

joehcollins commented 1 month ago

Hi @tamuri @tbhallett. Some thoughts after our meeting today and looking at the code again:

1. To ensure the epidemiology of the MNH conditions remains accurate during the simulation, the minimum set of 'other' disease modules I would need to register would include HIV, Malaria, Cardio Metabolic and Stunting. However, because of the dependencies of these modules, this would require nearly all modules to be registered anyway (basically excluding the cancers, RTI etc.). I'd propose we just use the full model then, given my next point...
2. I've been through the initialise_population function for each of the modules. If we were simply to allow each module to initialise the population and then overlay a new population dataframe in a new initialise_population for a cohort module (the pregnant population), there would only be issues coming directly from 2 modules: HIV and Tb. Both of these modules include functions that schedule events/HSIs during initialise_population, which would then crash when the new pregnant population was overlaid. One solution would be to use the dummy modules? Or could we clear the HSI queue? Or we would need to move this code. (Realising now that initialise_simulation could also have some scheduling issues, which I'll now look at.)
3. If we address point 2, would we then perform another full_model long run but, instead of logging new pregnancies, generate a data frame which we pickle? This data frame would then retain its column types?
4. Finally, if we take this approach we need to address the issue with population.py which I highlighted in #1320.
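To illustrate the pickling idea above, here is a minimal pandas-only sketch. It assumes nothing about the TLOmodel internals: the column names, the person ids and the `overlay_cohort` helper are all hypothetical, made up for illustration. It shows that a pickle round trip keeps categorical, boolean and datetime column types (whereas a CSV round trip does not), and one possible way to reshape the saved cohort so that it lines up with the columns and dtypes of an existing props data frame.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for a few population.props columns.
# Column names and person ids are illustrative only.
cohort = pd.DataFrame(
    {
        "is_pregnant": [True, True, True],
        "hiv_positive": [False, True, False],
        "sex": pd.Categorical(["F", "F", "F"], categories=["M", "F"]),
        "date_of_conception": pd.to_datetime(
            ["2024-01-05", "2024-02-11", "2024-03-20"]
        ),
    },
    index=[1041, 2307, 5150],  # person ids as they were in the long run
)


def dtype_names(df: pd.DataFrame) -> dict:
    """Map each column name to the string name of its dtype."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}


# Pickle round trip: dtypes (and the index) come back unchanged.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cohort.pkl")
    cohort.to_pickle(path)
    from_pickle = pd.read_pickle(path)

# CSV round trip for comparison: categories and dates come back as plain text.
from_csv = pd.read_csv(io.StringIO(cohort.to_csv(index=False)))


def overlay_cohort(props: pd.DataFrame, new_pop: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape `new_pop` so it could replace `props`.

    Keeps the column order and dtypes of `props` and gives the rows a
    fresh 0..n-1 index.
    """
    missing = set(props.columns) - set(new_pop.columns)
    if missing:
        raise ValueError(f"cohort is missing columns: {sorted(missing)}")
    aligned = new_pop[list(props.columns)].reset_index(drop=True)
    return aligned.astype(props.dtypes.to_dict())


# An empty frame with the same schema stands in for the initialised props.
props_template = cohort.iloc[:0]
overlaid = overlay_cohort(props_template, from_pickle)

print(dtype_names(from_pickle) == dtype_names(cohort))  # pickle keeps dtypes
print(dtype_names(from_csv) == dtype_names(cohort))     # CSV does not
```

This says nothing about the scheduling problem in point 2; any events or HSIs already queued against the original person ids would still need to be handled separately.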

UCL / TLOmodel

Development of cohort model for maternal/newborn health analysis #1462