Open krosenfeld-IDM opened 3 months ago
I am still seeing this behavior even with the fix in `do_births()`.
My suggestion - can you write out the agent ages and dates of death on Day 1? Finding this bug I think would be helped by knowing who is drawing those deaths?
One potential clue - First year-of-life mortality is pretty much the highest single year. I think in the Nigeria survival curve, some 12% of kids die before the age of 1.
So the most obvious potential culprit that stems from this - for most ages, $S_{a-1}^{a} \approx S_{a}^{a+1}$. So, e.g., to figure out whether an agent dies in the next year, if an agent is 30 years and 183 days old, it doesn't really matter if you use $S_{30}^{31}$ or $S_{31}^{32}$, or half a year of exposure to the first mortality rate and half a year to the second; you'll get it approximately right either way.
But! $S_{0}^{1} \not\approx S_{1}^{2}$. So depending on how the draw is being handled, if we're not offsetting the "mortality exposure" for sub-annual age, a kid who's 11 months old could be exposed to a full year of the 0->1 mortality rate, which will end up with way too many deaths.
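To make the size of that effect concrete, here is a minimal sketch comparing the two ways of handling an 11-month-old. It is not the LASER implementation: the function names are hypothetical, the survival values other than the roughly 12% first-year mortality are made up, and it assumes a constant hazard within each one-year bin (which itself understates how front-loaded infant mortality is).

```python
def death_prob_split(annual_survival, age_years):
    """P(death within the next year), splitting exposure across two age bins.

    Assumes a constant hazard inside each one-year bin, so surviving a
    fraction t of bin a has probability annual_survival[a] ** t.
    """
    a = int(age_years)             # completed years of age
    f = age_years - a              # fraction of the current age-year already lived
    last = len(annual_survival) - 1
    s_now = annual_survival[min(a, last)]
    s_next = annual_survival[min(a + 1, last)]
    # (1 - f) of a year in the current bin, then f of a year in the next bin
    return 1.0 - (s_now ** (1.0 - f)) * (s_next ** f)


def death_prob_full_bin(annual_survival, age_years):
    """P(death within the next year) if sub-annual age is ignored."""
    last = len(annual_survival) - 1
    return 1.0 - annual_survival[min(int(age_years), last)]


# S_0^1 = 0.88 follows the ~12% figure quoted above; the other values are made up.
survival = [0.88, 0.98, 0.99]
eleven_months = 11.0 / 12.0

p_full = death_prob_full_bin(survival, eleven_months)
p_split = death_prob_split(survival, eleven_months)
print(f"full-bin exposure: {p_full:.3f}, split exposure: {p_split:.3f}")
```

With these illustrative numbers, the full-bin draw gives the 11-month-old several times the death probability of the split-exposure draw, which is the direction of error described above.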
If that isn't the problem, there's a potentially more subtle one. Most of that 0->1 mortality actually occurs in the first day, first week, and first month of life. Depending on how we initialize, this could end up with "too many" kids under 12 months at initialization, also being exposed to that high mortality rate.
What I mean is - if 100k kids are born and 88k survive to year 1, then using the annual averages to initialize age would tell you there should be about 94k 0->1 year olds at any time. But the reality is probably more like 89k or 90k, because that mortality is so stacked into the first month. This one's a bit more subtle; I don't know if it should cause that spike vs. just being "wrong" if someone cared about sub-annual ages. But if the first idea above doesn't fix the issue, this is where I would look next. We can figure out how to get the appropriate numbers for day 1, day 2-7, day 8-30, and day 31-364 survival, and I think the date-of-death code would have to be updated to not assume annual binning but instead take a vector of ages that it interpolates between. That is something that should eventually happen anyway, but it may not need to happen now unless it specifically fixes this bug.
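The 94k vs. 89-90k arithmetic can be checked with a short sketch. The cumulative shares of infant deaths by day 1, 7, and 30 below are hypothetical placeholders, not Nigeria or US data; only the 100k births and 88k survivors come from the example above. The helper `mean_alive` is an illustrative name.

```python
def mean_alive(days, alive):
    """Time-averaged number alive, by trapezoidal integration over days."""
    area = 0.0
    for i in range(1, len(days)):
        area += 0.5 * (alive[i - 1] + alive[i]) * (days[i] - days[i - 1])
    return area / (days[-1] - days[0])


births = 100_000
infant_deaths = 12_000  # 88k survive to age 1

# Annual-average view: deaths spread evenly over the year.
uniform = mean_alive([0, 365], [births, births - infant_deaths])

# Front-loaded view: HYPOTHETICAL cumulative share of infant deaths
# that have occurred by the end of day 1, day 7, day 30, and day 365.
days = [0, 1, 7, 30, 365]
cum_share = [0.0, 0.3, 0.5, 0.7, 1.0]
alive = [births - infant_deaths * s for s in cum_share]
front_loaded = mean_alive(days, alive)

print(f"uniform: {uniform:.0f}, front-loaded: {front_loaded:.0f}")
```

Under these assumed shares the standing number of under-ones lands near 90k rather than 94k, consistent with the rough estimate in the comment; the exact value depends entirely on the real sub-annual survival numbers.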
That's a good point @KevinMcCarthyAtIDM. I think we're using US statistics for this notebook (https://github.com/InstituteforDiseaseModeling/laser/blob/e6b00057ead082c8f38e3daf2fa0c6f97d4bb58a/src/idmlaser/kmcurve.py#L6), so the effect should be even larger if we switch to NGA statistics. I have what should be the equivalent table in the GEOMED repository if you want to try that @clorton.
I'm seeing the same effect in the current GEOMED version (https://github.com/gatesfoundation/GEOMED24/commit/0dea1e11d26ded21eda466da37e82b17cc67e72f) although the yearly deaths seem to rise linearly:
I think it's likely that the first year will be part of a burn-in anyway and the effect goes away after that. But good to know about, and glad it is documented here! Nice catch @KevinMcCarthyAtIDM.
I would expect yearly deaths to increase as the population increases and is not offset by any increasing average lifespan.
Agreed, that's normal. But the spike in year 1 isn't. This is probably not a breaking bug for Katherine's work, since it disappears after burn-in. But also of course I'd feel better making sure we understand it. I think the quickest path is looking at the age distributions of all agents who die in Year 1 vs. all agents who die in some later year. The explanations I gave above would imply that the histogram in Year 1 should have more kids in the 1-2 year age bin than later years. If we see that, great, we understand it enough to know it's not the most critical fix for now; but if we don't see that, then I'm confused and want to make sure it's not something weirder...
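A minimal sketch of that comparison, assuming each agent has a date of birth and date of death stored in ticks (days). The array names `dob` and `dod`, the function name, and the toy data are assumptions for illustration, not the actual LASER or GEOMED population properties.

```python
import numpy as np


def death_age_density(dob, dod, first_tick, last_tick, bins):
    """Density histogram of age at death (years) for deaths in [first_tick, last_tick)."""
    died = (dod >= first_tick) & (dod < last_tick)
    age_years = (dod[died] - dob[died]) / 365.0
    density, _ = np.histogram(age_years, bins=bins, density=True)
    return density


# Toy data: negative dob means the agent was born before the simulation started.
dob = np.array([-100, -400, -4000, 10])
dod = np.array([50, 200, 500, 800])
bins = np.array([0, 1, 2, 5, 100])  # age bins in years

year1 = death_age_density(dob, dod, 0, 365, bins)       # deaths in the first year
later = death_age_density(dob, dod, 365, 3 * 365, bins)  # deaths in years 2-3
print(year1, later)
```

Using densities rather than raw counts keeps the two histograms comparable even though the "later" window covers more ticks and a larger population.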
Here is what I'm seeing for the age distributions @KevinMcCarthyAtIDM comparing the first year of the sim and the rest (3 year sim)
https://github.com/gatesfoundation/GEOMED24/blob/features/notebooks/measles.py Note that I switched to using the same code as @clorton to reduce confusion.
Sorry - these are infections! Updating now for deaths...
ok - here are the deaths as measured by:
Distributions look fairly similar.
Updating with NGA deaths and age pyramid (and plotting density for readability)
On commit 9a1ba81 we're seeing non-disease mortality spike at the beginning of the simulation:
This may be due to `dod` not being offset by currenttick as agents are activated: see https://github.com/InstituteforDiseaseModeling/laser/blob/clorton/cleanup-for-merge/nnmm/measles.ipynb
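For illustration only, here is a sketch of what such a missing offset would look like. It assumes the date of death is drawn as a remaining lifetime in days and stored as an absolute tick; `assign_dod` and its arguments are hypothetical names, not the code in the linked notebook.

```python
import numpy as np


def assign_dod(current_tick, remaining_days, offset=True):
    """Return absolute day-of-death ticks for agents activated at current_tick."""
    remaining_days = np.asarray(remaining_days, dtype=np.int64)
    if offset:
        return remaining_days + current_tick  # lifetime counted from activation
    return remaining_days.copy()              # suspected bug: counted from tick 0


tick = 300
remaining = np.array([100, 5000, 20000])  # made-up remaining lifetimes in days

with_offset = assign_dod(tick, remaining)
without_offset = assign_dod(tick, remaining, offset=False)

# Agents whose stored dod is already at or before the tick they were activated on.
overdue = int((without_offset <= tick).sum())
print(with_offset, without_offset, overdue)
```

Without the offset, every agent's death lands `tick` days too early relative to activation, and any agent with a short drawn lifetime is already past due when it is activated.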