InstituteforDiseaseModeling / PACE-HRH

PACE-HRH (Population-Aware Capacity Estimator for Human Resources for Health)
https://institutefordiseasemodeling.github.io/PACE-HRH/
MIT License

Potentially capping delta rate at 1 +/- 2% per year #119

Closed by rhanIDM 1 year ago

rhanIDM commented 2 years ago

@celiot-IDM observed that with these parameters, the fertility rate for 45-49 year-olds rises all the way to 0.090 over a 20-year period, while the rate for 40-44 year-olds drops to 0.017. That seems unlikely ... This confirms our concern that a high delta rate, albeit calculated from credible sources, 1) is arguably unlikely to be sustained over 15 years; and 2) compounded over a long time horizon gives us questionable predictions. We need a way to systematically contain the delta rate within a reasonable range.
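The compounding effect is easy to see with a quick calculation. The starting rate and annual multiplier below are purely hypothetical, chosen only so that twenty years of compounding lands near the 0.090 value mentioned above; they are not the actual model inputs:

```python
# Hypothetical illustration of how a modest-looking annual delta compounds.
# The starting rate (0.012) and annual multiplier (1.106) are assumptions,
# not values taken from the PACE-HRH input sheets.
start_rate = 0.012
annual_delta = 1.106
years = 20

projected = start_rate * annual_delta ** years
print(round(projected, 3))  # roughly 0.090 after 20 years of compounding
```

Any multiplier meaningfully above 1, applied every year, eventually dominates the starting value; that is the whole concern with letting a credible short-run delta run unchecked for 15-20 years.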

BHagedorn-IDM commented 2 years ago

I don't think we should hard limit these to 2%, but rather flag them as a risky input in the checks. @rhanIDM I think we should calculate the change rates > 1 as you had suggested the other day.
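A "flag, don't cap" check could live in either the standalone validation tool or the R package. A minimal sketch, with the plausible band (0.98-1.02) as a hypothetical default rather than an agreed threshold:

```python
def flag_risky_change_rates(rates, low=0.98, high=1.02):
    """Flag change rates outside a plausible band instead of silently capping them.

    `rates` maps an age-band (or task) label to its annual change rate.
    The 0.98/1.02 band is a hypothetical default, not a value from the model.
    """
    return {label: r for label, r in rates.items() if r < low or r > high}

# Example: only the out-of-band rate is flagged for user review.
flagged = flag_risky_change_rates({"40-44": 0.99, "45-49": 1.06})
print(flagged)  # {'45-49': 1.06}
```

The key design point is that the flagged value is reported back to the user unchanged, so the "risky input" stays their decision.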

celiot-IDM commented 2 years ago

I love the concept of "risky inputs" :-)

And yes ... I agree that the general approach of telling the user "that's a really big delta you've got going on there" is better than silently "correcting" their "mistake".

Question: flag in the standalone validation tool, or in the R package, or both?

rhanIDM commented 2 years ago

One approach could be to explicitly separate a short-run ChangeRate from a long-run ChangeRate. We keep the current ChangeRate values and apply them for the first 3 to 5 years, and we add an additional ChangeRate-LongRun column set to 0.98 or 1.02, depending on whether the expected long-run trend is negative or positive; ChangeRate-LongRun would then apply to all years after the first 3 to 5. This setup produces results that are easier to interpret: the near-term forecasts assume the current ChangeRate remains relevant in the near future, while the 15-year forecasts capture the lasting effect of recent changes.
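The two-phase scheme described above can be sketched as follows. The 0.27 base and 1.06 change rate echo the highlighted example later in the thread; the 5-year switch point and 0.98 long-run value are the ones suggested here, and the function itself is an illustration rather than the package's interface:

```python
def projected_rate(base, change_rate, long_run_rate, year, switch_year=5):
    """Apply the observed ChangeRate for the first `switch_year` years,
    then the bounded ChangeRate-LongRun (e.g. 0.98 or 1.02) afterwards."""
    if year <= switch_year:
        return base * change_rate ** year
    return (base * change_rate ** switch_year
                 * long_run_rate ** (year - switch_year))

# A high short-run rate levels off (and here slowly declines) once the
# long-run rate takes over.
print(round(projected_rate(0.27, 1.06, 0.98, 5), 4))   # 0.3613
print(round(projected_rate(0.27, 1.06, 0.98, 15), 4))  # 0.2952
```

The appeal is interpretability: years 1-5 reflect the measured trend, and the long tail reflects only its lasting level shift.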

BHagedorn-IDM commented 1 year ago

> I love the concept of "risky inputs" :-)
>
> And yes ... I agree that the general approach of telling the user "that's a really big delta you've got going on there" is better than silently "correcting" their "mistake".
>
> Question: flag in the standalone validation tool, or in the R package, or both?

@MeWu-IDM Any thoughts on whether we should put this into the validation review? Or, has that already been done?

MeWu-IDM commented 1 year ago

I think we have already: https://github.com/InstituteforDiseaseModeling/PACE-HRH/blob/main/config/validation/rules/rules_PopValues.yaml#L22-L28

rhanIDM commented 1 year ago

@BHagedorn-IDM and I reviewed the fertility ChangeRate for all regions, and Addis Ababa stood out with ChangeRate >= 1.02 for all age bands between 20 and 44. @BHagedorn-IDM consulted fertility change rates used by the UN and World Bank and noticed that 1) most countries comparable to Ethiopia have a ChangeRate around 0.97 or 0.98; and 2) the ChangeRate doesn't stay above 1 for many years and is rarely above 1.03. Allowing a ChangeRate to start at 1.03 and transition to 0.98 in later years would likely require changes to the calculation steps currently in the model, so we decided to move this issue from Milestone Release to Milestone Refresh. In the meantime, @rhanIDM will run sensitivity analyses to quantify the effect of fertility ChangeRate assumptions on clinical hours predictions for Addis Ababa using three scenarios: 1) the high ChangeRate as is; 2) ChangeRates calculated from 2019-2000 DHS data, which are lower but still mostly above 1; 3) a flat ChangeRate of 0.995.

BHagedorn-IDM commented 1 year ago

Postponed until we speak with VitalWave.

rhanIDM commented 1 year ago

Copying over the thread of discussion that happened through email:

VitalWave - Preferred solution for fertility and incidence rates: add an additional column to model input sheet to cap the rate until it gets to a specified value.

Charles - The first (capping fertility and incidence rates) should be easy to implement but feels a tad clunky. I suspect if we throttle exponential increase as being unrealistic, we open the door to questions about our statistics. It’s the old question of whether variance comes from the science or from the model. If we constrain growth for good science reasons, we might also need to constrain decrease as well in order for the statistics to work out.

Kevin - Agree with Charles that capping the rates feels clunky. But exactly how to structure the changes to address 1 & 2 is, I think, worth a bit of discussion. If there is only a single "annual rate of change" parameter and it's multiplicative, then you generically have geometric sequences x_{t+n} = r^n * x_t, which go either to zero or infinity for any r not equal to 1. I guess for r ~ 1 and over 10-20 year time horizons this shouldn't be a huge deal, but I suppose they are already running into it or they wouldn't have brought it up.
Speaking from the modeling science side and not the actual code implementation side - probably the most flexible solution would be to promote the scalar "annual rate of change" to a vector of annual rates of change per year (or equivalently, just specify a vector of fertility and incidence rates per year). Then the user at least has the freedom to choose basically any trajectory they want for those rates instead of being limited to geometric sequences. It's more complex than the single scalar parameter, but if these rates are major, zeroth-order contributors to the workload calculation, then it's probably important to allow more flexibility in their trajectories?
(Then we could get into whether you want those rates to be deterministic trajectory with only ‘process noise’, or to implement ‘parameter noise’ so that any individual simulation follows a stochastic random walk along that mean trajectory, and whether that random walk has constant or time-dependent noise… but that’s probably all overly complex and fairly small potatoes, compared to just having a more flexible knob for the trajectory of those rates, if that’s a sticking point for the users).
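Kevin's vector-of-rates suggestion could look something like the sketch below. This is an illustration of the idea, not the package's actual interface, and the example trajectories are made up:

```python
def rate_trajectory(base, annual_changes):
    """Build a rate trajectory from a per-year vector of change multipliers,
    so the user can specify any shape, not just a geometric sequence."""
    values = [base]
    for r in annual_changes:
        values.append(values[-1] * r)
    return values

# Geometric sequence (a scalar r repeated) vs. a tapering trajectory that
# rises for 3 years, holds for 3, then declines.
geometric = rate_trajectory(0.27, [1.06] * 10)
tapering  = rate_trajectory(0.27, [1.06] * 3 + [1.00] * 3 + [0.98] * 4)
print(round(geometric[-1], 4), round(tapering[-1], 4))
```

The scalar parameter is just the special case where every entry of the vector is equal, so existing configurations would still be expressible.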

Charles - Some more thoughts:

Brittany to VitalWave - To implement the 'cap' on the fertility, mortality, and incidence rate changes, we considered adding a column to both the pop values and the task values sheets; while this gives the user maximum control, it introduces an extensive number of additional inputs and seems likely to become error-prone. So we would suggest a simpler solution, which caps the change rates relative to their original values. We'd allow the top- and bottom-side caps to differ, and they would differ by rate category (fertility, mortality, incidence). The values are written as ratios to baseline, since this is how the change rates are entered elsewhere and we thought consistency would be helpful.

image
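The approved proposal amounts to projecting each rate as before and then clamping it to a band around its baseline value, with per-category limits. A sketch under that reading; the cap values in `CAPS` are placeholders, not the approved inputs from the sheet:

```python
# Hypothetical per-category caps, expressed as ratios to baseline.
# The actual approved values live in the model input sheet, not here.
CAPS = {
    "fertility": (0.80, 1.20),
    "mortality": (0.80, 1.20),
    "incidence": (0.80, 1.20),
}

def capped_rate(base, change_rate, year, category, caps=CAPS):
    """Project a rate geometrically, then clamp it to base * [low, high]."""
    low, high = caps[category]
    raw = base * change_rate ** year
    return min(max(raw, base * low), base * high)

# With change rate 1.06, the uncapped value exceeds 1.2x baseline well
# before year 10, so the result sits at the cap: 0.27 * 1.2 = 0.324.
print(round(capped_rate(0.27, 1.06, 10, "fertility"), 4))  # 0.324
```

Because the caps are ratios to baseline rather than absolute rates, the same limits can apply across age bands with very different starting values, which is presumably the consistency argument above.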

VitalWave approved proposed solution on March 1st.

celiot-IDM commented 1 year ago

@BHagedorn-IDM , @rhanIDM

Starting to work on this, and I have some questions about implementation details.

Let's take the example from the start of the thread combined with limit examples from the previous post:

Image

Image

In the highlighted example, the initial fertility rate is 0.27, the change rate is 1.06 and the cap relative to baseline is 1.20. This is how I'm interpreting the algorithm:

(Aside: in audio mixing we would call this a "limiter with a hard knee", meaning the limit value kicks in abruptly. In this example the change rate for the band is high - 1.06 - so the max ratio of 1.2 is reached after only about 3 or 4 iterations. Also due to the high change rate, the stochasticity isn't going to pull the fertility rate back down very often (to come down the tweaked change rate has to go below 1, not just below ChangeRate). Most of the trajectories for this band will look the same: rapid rise to the max - 1.20 - then flat-lining. The stochasticity will determine when the flat-line is hit. If we end up not liking this behavior, we could apply a different approach based on a different audio concept: "compression with a soft knee". But for now I'm planning to stick with the hard knee limiter approach. Which, BTW, is called a "brick-wall" limiter in audio jargon.)
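The hard-knee behavior described above can be sketched as follows. Modeling the stochastic tweak as a small Gaussian perturbation of the change rate is an assumption made for illustration, not a statement about the package's internals:

```python
import random

def simulate_band(base, change_rate, max_ratio, years, sd=0.02, seed=1):
    """Hard-knee limiter: each year, apply a stochastically tweaked change
    rate, then clamp the result at base * max_ratio."""
    rng = random.Random(seed)
    cap = base * max_ratio
    value, trajectory = base, []
    for _ in range(years):
        value = min(value * rng.gauss(change_rate, sd), cap)
        trajectory.append(value)
    return trajectory

# With change_rate 1.06 and max_ratio 1.2, 1.06**4 already exceeds 1.2, so
# most trajectories rise quickly, hit the cap within a few years, and then
# flat-line: once at the cap, only a tweaked rate below 1 can pull it down.
traj = simulate_band(0.27, 1.06, 1.20, 20)
print(all(v <= 0.27 * 1.20 + 1e-12 for v in traj))  # True
```

This matches the "brick-wall" description: no value ever exceeds the cap, and the stochasticity mostly just shifts the year at which the flat-line begins.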

celiot-IDM commented 1 year ago

Some thoughts re this:

image

rhanIDM commented 1 year ago

Thanks @celiot-IDM. I have updated the headers based on your suggestions.

image

I am biased towards keeping these values static for all scenarios. I think of these limits as safeguards against overly unrealistic values, and as a good place to examine when the model predictions seem extreme. These values are unlikely to be useful in sensitivity analyses, making them less suitable for the scenarios tab.

celiot-IDM commented 1 year ago

Done!