DARPA-ASKEM / program-milestones

Repository for materials related to program milestone hackathon and evaluation events

Epi Scenario 3: Causal Analysis with Interventions #73

Open djinnome opened 5 months ago

djinnome commented 5 months ago

Scenario 3: Causal Analysis with Interventions

Estimated % of time: Baseline 30%; Workbench 20%

In this scenario, we are interested in determining the effects of masking and social distancing on COVID-19 infections using simulated data. The simulations use contact matrices and populations subdivided into three age groups. The data are generated from an SEIR model.

 

In these questions, we provide the contact matrices and population data as well as the outputs of the simulated SEIR model. We ask you to calibrate a model, compute $\beta$ at different time intervals, and to estimate the causal effects of interventions.

 

The model can be described by the diagram in Figure 2 and the following set of ordinary differential equations:


Figure 2. Model structure for Scenario 3: Causal Analysis with Interventions

 

$\frac{dS_i}{dt} = -\beta \frac{S_i}{N} (1 - m_{ew} m_{cw}) \sum_{j=1}^{3} M_{ijw} I_j$
$\frac{dE_i}{dt} = \beta \frac{S_i}{N} (1 - m_{ew} m_{cw}) \sum_{j=1}^{3} M_{ijw} I_j - r_{E\rightarrow I} E_i$
$\frac{dI_i}{dt} = r_{E\rightarrow I} E_i - r_{I\rightarrow R} I_i$
$\frac{dR_i}{dt} = r_{I\rightarrow R} I_i$


The above equations include the following constant parameters:

- $r_{E\rightarrow I}$, the rate of transition from compartment E to I = 0.08/day

- $r_{I\rightarrow R}$, the rate of transition from compartment I to R = 0.06/day

- $\beta$, the transmission rate, which we ask you to estimate

And three parameters that change over time:

- $m_{cw}$, the mask compliance over time interval $w$

- $m_{ew}$, the mask efficacy over time interval $w$

- $M_{ijw}$, the value of the contact matrix for row $i$ and column $j$ (i.e., from age group $i$ to age group $j$) during time interval $w$

 

Use the following initial conditions (all units are number of people):

| S1 | S2 | S3 | E1, E2, and E3 | I1, I2, and I3 | R1, R2, and R3 |
| --- | --- | --- | --- | --- | --- |
| 10305660 | 15281905 | 12154442 | 50 | 50 | 0 |

 

Supplementary files population.csv and ContactMatrix.csv contain the population counts and the contact matrix for each of the three age groups. The provided output data contain counts for S, E, I, and R for each of the three age strata. The output files are called S3SimulationRuns.csv and S3SimulationRuns.RDS; they contain the same information in slightly different formats.

 

The contact matrix, M, is provided in the supplementary file, but is also written below in Table 1.

Table 1. Contact matrix for Scenario 3: Causal Analysis with Interventions. Units are average number of contacts per day.

 

|  | Age Group 1 | Age Group 2 | Age Group 3 |
| --- | --- | --- | --- |
| Age Group 1 | 38.62 | 20.56 | 6.12 |
| Age Group 2 | 20.56 | 28.22 | 11.60 |
| Age Group 3 | 6.12 | 11.60 | 20.01 |

 

 

In the simulation, two interventions take place over overlapping time intervals:

Masking

- From t = 0 to t = 50 days, no masking occurs.

- From t = 50 to t = 100 days, some masking happens ($m_{cw}=0.5$, $m_{ew}=0.6$) and the spread of COVID-19 decreases.

- From t = 100 to t = 150 days, masking still happens ($m_{cw}=0.4$, $m_{ew}=0.2$), but with less intensity.

 

Social distancing

- From t = 0 to t = 20 days, no social distancing occurs.

- From t = 20 to t = 80 days, social distancing happens, reducing contact rates to 30% of their original values across the board.

- From t = 80 to t = 150 days, social distancing happens, reducing contact rates to 80% of their original values across the board.

 

This simulation is deterministic, but we draw $\beta$ randomly from a distribution and run the simulation 25 times with slightly different values of $\beta$. This is intended to allow us to ask questions about uncertainty.
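
For reference, below is a minimal sketch of how the scenario described above could be simulated directly, assuming Python with NumPy/SciPy. It hard-codes the contact matrix from Table 1, the initial conditions, and the piecewise masking and social-distancing schedules; names such as `seir_rhs`, `mask_factor`, and `contact_scale` are illustrative rather than part of any required workflow, and $\beta$ is set to an arbitrary placeholder value.

```python
# Minimal simulation sketch (illustrative only), assuming NumPy/SciPy.
import numpy as np
from scipy.integrate import solve_ivp

# Table 1 contact matrix (average contacts per day).
M = np.array([[38.62, 20.56,  6.12],
              [20.56, 28.22, 11.60],
              [ 6.12, 11.60, 20.01]])
N = 10305660 + 15281905 + 12154442     # total population across age groups
r_EI, r_IR = 0.08, 0.06                # E->I and I->R transition rates (1/day)

def mask_factor(t):
    """Return (1 - m_cw * m_ew) for the masking schedule in the scenario."""
    if t < 50:
        return 1.0                     # no masking
    elif t < 100:
        return 1.0 - 0.5 * 0.6         # m_cw = 0.5, m_ew = 0.6
    return 1.0 - 0.4 * 0.2             # m_cw = 0.4, m_ew = 0.2

def contact_scale(t):
    """Return the social-distancing scaling applied to the contact matrix."""
    if t < 20:
        return 1.0
    elif t < 80:
        return 0.3                     # contacts reduced to 30% of original
    return 0.8                         # contacts reduced to 80% of original

def seir_rhs(t, y, beta):
    """Right-hand side of the age-stratified SEIR equations above."""
    S, E, I, R = y[:3], y[3:6], y[6:9], y[9:]
    force = beta * mask_factor(t) * (contact_scale(t) * M) @ I / N
    return np.concatenate([-S * force,
                           S * force - r_EI * E,
                           r_EI * E - r_IR * I,
                           r_IR * I])

# Initial conditions: S1-S3, E1-E3, I1-I3, R1-R3 (numbers of people).
y0 = np.array([10305660, 15281905, 12154442,
               50, 50, 50,
               50, 50, 50,
               0, 0, 0], dtype=float)

sol = solve_ivp(seir_rhs, (0, 150), y0, args=(0.1,),   # placeholder beta
                t_eval=np.arange(0, 151), rtol=1e-8)
```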

 


Figure 3. All twenty-five runs of the simulation stratified by age group

1.     Model Extraction (see S1Q1 for definition of model extraction): Extract the model and set default parameters and initial conditions. For now, use a dummy value for $\beta$. Note the time to extract the model and get it into an executable state that can run a simple test simulation and get sensible results. For workbench modelers, model extraction time may include human-in-the-loop curation, and for baseline modelers, this time may include debugging code. Provide simulation results from your test simulation.

 

2.     Model Calibration:

- Calibrate the model to estimate $\beta$ in all 25 runs of the data provided. Since each run was generated using a different value of $\beta$, each estimated value of $\beta$ should be a little different. Save the values of $\beta$ for use in Q5 and Q6.

- Average all 25 runs together and calibrate a model to estimate $\beta$ using the averaged data. Use this calibrated model and the averaged data for Q3 and Q4. (A minimal calibration sketch follows below.)
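
One possible calibration sketch, reusing `seir_rhs` and `y0` from the simulation sketch above. The layout of S3SimulationRuns.csv is not specified here, so `load_run` is a hypothetical helper returning a (151, 12) array ordered S1-S3, E1-E3, I1-I3, R1-R3; the bounded scalar minimization is just one reasonable choice of optimizer.

```python
# Calibration sketch (illustrative): least-squares fit of beta per run.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

def simulate(beta, days):
    """Simulate the model from the sketch above and return states per day."""
    sol = solve_ivp(seir_rhs, (days[0], days[-1]), y0, args=(beta,),
                    t_eval=days, rtol=1e-8)
    return sol.y.T                      # shape (len(days), 12)

def calibrate_beta(observed, days):
    """Fit beta by minimizing mean squared error against one run of data."""
    def loss(beta):
        return np.mean((simulate(beta, days) - observed) ** 2)
    return minimize_scalar(loss, bounds=(0.01, 1.0), method="bounded").x

days = np.arange(0, 151)
# betas = [calibrate_beta(load_run(k), days) for k in range(25)]   # Q2a
# beta_avg = calibrate_beta(average_of_runs, days)                 # Q2b
```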

 

3.     Causal Effects: Estimate the average treatment effects on infections for each of the last four unique time intervals, using the following approach:

- To estimate the average treatment effect (ATE) for the nth time interval, use your calibrated model from Q2b, parameterized only for the (n-1)th time interval, and generate a forecast of the model over the nth interval, where no change in interventions takes place (you assume the interventions in place during the (n-1)th interval continue uninterrupted). Compute the root mean squared error (RMSE) between the model forecast of infections in the nth interval and the average of the supplementary data for the nth interval. This is the ATE for the nth interval.

 

For example, calibrate a model using data from time interval (0, 20), and simulate the model over the interval (20, 50). Compare the simulated output over the interval (20, 50) to the average of the provided data for the interval (20, 50) to estimate the ATE of the set of interventions in the interval (20, 50) (as defined originally in the scenario background), on infections.

 

- For each interval you calculate an ATE for, generate plots comparing the actual data (all compartments) to the forecasted output had there been no change in interventions.

- Include uncertainty in the estimated effects. (A sketch of the RMSE-based ATE computation follows below.)
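
A sketch of the RMSE-based ATE described above, under the stated interpretation: hold the (n-1)th interval's intervention settings fixed, forecast over the nth interval, and compare total infections against the averaged data. It reuses `M`, `N`, `r_EI`, and `r_IR` from the simulation sketch; `avg_infections` is a hypothetical array of averaged total infections indexed by day, and `y_at_t20` would be the model state at the start of the forecast interval.

```python
# ATE-as-forecast-error sketch (illustrative).
import numpy as np
from scipy.integrate import solve_ivp

def frozen_rhs(t, y, beta, mask, scale):
    """SEIR right-hand side with masking and contact scaling held constant."""
    S, E, I, R = y[:3], y[3:6], y[6:9], y[9:]
    force = beta * mask * (scale * M) @ I / N
    return np.concatenate([-S * force, S * force - r_EI * E,
                           r_EI * E - r_IR * I, r_IR * I])

def ate_for_interval(beta, y_start, t0, t1, mask, scale, avg_infections):
    """RMSE between a no-change forecast over (t0, t1) and the averaged data."""
    days = np.arange(t0, t1 + 1)
    sol = solve_ivp(frozen_rhs, (t0, t1), y_start, args=(beta, mask, scale),
                    t_eval=days, rtol=1e-8)
    forecast_I = sol.y[6:9].sum(axis=0)          # total infections I1+I2+I3
    return np.sqrt(np.mean((forecast_I - avg_infections[days]) ** 2))

# Example from the text: interventions from (0, 20) held fixed over (20, 50).
# ate_20_50 = ate_for_interval(beta_avg, y_at_t20, 20, 50,
#                              mask=1.0, scale=1.0, avg_infections=avg_I)
```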

 

4.     Interventions: Use your fitted model from Q2b to conduct an approximate sensitivity analysis.

- Change the original reduction in the contact matrix at t = 20 days to 40% of the original value. How does that affect infections at t = 50 days? Calculate the ATE for this change in reduction (see the sensitivity sketch after this list).

- Repeat Q4a, but change the reduction in the contact matrix to 20% of the original value. Show the change using plots and the change in the calculated ATE.

- (Optional) Change the reduction in the contact matrix in other ways (e.g., instead of changing from a 30% decrease to a 40% decrease, change the 30% decrease to a 50% decrease), or change which age groups have a reduction in contact rate, to demonstrate how various types and levels of contact reduction can affect outcomes.
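
A sensitivity-analysis sketch for Q4, assuming the Q2b-calibrated value `beta_avg` and the definitions from the simulation sketch (`M`, `N`, `r_EI`, `r_IR`, `y0`, `mask_factor`). The factory `make_contact_scale` is simply an illustrative way to swap the 30% contact level for 40% or 20%.

```python
# Sensitivity sketch (illustrative): vary the t = 20 contact reduction.
import numpy as np
from scipy.integrate import solve_ivp

def make_contact_scale(level_20_80):
    """Build a contact-scaling schedule with a modified (20, 80) level."""
    def scale(t):
        if t < 20:
            return 1.0
        elif t < 80:
            return level_20_80          # original scenario value is 0.3
        return 0.8
    return scale

def rhs_with_schedule(t, y, beta, scale_fn):
    S, E, I, R = y[:3], y[3:6], y[6:9], y[9:]
    force = beta * mask_factor(t) * (scale_fn(t) * M) @ I / N
    return np.concatenate([-S * force, S * force - r_EI * E,
                           r_EI * E - r_IR * I, r_IR * I])

days = np.arange(0, 151)
for level in (0.3, 0.4, 0.2):           # original, Q4a, and Q4b settings
    sol = solve_ivp(rhs_with_schedule, (0, 150), y0,
                    args=(beta_avg, make_contact_scale(level)),
                    t_eval=days, rtol=1e-8)
    total_I_at_50 = sol.y[6:9, days == 50].sum()
    print(f"contact level {level}: total infections at t = 50 = {total_I_at_50:.0f}")
```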

 

5.     Intervention Optimization: In this question, we will ask you to find the minimum level of mask efficacy needed to ensure that the maximum number of infections in the most populous age group (I2) is below 5,000,000 people, with 90% confidence.

Without knowledge of the exact distribution from which $\beta$ is drawn, but with the simulated data provided, one way to approach this is with the following steps:

a. Examine the values of $\beta$ from Q2 and fit a distribution to these values. Use this approximate distribution to calculate a value of $\beta$ that represents the appropriate quantile for the confidence level.

b. Using the value of $\beta$ you calculated in Q5a, determine the minimum level of mask efficacy needed to ensure that the maximum number of infections in I2 remains below 5,000,000 people. Demonstrate this with a plot of simulation outcomes. (A sketch of both steps follows below.)
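
One way to implement Q5a/Q5b, assuming the 25 calibrated values are in a list `betas` from Q2 and reusing `M`, `N`, `r_EI`, `r_IR`, `y0`, and `contact_scale` from earlier sketches. The normal fit, the use of a 90th-percentile (pessimistic) $\beta$, applying a single efficacy value to both masking intervals, and the coarse grid search are all assumptions, not a prescribed method.

```python
# Q5 sketch (illustrative): quantile beta, then search for minimum efficacy.
import numpy as np
from scipy import stats
from scipy.integrate import solve_ivp

mu, sigma = stats.norm.fit(betas)                      # Q5a: fit a distribution
beta_90 = stats.norm.ppf(0.90, loc=mu, scale=sigma)    # 90%-confidence beta

def peak_I2(mask_efficacy, beta):
    """Peak of I2 when both masking intervals use the given efficacy."""
    def rhs(t, y):
        S, E, I, R = y[:3], y[3:6], y[6:9], y[9:]
        if t < 50:
            m = 1.0
        elif t < 100:
            m = 1.0 - 0.5 * mask_efficacy       # compliance 0.5 as in scenario
        else:
            m = 1.0 - 0.4 * mask_efficacy       # compliance 0.4 as in scenario
        force = beta * m * (contact_scale(t) * M) @ I / N
        return np.concatenate([-S * force, S * force - r_EI * E,
                               r_EI * E - r_IR * I, r_IR * I])
    sol = solve_ivp(rhs, (0, 150), y0, t_eval=np.arange(0, 151), rtol=1e-8)
    return sol.y[7].max()                        # I2 is state index 7

# Q5b: coarse grid search for the smallest efficacy meeting the target.
for eff in np.linspace(0.0, 1.0, 101):
    if peak_I2(eff, beta_90) < 5_000_000:
        print("approximate minimum mask efficacy:", eff)
        break
```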

 

6.     Intervention Optimization: What is the latest time the first masking intervention (currently at t = 50 days) can start to keep total infections below 11,000,000 people at any point in time in the simulation, with 95% confidence? Assume nothing else in the original simulation specification changes. You can apply a similar procedure to the one in Q5 (find the right $\beta$ from a fitted distribution, and then optimize over the parameter of interest) to solve this question. Demonstrate your answer with a plot of simulation outcomes.
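
A sketch following the suggested procedure, reusing the fitted distribution from the Q5 sketch (`mu`, `sigma`) and the earlier model definitions. How the second masking interval shifts when the first one is delayed is not specified, so keeping the first interval at its original 50-day duration is an assumption here, as is the one-day search granularity and the reading of "total infections" as I1+I2+I3 on any given day.

```python
# Q6 sketch (illustrative): latest masking start time at 95% confidence.
import numpy as np
from scipy import stats
from scipy.integrate import solve_ivp

beta_95 = stats.norm.ppf(0.95, loc=mu, scale=sigma)

def peak_total_I(mask_start, beta):
    """Peak of I1+I2+I3 when the first masking interval starts at mask_start."""
    def rhs(t, y):
        S, E, I, R = y[:3], y[3:6], y[6:9], y[9:]
        if t < mask_start:
            m = 1.0                              # masking not yet started
        elif t < mask_start + 50:
            m = 1.0 - 0.5 * 0.6                  # first masking interval
        else:
            m = 1.0 - 0.4 * 0.2                  # second masking interval
        force = beta * m * (contact_scale(t) * M) @ I / N
        return np.concatenate([-S * force, S * force - r_EI * E,
                               r_EI * E - r_IR * I, r_IR * I])
    sol = solve_ivp(rhs, (0, 150), y0, t_eval=np.arange(0, 151), rtol=1e-8)
    return sol.y[6:9].sum(axis=0).max()

latest_start = max(t for t in range(0, 151)
                   if peak_total_I(t, beta_95) < 11_000_000)
```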

 

7.     (Optional) In this question, we use the original SEIR model defined in the scenario introduction, but with no masking or social distancing interventions. Instead, we provide data generated from an SEIR model where $\beta$ varies at every time step over the course of the simulation. The data are in the file [`ChangingBeta.csv`](https://raw.githubusercontent.com/DARPA-ASKEM/program-milestones/main/18-month-milestone/evaluation/Epi%20Use%20Case/Scenario%203%20Supplementary/ChangingBeta.csv).

a.   Using the original contact matrix, configure the SEIR model in 3 different ways using the following values of $\beta$: 0.10, 0.13, and 0.16. Keep all other parameters the same (aside from intervention parameters, which are set to 0). Calibrate an ensemble model using the 3 model configurations and the provided data. Compute the RMSE between your calibrated ensemble model (infections variable output) and the true infections output in the data provided.

b.   Similarly, calibrate a single SEIR model to the simulated data and compute the RMSE. This model should have one constant value of $\beta$. Compare the calibrated ensemble output from Q7a to the single-model calibrated output. Plot both against the true data to demonstrate goodness-of-fit of the calibration. (An ensemble-calibration sketch follows below.)
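
A sketch of Q7a under one common reading of "ensemble": a weighted combination of the three fixed-$\beta$ configurations, with non-negative weights fit to the provided infections trajectory. It reuses `M`, `N`, `r_EI`, `r_IR`, and `y0` from the simulation sketch; the column name `I_total` in ChangingBeta.csv and the daily time grid are guesses about the file format, not documented facts.

```python
# Ensemble calibration sketch (illustrative), assuming pandas/NumPy/SciPy.
import numpy as np
import pandas as pd
from scipy.integrate import solve_ivp
from scipy.optimize import nnls

def simulate_total_I(beta, days):
    """Total infections over time, with no masking or distancing interventions."""
    def rhs(t, y):
        S, E, I, R = y[:3], y[3:6], y[6:9], y[9:]
        force = beta * M @ I / N
        return np.concatenate([-S * force, S * force - r_EI * E,
                               r_EI * E - r_IR * I, r_IR * I])
    sol = solve_ivp(rhs, (days[0], days[-1]), y0, t_eval=days, rtol=1e-8)
    return sol.y[6:9].sum(axis=0)

data = pd.read_csv("ChangingBeta.csv")
days = np.arange(len(data))                      # assumed daily time grid
truth = data["I_total"].to_numpy()               # assumed column name

members = np.column_stack([simulate_total_I(b, days)
                           for b in (0.10, 0.13, 0.16)])
weights, _ = nnls(members, truth)                # fit ensemble weights (Q7a)
ensemble = members @ weights
rmse = np.sqrt(np.mean((ensemble - truth) ** 2))
```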


 

Scenario 3 Summary Table

| Question | Inputs | Tasks | Outputs |
| --- | --- | --- | --- |
| Q1 | Model description | Extract equations. Extract parameter values. Iterate/curate the extraction and execute the model until a test simulation gives reasonable results. | Extracted models grounded with all variables and parameters defined, and with units. Test simulation plot. Time to do model extraction. Time to execute the extracted model and plot results. |
| Q2 | Simulated data | Calibrate the model to the data. | Calibrated $\beta$ values, and a single calibrated model using the averaged data. |
| Q3 | Calibrated model from Q2b | Estimate average treatment effects of interventions, with uncertainty. Plot data against counterfactuals. | Estimated ATE values with uncertainty. Plots showing counterfactual scenarios. |
| Q4 | Calibrated model from Q2b | Implement changes in the contact matrix. | Plots showing how changing the contact matrix affects the output. Values for the average treatment effect. |
| Q5 | Simulated data | Conduct optimization for the minimum mask efficacy. | A plot showing that infections can be kept below 5 million on any given day for a particular value of mask efficacy. |
| Q6 | Simulated data | Conduct optimization for the time of the first masking intervention. | A plot showing when interventions need to start to keep total infections below 11 million on any given day. |
| Q7 | Simulated data using a changing value of beta | Create an ensemble model with 3 configurations of the same model. Calibrate the ensemble model to the provided data. Compute the accuracy (RMSE) of the single model and the ensemble as compared to the true simulated data. | Plots and RMSE calculations showing how well the calibrated single model and the ensemble model fit the simulated data. |

 

Decision-maker Panel Questions

1.     What is your confidence in understanding model results and the tradeoffs between potential interventions? Select a score on a 7-point scale.

1.     Very Low

2.     Low

3.     Somewhat Low

4.     Neutral

5.     Somewhat High

6.     High

7.     Very High

 

Explanation: Determine your confidence in being able to assess the effectiveness of all interventions considered in the scenario and to understand how uncertainty factors into the results.

 

The decision-maker confidence score should be supported by the answers to the following questions:

 

- Do you understand the effects of interventions on trajectories? Was the effectiveness of interventions communicated?

- Is it clear how to interpret uncertainty in the results? Do you understand the key drivers of uncertainty in the results?

- Did models help you to understand what would have happened had a different course of action been taken in the past? How confident are you that the counterfactual analysis correctly explained what would have happened had a different course of action been taken?

- How confident are you that the analysis correctly identified and attributed responsibility to causal drivers in the scenario?
