ciemss / program-milestones

Repository for materials related to program milestone hackathon and evaluation events

NovDemo-4: Hierarchical modeling #16

Open sabinala opened 1 day ago

sabinala commented 1 day ago

Scenario 4: Hierarchical Modeling (see here)

In this scenario, we have simulated data from a geographic region. The region is made up of three counties, A, B, and C, and each county consists of two cities. The cities are numbered 1-6. Cities in the same county have similar transmission rates (drawn from the same distribution), but transmission rates vary stochastically over time. Transmission rates are not similar across counties.
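The nesting described above (region, then county, then city) can be sketched as a simple generative model. This is only an illustration, not the process used to simulate the dataset: the Gaussian form, the hyperparameter values, and the placement of Cities 2 and 5 in Counties A and C are all assumptions, and the stochastic variation over time is left out.

```python
import random

# Hypothetical hyperparameters; none of these values come from the scenario.
REGION_MEAN, REGION_SD = 0.4, 0.1   # spread of county-level mean rates
CITY_SD = 0.02                      # spread of cities around their county mean

# Cities 1, 3, 4, and 6 are placed as in the problem text;
# the placement of Cities 2 and 5 is an assumption.
COUNTIES = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}

def sample_city_betas(seed=0):
    """Draw one transmission rate per city: region -> county mean -> city."""
    rng = random.Random(seed)
    betas = {}
    for county, cities in COUNTIES.items():
        # Each county gets its own mean, so counties can differ widely.
        county_mean = rng.gauss(REGION_MEAN, REGION_SD)
        for city in cities:
            # Cities share the county mean, so they stay close to each other.
            # Clip at a small positive value so a rate is never negative.
            betas[city] = max(1e-6, rng.gauss(county_mean, CITY_SD))
    return betas

print(sample_city_betas())
```

Because `CITY_SD` is much smaller than `REGION_SD` in this sketch, two cities in the same county end up closer to each other than to cities elsewhere, which is the structure a hierarchical model can exploit when one city's data is incomplete.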

The simulated dataset has 100 days of data for each city. Each day contains 10 timesteps, so there are 1000 observations for each city.


Epidemiological data can be imperfect, and you will find that to be the case within these counties. Specifically:

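The governor of the region is interested in answering a series of questions about the cities and counties. The data are recorded in SEIR format (susceptible, exposed, infectious, recovered). With transmission rate $\beta$, rate $\alpha$ at which exposed individuals become infectious, recovery rate $\gamma$, and total population $N = S + E + I + R$, the compartments follow these equations: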

\begin{align}
\frac{dS}{dt} &= - \beta S \frac{I}{N} \\
\frac{dE}{dt} &= \beta S \frac{I}{N} - \alpha E \\
\frac{dI}{dt} &= \alpha E - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{align}
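As a minimal sketch of how these equations can be stepped forward in time, the function below uses forward-Euler integration with 10 timesteps per day (`dt = 0.1`), matching the resolution of the dataset. The function name, the integration scheme, and the parameter values in the usage line are assumptions for illustration; the scenario does not state how the data were generated.

```python
def simulate_seir(beta, alpha, gamma, s0, e0, i0, r0, n_steps=1000, dt=0.1):
    """Forward-Euler integration of the SEIR equations.

    Returns a list of (S, E, I, R) tuples of length n_steps + 1,
    starting with the initial conditions.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # total population, constant under these equations
    trajectory = [(s, e, i, r)]
    for _ in range(n_steps):
        new_exposed = beta * s * i / n * dt   # S -> E flow in this step
        new_infectious = alpha * e * dt       # E -> I flow
        new_recovered = gamma * i * dt        # I -> R flow
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        trajectory.append((s, e, i, r))
    return trajectory

# Hypothetical parameters and initial conditions, for illustration only.
traj = simulate_seir(beta=0.5, alpha=0.2, gamma=0.1, s0=9990, e0=0, i0=10, r0=0)
print(traj[-1])
```

Each flow leaves one compartment and enters the next, so the four compartments always sum to $N$; this is a quick sanity check for any simulator fitted to the data.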

The initial conditions and the data for each city are provided in the S4_data.csv file. (The initial conditions are also listed below.)

[Screenshot: Table 1, initial conditions for each city]

Problems:

  1. Estimate city-level transmission rates. (a) Use the data provided to estimate $\beta$ for Cities 1-5 without incorporating information from any other cities. (b) For County C, City 6, note that an intervention was implemented starting at t = 30 days. Estimate the transmission rate before and after the intervention took place.
  2. Using the data provided, estimate both county and regional transmission rates. Additionally, estimate transmission rates for each city, this time by incorporating information from other cities. Note any differences from the result in Q1.
  3. For County A, City 1, impute the missing chunk of data using (1) the model that incorporated information from other cities and (2) the model that did not incorporate information from other cities. Plot both estimated SEIR curves and compare.
  4. Repeat the same exercise in Q3 for County B, City 3.
  5. (Counterfactuals) Imagine that the intervention applied in County C, City 6 had also been applied to County B, City 3 and City 4, at the same time. By how much would this have reduced the total number of infections in County B over the period for which data were collected?
  6. (Optimization – choose a geography) The governor of the region is aware of a new variant that has started to spread in a different region and fears that it may harm the cities and counties they oversee. The best estimate is that the new variant is 1.2x as transmissible as previous variants.

    The governor has funding to apply an intervention in 2 of the 6 cities and is interested in minimizing the total number of infections in each city. The intervention is expected to reduce the transmission rate of the new variant to 80% of its previous value. The intervention would start at t = 100 timesteps (day 10) and can run for the rest of the period (up to day 100). For this question, initial conditions are the same as in Table 1.

    In which cities should the governor implement the intervention? With the intervention in place, what is the total number of infections the governor would expect over a 1000-timestep (100-day) period, and what is the total number of infections the governor would expect if they chose to do nothing? Provide 90% intervals on all projections.
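One way to frame the search in Q6 is to enumerate every pair of cities, simulate the new variant with and without the intervention, and keep the pair with the fewest total infections. The sketch below is a deterministic illustration under several assumptions: forward-Euler SEIR dynamics, "total infections" counted as the cumulative S to E flow summed over all six cities, and hypothetical values for every rate and initial condition. It produces point estimates only; the requested 90% intervals would require repeating the calculation over samples of the estimated transmission rates.

```python
from itertools import combinations

def total_infections(beta, alpha, gamma, s0, e0, i0, r0,
                     start_step=None, factor=0.8, n_steps=1000, dt=0.1):
    """Cumulative new infections (S -> E flow) under forward-Euler SEIR.

    If start_step is given, beta is multiplied by `factor` from that step on.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r
    for step in range(n_steps):
        treated = start_step is not None and step >= start_step
        b = beta * factor if treated else beta
        new_e = b * s * i / n * dt
        new_i = alpha * e * dt
        new_r = gamma * i * dt
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
    return s0 - s  # everyone who left S was infected at some point

def best_pair(cities, alpha, gamma, variant_scale=1.2, start_step=100):
    """cities maps city id -> (beta, s0, e0, i0, r0); all values hypothetical."""
    def run(city, treated):
        beta, *init = cities[city]
        return total_infections(beta * variant_scale, alpha, gamma, *init,
                                start_step=start_step if treated else None)

    baseline = sum(run(c, False) for c in cities)  # do-nothing total
    totals = {
        pair: sum(run(c, c in pair) for c in cities)
        for pair in combinations(sorted(cities), 2)
    }
    pair = min(totals, key=totals.get)
    return pair, totals[pair], baseline

# Hypothetical per-city rates and initial conditions, for illustration only.
cities = {c: (0.3 + 0.05 * c, 9990, 0, 10, 0) for c in range(1, 7)}
pair, with_intervention, do_nothing = best_pair(cities, alpha=0.2, gamma=0.1)
print(pair, with_intervention, do_nothing)
```

The same `total_infections` helper covers the counterfactual in Q5 by setting `variant_scale=1.0` and `start_step=300` (t = 30 days) for the two County B cities, again assuming the intervention acts as a multiplicative change in $\beta$.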