CDCgov / wastewater-informed-covid-forecasting

Wastewater-informed COVID-19 forecasting models submitted to the COVID-19 Forecast Hub
https://cdcgov.github.io/wastewater-informed-covid-forecasting/
Apache License 2.0

Alternate epidemic phase analysis based on P(actual phase | we predicted phase) #128

Open kaitejohnson opened 3 months ago

kaitejohnson commented 3 months ago

Goal

We performed an initial analysis that uses the retrospectively estimated R(t) to categorize forecast dates into epidemic phases, and then compares forecast performance across these phases. Because those phases were estimated with future knowledge, that analysis effectively looked at P(we predicted phase | actual phase). Here, we want to condition on the phase we are predicting and evaluate against the actual phase, i.e., P(actual phase | we predicted phase).
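The difference between the two conditionals comes down to which margin of the same predicted-versus-actual count table you normalize over. Here is a minimal stdlib-only sketch. `conditional_table` and the example labels are hypothetical and are only there to illustrate the idea:

```python
from collections import Counter


def conditional_table(pairs, condition_on):
    """Turn (predicted, actual) phase pairs into conditional probabilities.

    condition_on="actual"    -> {actual: {predicted: P(predicted | actual)}}
    condition_on="predicted" -> {predicted: {actual: P(actual | predicted)}}
    """
    counts = Counter(pairs)  # joint counts of (predicted, actual)
    totals = Counter()       # marginal counts of the conditioning phase
    for (pred, act), n in counts.items():
        totals[act if condition_on == "actual" else pred] += n
    table = {}
    for (pred, act), n in counts.items():
        given, other = (act, pred) if condition_on == "actual" else (pred, act)
        table.setdefault(given, {})[other] = n / totals[given]
    return table


# Toy (predicted, actual) labels, one per forecast_date-location-model run.
pairs = [
    ("increasing", "increasing"),
    ("increasing", "peak"),
    ("increasing", "peak"),
    ("peak", "peak"),
    ("decreasing", "decreasing"),
]

by_actual = conditional_table(pairs, condition_on="actual")        # P(predicted | actual)
by_predicted = conditional_table(pairs, condition_on="predicted")  # P(actual | predicted)
```

With these toy labels, every run whose actual phase was increasing was also predicted as increasing, so P(predicted = increasing | actual = increasing) = 1. Yet only a third of the runs predicted as increasing actually were, so P(actual = increasing | predicted = increasing) = 1/3. This is why the conditioning direction matters.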

Plan

We can either do this with R(t) or with the trend in admissions. Plan would be:

  1. For each model run (forecast_date-location-model combo), come up with an algorithm to classify the phase (we can use the same one as in the previous epidemic phase analysis, adapted from Lopez et al.). This categorizes the phase we predicted on that forecast date.
  2. For each location's time series, come up with an algorithm to classify the actual phase. This categorizes the "actual" phase of that forecast date.
  3. Create an n × n table/heatmap to assess the accuracy of the phase categorization.
  4. Look at the distribution of CRPS scores as a function of the phase we predicted in step 1.
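Step 4 amounts to grouping scores by the label from step 1. This minimal sketch assumes each run has already been reduced to a hypothetical `(predicted_phase, crps)` pair. The function name and output fields are illustrative and are not part of the repository:

```python
from collections import defaultdict
from statistics import median, quantiles


def crps_by_predicted_phase(rows):
    """Summarize CRPS scores by the phase predicted for each model run.

    rows: iterable of (predicted_phase, crps) tuples, one per
    forecast_date-location-model run (hypothetical input format).
    Returns {phase: {"n", "median", "q25", "q75"}}.
    """
    grouped = defaultdict(list)
    for phase, crps in rows:
        grouped[phase].append(crps)
    summary = {}
    for phase, scores in grouped.items():
        if len(scores) > 1:
            # "inclusive" keeps the quartiles within the observed range of scores.
            q25, _, q75 = quantiles(scores, n=4, method="inclusive")
        else:
            # A single score is its own quartiles.
            q25 = q75 = scores[0]
        summary[phase] = {
            "n": len(scores),
            "median": median(scores),
            "q25": q25,
            "q75": q75,
        }
    return summary
```

The `inclusive` quantile method seems the safer choice when some predicted phases only have a handful of runs, because it never extrapolates beyond the smallest or largest score.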

What algorithm to use:

  1. If using R(t), we can do the same as the current algorithm, but for step 1 we would use the real-time R(t) from our model on the forecast date. For step 2, we would still use the retrospective R(t) estimate.
  2. Because R(t) is offset from admissions by ~2 weeks, we might want to do the phase categorization using admissions instead. We could do this by:
     a. Getting the observed phase by fitting the log of each week's data to a linear model, and then applying the increasing/decreasing/uncertain --> increasing/decreasing/peak/nadir algorithm, as we did previously. Or we could use the weekly rolling average and find the peak and trough?? @dylanhmorris I think you had an idea here, feel free to expand on it because I don't know if I was following.
     b. Getting the predicted phase by categorizing each hospital admissions trajectory as increasing or decreasing (after removing the weekday effect??), then computing P(increasing) and applying the same algorithm as for P(R(t) > 1).
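Option 1 uses the share of posterior R(t) draws above 1 on the forecast date, and option 2b uses the share of sampled admissions trajectories that are increasing. Both reduce to a single P(increasing), which can then go through the same categorization. The sketch below is minimal and rests on several assumptions. The 0.9/0.1 thresholds, the week-over-week comparison used to call a trajectory increasing, and the rule that an uncertain period after growth is a peak and after decline is a nadir are all illustrative choices. They are not the exact rules from the earlier analysis or from Lopez et al., and all function names are hypothetical.

```python
def prob_increasing_from_rt(rt_draws):
    """P(R(t) > 1), estimated as the share of posterior draws above 1."""
    return sum(r > 1 for r in rt_draws) / len(rt_draws)


def prob_increasing_from_trajectories(trajectories):
    """Share of sampled daily-admissions trajectories that are increasing.

    Assumption: a trajectory (at least 14 days long) counts as increasing if its
    last 7-day total exceeds its first 7-day total. Comparing whole weeks
    sidesteps the weekday effect.
    """
    def is_increasing(traj):
        return sum(traj[-7:]) > sum(traj[:7])

    return sum(is_increasing(t) for t in trajectories) / len(trajectories)


def trend_category(p_increasing, upper=0.9, lower=0.1):
    """Map P(increasing) to a trend label; thresholds are hypothetical."""
    if p_increasing >= upper:
        return "increasing"
    if p_increasing <= lower:
        return "decreasing"
    return "uncertain"


def phases(trend_series):
    """Map increasing/decreasing/uncertain to increasing/decreasing/peak/nadir.

    An uncertain period after the most recent clear increase is labeled a peak,
    and one after the most recent clear decrease is labeled a nadir.
    """
    out, last_clear = [], None
    for trend in trend_series:
        if trend == "uncertain":
            out.append({"increasing": "peak", "decreasing": "nadir"}.get(last_clear, "uncertain"))
        else:
            out.append(trend)
            last_clear = trend
    return out
```

Consecutive uncertain periods keep the peak or nadir label until a clear trend reappears. Uncertain periods at the start of a series have no earlier trend to anchor on, so they stay uncertain.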

@zsusswein Curious if you have any thoughts! As you predicted, I am not sure R(t) is the right quantity here because of the long time lag...

seabbs commented 3 months ago

So I was thinking we would do:

  1. Classify the current phase based on, e.g., the last two weeks of growth in hospitalisations
  2. Score by these phases and compare
  3. Party
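One way to implement step 1 is to fit a log-linear model to the last two weeks of daily hospital admissions and label the phase from the sign of the approximate 95% interval on the growth rate. This minimal stdlib-only sketch assumes the input is a plain list of daily counts. The +1 offset for zero counts, the normal-approximation interval, and the function names are all assumptions for illustration:

```python
import math


def growth_rate_last_two_weeks(daily_admissions, window=14):
    """OLS slope of log(admissions + 1) on day index over the last `window` days.

    Returns (slope, standard_error). The +1 offset guards against zero counts
    and is an assumption, not something specified in the thread.
    """
    y = [math.log(a + 1) for a in daily_admissions[-window:]]
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 days to estimate a slope and its error")
    x = list(range(n))
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def classify_current_phase(daily_admissions, z=1.96):
    """Label the phase from the sign of the growth rate's approximate 95% interval."""
    slope, se = growth_rate_last_two_weeks(daily_admissions)
    if slope - z * se > 0:
        return "increasing"
    if slope + z * se < 0:
        return "decreasing"
    return "uncertain"
```

Run on the data available up to each forecast date, this gives the current phase to score by in step 2. Fitting daily counts leaves the day-of-week effect in the residuals, which tends to widen the interval. Fitting weekly totals instead would remove that effect, at the cost of having fewer points.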

I am not sure we need to have both a predicted and an actual phase?