act-now-coalition / covid-data-model

Data backend providing computed data for the graphs displayed at https://covidactnow.org
MIT License

Improve Small N Infection Rate Performance #1130

Closed BrettBoval closed 3 years ago

BrettBoval commented 3 years ago

This PR addresses issue https://trello.com/c/JzkA9Q51/1366-can-auto-pilot-what-to-do-with-infection-rate.

The current implementation of our Infection Rate is sensitive to places with consistently low case numbers. In some cases, this causes a systematic error that makes the metric appear to be very high and growing. Right now this error is only seen in locations small enough to have single-digit case counts over the last couple of weeks.

We are updating our model to be clearer about our uncertainty at very low case numbers.

We chose a threshold on the number of smoothed daily cases. Below that threshold, we say "this is too low to confidently have informed priors about the future infection rate," and as a consequence we use the default priors to restart the model going forward. That default is standardized to a gamma distribution with a=2, whose mode (and therefore our prediction) is 1. So for as long as smoothed daily cases remain below that threshold, we keep the default: "the infection rate can be loosely defined as being unity in this edge case while we wait for more data." For now we will continue to show that series of 1s, but in the future we might mask them.
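The reset rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `GammaPrior`, `prior_for_next_step`, and the unit scale parameter are hypothetical names and assumptions chosen so that a gamma with a=2 has a mode of 1.

```python
from dataclasses import dataclass

# Threshold on smoothed daily cases, taken from the PR description.
LOW_CASE_THRESHOLD = 0.4


@dataclass(frozen=True)
class GammaPrior:
    """Hypothetical container for a gamma prior on the infection rate."""
    shape: float
    scale: float = 1.0  # assumption: unit scale, so shape=2 gives mode 1

    def mode(self) -> float:
        # Mode of a gamma distribution is (shape - 1) * scale for shape >= 1.
        return max(self.shape - 1.0, 0.0) * self.scale


# Default prior: gamma with a=2, whose mode (the prediction) is 1.
DEFAULT_PRIOR = GammaPrior(shape=2.0)


def prior_for_next_step(smoothed_daily_cases: float, posterior: GammaPrior) -> GammaPrior:
    """Pick the prior for the next update step.

    Below the threshold we do not trust the running posterior, so we
    restart from the default prior; otherwise we carry the posterior forward.
    """
    if smoothed_daily_cases < LOW_CASE_THRESHOLD:
        return DEFAULT_PRIOR
    return posterior
```

Under these assumptions, any day with smoothed cases below 0.4 yields a prediction of `DEFAULT_PRIOR.mode()`, which is the series of 1s mentioned above.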

Chris and Brett hand-tuned the threshold cutoff to 0.4 units in the Infection Rate smoothing reference frame. For historical and product reasons, the smoothing window applied to the case input data is more aggressive than the Daily New Cases smoothing window. In that reference frame, 6 new cases spread evenly over 10 days produce a peak value of 0.41. After looking at that scenario and others, we decided that 0.4 was good enough. This threshold should be revisited if the smoothing kernel for the input data changes.
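One way to re-check the threshold after a kernel change is to replay the reference scenario through the smoother and compare the peak against 0.4. The sketch below assumes a normalized Gaussian kernel; the kernel shape, `std`, and `half_width` values are placeholders, not the parameters used in this repo, so the peak it computes should not be expected to match the 0.41 quoted above.

```python
import math


def gaussian_kernel(std: float, half_width: int) -> list[float]:
    """Normalized, truncated Gaussian weights centered on the current day."""
    weights = [math.exp(-0.5 * (k / std) ** 2) for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(series: list[float], kernel: list[float]) -> list[float]:
    """Centered weighted moving average; days outside the series count as zero."""
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(series)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(series):
                acc += weight * series[idx]
        smoothed.append(acc)
    return smoothed


# Reference scenario from the PR: 6 cases spread evenly over 10 days,
# padded with quiet days on both sides.
series = [0.0] * 30 + [0.6] * 10 + [0.0] * 30

# Hypothetical kernel parameters; substitute the real input-data kernel here.
kernel = gaussian_kernel(std=5.0, half_width=15)

peak = max(smooth(series, kernel))
below_threshold = peak < 0.4  # would this scenario reset to the default prior?
```

Because smoothing only spreads the 0.6 cases/day plateau over neighboring quiet days, the peak always lands below 0.6; how far below depends entirely on the kernel width, which is why the cutoff is tied to the kernel.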

Our internal discussions and investigations were tracked in this document.