Recidiviz / covid19-dashboard

[Decommissioned] Dashboard for projecting Covid-19 spread in prisons and modeling hypothetical scenarios
https://model.recidiviz.org
GNU General Public License v3.0

Set Rt as 0 if cases on all dates in the past 1wk are the same, and that case count >0 #449

Open cawarren opened 4 years ago

cawarren commented 4 years ago

Is your feature request related to a problem? Please describe. Last week we updated the Rt calculation to ignore duplicate observations. This helped prevent repeated observations from artificially skewing the Rt calculation.
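For context, a minimal sketch of what "ignoring duplicate observations" might look like; the names and the shape of the data here are illustrative, not the ones in the codebase:

```typescript
// Hypothetical shape of a daily observation; the real type in the repo may differ.
interface CaseObservation {
  date: Date;
  cumulativeCases: number; // staff + resident, cumulative
}

// Drop observations whose cumulative count merely repeats the previous
// observation's value, so the Rt fit only sees days where the count changed.
function dropDuplicateObservations(
  observations: CaseObservation[],
): CaseObservation[] {
  return observations.filter(
    (obs, i) =>
      i === 0 || obs.cumulativeCases !== observations[i - 1].cumulativeCases,
  );
}
```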

However, there are a couple of edge cases due to this behavior:

  1. When the outbreak in a facility is extinguished, Rt will be stuck at the rate it was at on the last day of an increase in cases (e.g., 0.2 or 0.4), since once the cumulative case count freezes, no further Rt values are computed.
  2. If a facility is modeled after it has already had an outbreak (so observed cumulative cases are something like 247, 247, 247, 247, 247, 247, 247...), we don't calculate Rt at all and tell the user to input more data. In reality, we know the Rt for this facility, and it is 0.

Describe the solution you'd like Given both of the scenarios above, I'd suggest we set Rt = 0 if the following criteria are met:

  1. The cumulative case count is greater than 0.
  2. All observed cases (staff + resident) in the past week are the same number.

This should address both edge cases.
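As a rough sketch of that check (assuming a simple array of cumulative counts; `shouldForceRtToZero` and the 7-day window are placeholders drawn from this issue, not from the codebase):

```typescript
// Returns true when every one of the last `windowDays` observations reports
// the same cumulative case count and that count is greater than zero, i.e.
// the outbreak looks extinguished and Rt can reasonably be pinned at 0.
function shouldForceRtToZero(
  cumulativeCases: number[], // one entry per observed day, oldest first
  windowDays = 7,
): boolean {
  if (cumulativeCases.length < windowDays) return false;
  const window = cumulativeCases.slice(-windowDays);
  return window[0] > 0 && window.every((count) => count === window[0]);
}
```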

Describe alternatives you've considered We could also do this only if all cases recorded so far are the same count, and that count is >0, but this would only solve edge case #2 (for which Zak has a real example that has come up).

Additional context Open to other suggestions or ideas. @justkunz ?

justkunz commented 4 years ago

This seems reasonable to me. I would suggest a slight modification to the second criterion: "all observed cases (staff + resident) in the past X observations are the same number", so that facilities with only 1 or 2 inputs in the last week wouldn't be eligible. X could be something between 3 and 7.
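In other words, a sketch only, analogous to the one above, with `minObservations` standing in for X:

```typescript
// Variant keyed on the number of observations rather than calendar days:
// require at least `minObservations` inputs, all reporting the same nonzero
// cumulative count, before pinning Rt at 0.
function shouldForceRtToZeroByObservations(
  cumulativeCases: number[], // one entry per observation, oldest first
  minObservations = 5, // somewhere in the suggested 3-7 range
): boolean {
  if (cumulativeCases.length < minObservations) return false;
  const recent = cumulativeCases.slice(-minObservations);
  return recent[0] > 0 && recent.every((count) => count === recent[0]);
}
```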

macfarlandian commented 4 years ago

I spent some time looking into this today and uncovered some wrinkles that suggest this may need some deeper investigation before it proceeds to implementation.

I made some changes to allow for retaining trailing duplicates as described above, expecting that when fed into the Rt model the output would settle at or close to zero. However, that doesn't happen: after a few days (it varies slightly because of the smoothing filter we apply), the Rt value starts going up again. There may very well be a good reason for this (talking it over with @justkunz, we were not sure at a glance), but at the very least it creates the possibility of some very weird results if we were to continue with the approach described above. E.g., over the course of a week of the same number of cases being recorded every day, Rt goes down, then it goes up again, then it drops to zero once our semi-arbitrary threshold is reached.

The "naive" approach proposed here would be simple enough for me to implement without relying on Justine to do the heavy modeling, but in light of what I've seen here I don't think that's a good idea, and we probably want to kick this back to data science for further R&D before proceeding.

jessex commented 4 years ago

@justkunz have you seen Ian's latest comment on this one and do you have any thoughts?

justkunz commented 4 years ago

I discussed this issue with product today, and we decided to add a rolling average on top of the Rt calc output in order to handle the noise from the smaller dataset. The original algorithm assumes a substantial group of people is tested daily, and unfortunately that isn't the case in the data we are seeing. I will also investigate the issue Ian brought up above, and I will remove the limitations on repeat cases that inspired this ticket in the first place.
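Not committing to the eventual implementation details, but a trailing rolling average over the Rt series might look something like this (the 7-day window is illustrative):

```typescript
// Smooth a daily Rt series with a simple trailing rolling average to damp
// the day-to-day noise that sparse, irregular testing introduces.
function rollingAverage(values: number[], windowSize = 7): number[] {
  return values.map((_, i) => {
    const window = values.slice(Math.max(0, i - windowSize + 1), i + 1);
    const sum = window.reduce((total, value) => total + value, 0);
    return sum / window.length;
  });
}
```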