epiforecasts / EpiNow

Estimate Realtime Case Counts and Time-varying Epidemiological Parameters
https://epiforecasts.io/EpiNow

Does not account for heterogeneity #54

Closed johnurbanik closed 4 years ago

johnurbanik commented 4 years ago

Great work guys! Very excited to see more Bayesian modeling in this space.

That being said:

If we have two (or more) mostly isolated subpopulations, and one population has R(t) < 1 but the other has a smaller current case load but R(t) > 1, the aggregate population will temporarily seem to be 'decreasing', but the aggregate R(t) may be > 1 in reality.

This effect can be seen here: https://github.com/understand-covid/epimodelingtoolkit/blob/master/communication/growth_rate_dips.ipynb
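To make the masking effect concrete, here is a minimal Python sketch. It is not the linked notebook and not EpiNow's model: it assumes two fully isolated subpopulations whose daily cases change by a constant per-day factor, which only loosely stands in for R(t) below or above 1. The function name and every number are hypothetical, chosen purely for illustration.

```python
def simulate(initial_cases, growth_factor, days):
    """Daily cases for one isolated subpopulation under geometric growth or decay."""
    return [initial_cases * growth_factor ** t for t in range(days)]


# Hypothetical values: a large outbreak that is shrinking and a small one that is growing.
large_declining = simulate(initial_cases=1000, growth_factor=0.90, days=60)
small_growing = simulate(initial_cases=10, growth_factor=1.10, days=60)

# What a model fitted to the pooled data would see.
aggregate = [a + b for a, b in zip(large_declining, small_growing)]

# Day on which the pooled daily count is lowest.
turning_day = min(range(len(aggregate)), key=aggregate.__getitem__)

print(f"pooled cases fall until day {turning_day}, then rise again")
```

In this toy setup the pooled series declines for several weeks even though one subpopulation is growing the whole time, which is the pattern described above.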

In order to capture these types of behaviors, you would need to treat the model as a mixture model or have a good hyperprior. Without more informative priors or substantially more data, this probably introduces too many degrees of freedom to make useful nowcasts. One potentially expressive option would be to restrict the model to a mixture of exactly two gammas, one with its mean fixed at R0, and add a compound distribution that sorts people into the two components. Also worth noting that, if you take this approach, the fixed R0 should likely be higher; recent estimates put R0 at at least 5.7.
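The two-gamma idea can be sketched from the generative side as follows. This is only a sampling illustration, not an inference procedure and not part of EpiNow. The function name, the mixing weight, the low-component mean, and the shared shape parameter are all assumptions; only the 5.7 figure comes from the comment above.

```python
import numpy as np


def sample_mixture_r(n, weight_high, mean_high, mean_low, shape=4.0, seed=0):
    """Draw n reproduction numbers from a two-component gamma mixture.

    Each draw is first sorted into the high or low component with
    probability weight_high, then sampled from a gamma whose mean is
    that component's mean (scale = mean / shape).
    """
    rng = np.random.default_rng(seed)
    in_high = rng.random(n) < weight_high
    means = np.where(in_high, mean_high, mean_low)
    return rng.gamma(shape, means / shape), in_high


# Hypothetical setting: 10% of people in a component fixed at R0 = 5.7,
# the remaining 90% in a component with mean 0.8.
draws, in_high = sample_mixture_r(
    n=100_000, weight_high=0.1, mean_high=5.7, mean_low=0.8
)

print("mean of low component: ", draws[~in_high].mean())
print("mean of whole mixture: ", draws.mean())
```

Under these assumed values the mixture mean is 0.1 × 5.7 + 0.9 × 0.8 = 1.29, so the population-level value sits above 1 even though most draws come from a component whose mean is below 1.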

A secondary effect of this is that you should likely increase the thresholds in https://github.com/epiforecasts/EpiNow/blob/461bc1ce7a210a18fa69527f2bdad599118a0bcf/R/map_prob_change.R, or factor some of the temporal data (i.e. how long the reproduction number estimate has been below 1) into the categorization.
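Both suggestions can be sketched in Python as below. This is not the code in map_prob_change.R: the labels, the cutoff values, the function names, and the seven-day persistence window are all hypothetical and serve only to show the shape of the change.

```python
LABELS = ["Increasing", "Likely increasing", "Unsure", "Likely decreasing", "Decreasing"]

# Illustrative cutoffs on P(R < 1); these are assumptions, not the package's values.
BASELINE_CUTOFFS = (0.05, 0.20, 0.80, 0.95)
STRICTER_CUTOFFS = (0.05, 0.20, 0.90, 0.99)


def categorise(prob_r_below_one, cutoffs=BASELINE_CUTOFFS):
    """Map the probability that R is below 1 onto a qualitative label."""
    for cutoff, label in zip(cutoffs, LABELS):
        if prob_r_below_one < cutoff:
            return label
    return LABELS[-1]


def categorise_with_persistence(daily_probs, min_days=7, cutoffs=BASELINE_CUTOFFS):
    """Report a decreasing label only if it has held for min_days in a row.

    daily_probs is a list of P(R < 1) values, oldest first.
    """
    decreasing = {"Likely decreasing", "Decreasing"}
    recent = [categorise(p, cutoffs) for p in daily_probs[-min_days:]]
    latest = recent[-1]
    held = len(recent) == min_days and all(label in decreasing for label in recent)
    if latest in decreasing and not held:
        return "Unsure"
    return latest


print(categorise(0.97), "->", categorise(0.97, STRICTER_CUTOFFS))
print(categorise_with_persistence([0.5] * 3 + [0.97] * 4))
```

Raising the upper cutoffs shifts borderline regions one category toward "Unsure", while the persistence check withholds a decreasing label until the estimate has stayed there for the chosen window.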

jhellewell14 commented 4 years ago

Hi John,

You are correct that there can be different outbreak dynamics among sub-populations that might be masked by the time-varying reproduction number for the whole population. However, as you mention in the issue, what prevents us from doing anything meaningful about this is data availability. Lots of places don't even have good-quality, publicly available, rapidly updated case data at the national level, let alone for given sub-populations.

I'm closing this because, while it is an "issue" for modelling in general, I don't think there is anything we can do about it in this package right now unless the quality and availability of covid-19 data changes dramatically.

Cheers, Joel

johnurbanik commented 4 years ago

I don't think a lack of data is a good excuse for not modeling these things in a Bayesian framework. I think that the appropriate way of handling this lack of data is to capture these effects in your generative model and then model them using an uninformative prior.

I offered:

  1. A suggestion that because of the increased number of degrees of freedom, you increase the threshold to consider an area as decreasing (i.e. move likely decreasing into unsure, move decreasing into likely decreasing).

Anyone who reads that their area is decreasing may change behaviors even though they are part of a subpopulation that is still exhibiting growth; there are health-economic consequences to ignoring this heterogeneity.

  2. An actionable way of including a small amount of heterogeneity. Namely, consider a mixture of two gamma distributions, one with a fixed (high) mean and one with a non-fixed mean, with another parameter for sorting between them.

I understand that you may not be interested in following these because of lack of time/resources, or because you think there is a better approach. If so, I'd at least hope that you'd consider adding a caveat to the limitations section as I suggested to @seabbs here

seabbs commented 4 years ago

Hi John,

Thanks again for your feedback and for taking the time to look through our analysis in such detail - it's appreciated.

  1. I think the issue here is that there is a lack of data on which to generate a hypothesis, not on which to parameterize the model. So there is no justification for altering the present model. As you know, good models need to aim for parsimony. There is, for example, no current evidence to justify a mixture of a high R0 and a low R0.

  2. Our categories are already conservative and a decreasing caseload does not mean that interventions can be safely lifted (if these are what is causing the reduction).

  3. As I assume you have seen, our model can be run at different regional scales (as we have done) and on different subpopulations (provided they have limited levels of interaction, as this will not be included).

  4. We provide all code as an open-source package under an MIT license. If you would like to replicate our analysis using a heterogeneous R then you are more than welcome to do so.

Sam

johnurbanik commented 4 years ago

Sam, thanks for the explanation.

  1. The evidence for the mixture of high R0 and low R0 includes (but is not limited to):
     a. The existence of essential workers. There isn't much empirical data here, but the difference in contact patterns seems self-evident to me.
     b. The body of research on network theory and the scale-free nature of real-world social networks. Perhaps appealing to an expert here would be useful. Barabasi has a good chapter on network dynamics in epidemics here.
     c. The evidence that underprivileged areas are disproportionately affected in many cities.

  2. I don't personally believe that the categories are conservative (given the current forecast in Washington State and Southeast Germany, for example), but respect your view. While I hope I'm wrong, maybe changes over the next few weeks will empirically settle any debate.

  3. Interaction between subpopulations is critical to model, in my opinion. Even if you restrict yourself to modeling each region as a single population, inter-regional travel is still happening across the world (except perhaps in places like China and Taiwan). Further, subpopulations exist within each regional population. In any city alone you've got several different segments of the population that have some degree of isolation but a good amount of mixing. Social distancing intuitively causes these subpopulations to be more discrete and mixing to be less homogeneous.

  4. The license is noted. I've begun doing some related modeling, and will reference your work if I end up using any of its insights! I was hoping that some healthy dialectic would result in a better consensus, but I understand you may have other priorities right now 👍

seabbs commented 4 years ago

Thanks for the detailed response. Looking forward to seeing your modelling results and the impact of heterogeneous contacts.