Open scarpino opened 1 month ago
@scarpino Thanks for flagging this, I think this could be a really nice feature for a next release.
Because the IHR would be unidentifiable without any admissions data, would the idea be that the user could specify an IHR distribution (or if not, the forecast could be generated by defaulting to the prior we had set from the literature estimate of the IHR)?
We have already flagged the problem you raise, that hospital admissions are often not available at the geographic granularity of the wastewater data, in https://github.com/CDCgov/ww-inference-model/issues/185. It would still be useful to generate expected hospital admissions in that catchment area, and this would be a very straightforward addition to the current implementation, especially if a global hospital admissions dataset is available.
@kaitejohnson My initial idea was just to require an IHR as an assumed input. It could be really interesting to have a hierarchical model where some inference of the IHR could happen with coarser-grained data in space/time that then informs a finer-grained forecast. That second approach feels like a much more significant feature in terms of effort.
I think it would be a significant lift to allow any "structure" of the coarser-grained count data to be passed in (e.g. a user specified subset of the catchment areas).
However, as currently written, the model requires count data from the "global" population, and generates the expected total counts by summing over latent subpopulation-level incident infections. So essentially, we already get the finer-scale infections "for free" in the current model structure (we would just need to convolve with the delay distribution and scale by population size to get the fine-grained count forecasts).
Removing the need for the global count data also shouldn't be a huge lift; it will just require some design choices to allow both wastewater and count data to be turned on and off, and to allow the user to supply an IHR in this case.
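To make the "convolve and scale" step concrete, here is a minimal sketch in Python/NumPy. It is not the package's implementation: the function name `expected_admissions`, its arguments, and the example numbers are all hypothetical, and it assumes per-capita latent infections for one catchment, a discrete infection-to-admission delay PMF, and a user-supplied IHR.

```python
import numpy as np


def expected_admissions(infections_per_capita, delay_pmf, ihr, pop_size):
    """Expected hospital admissions for one catchment (illustrative only).

    infections_per_capita: latent incident infections per person, per day
    delay_pmf: discrete infection-to-admission delay distribution (sums to 1)
    ihr: assumed infection-hospitalization rate (user-supplied)
    pop_size: catchment population size
    """
    # Scale per-capita infections up to absolute infections in the catchment.
    infections = np.asarray(infections_per_capita, dtype=float) * pop_size
    delay_pmf = np.asarray(delay_pmf, dtype=float)
    # Convolve with the delay distribution, truncating to the input length.
    delayed = np.convolve(infections, delay_pmf)[: len(infections)]
    # Only a fraction `ihr` of infections result in an admission.
    return ihr * delayed


# Hypothetical inputs, chosen only to show the call.
example = expected_admissions(
    infections_per_capita=[0.001, 0.002, 0.004, 0.002],
    delay_pmf=[0.0, 0.5, 0.5],
    ihr=0.01,
    pop_size=100_000,
)
```

Because the result depends linearly on `ihr`, any error in the assumed IHR carries over proportionally to the forecasted admissions.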
@scarpino OK, thinking about this a bit more, one thing I am concerned about is that there are two parameters here that are not uniquely identifiable. One is the infection-to-hospital-admission rate (IHR); the other is the number of genomes shed per infection.
This makes the latent incident infections in the model relatively unbounded, e.g. you could have lots of combinations of $I(t)$, $p_{hosp}$ and $G$ that produce the same observed hospital admissions and wastewater concentrations.
So even if the user has a close-to-accurate IHR, the estimate of the genomes shed per infection isn't constrained at all (and neither is $I(t)$). Asking the user to specify this seems a bit trickier, since I don't think it is reliably estimated in the literature.
My sense is that for now we should stick with something that at least fits to count data (could be cases) to constrain these parameters a bit and ensure that any forecasted observed quantities are grounded in real observations...
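The identifiability concern above can be illustrated with a toy calculation. This is a deliberately simplified sketch, not the model in the package: it assumes admissions are $p_{hosp}$ times delayed infections and the wastewater signal is $G$ times infections convolved with a shedding kernel, ignoring flow, population normalization, and noise. The function `simulate` and all numbers are hypothetical.

```python
import numpy as np


def simulate(infections, p_hosp, genomes_per_infection, delay_pmf, shedding_pmf):
    """Toy observation model: returns (expected admissions, wastewater signal)."""
    n = len(infections)
    admissions = p_hosp * np.convolve(infections, delay_pmf)[:n]
    wastewater = genomes_per_infection * np.convolve(infections, shedding_pmf)[:n]
    return admissions, wastewater


# Hypothetical values for illustration only.
infections = np.array([100.0, 200.0, 400.0, 300.0])
delay_pmf = np.array([0.0, 0.5, 0.5])
shedding_pmf = np.array([0.6, 0.3, 0.1])
c = 4.0  # arbitrary rescaling of the latent infections

base = simulate(infections, 0.01, 1e9, delay_pmf, shedding_pmf)

# Scaling I(t) up by c while scaling p_hosp and G down by c leaves both
# observed quantities unchanged, so the data cannot tell these apart.
rescaled = simulate(c * infections, 0.01 / c, 1e9 / c, delay_pmf, shedding_pmf)

# With the IHR fixed and only wastewater observed, I(t) and G can still
# trade off: the wastewater signal is unchanged, but the implied
# admissions are c times larger.
fixed_ihr = simulate(c * infections, 0.01, 1e9 / c, delay_pmf, shedding_pmf)
```

In this toy setting, `base` and `rescaled` are indistinguishable, while `fixed_ihr` matches the wastewater signal yet implies admissions that differ by the factor `c`. Fitting to some count data is what pins down that scale.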
I wasn't thinking about both the IHR and $I(t)$ being unobserved, which explains why the model can work in the other direction (hospital admissions -> wastewater). Maybe worth a larger, hackathon-level effort in the future.
Is your feature request related to a problem? Please describe. Many wastewater surveillance groups do not have access to hospitalization data, or do not have it at the same spatiotemporal resolution as the wastewater data. It would be useful to be able to forecast hospitalizations using only wastewater data.
Describe the solution you'd like Given a time series of wastewater data matching the formatting requirements of this package and an assumed hospitalization rate, users could generate a forecasted hospitalization time series.
Describe alternatives you've considered Inclusion of wastewater data has become more established in existing forecasting approaches for influenza, but I am not aware of a solution similar to the one that would result from this feature.
Additional context None.