Add support for count data?

seabbs commented 1 month ago

Is there any interest in doing this, and any view on the complexity? I am thinking of a simple time-varying ascertainment model applied to the global infection time series, with a more complex version supporting n count time series at once.
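For concreteness, here is a minimal sketch of what the "simple" version could look like: infections are delayed to the reporting date and scaled by a time-varying ascertainment rate. It is written in Python purely for illustration and is not EpiSewer code; the function name, the delay distribution, and the logit parameterisation of ascertainment are all assumptions.

```python
import numpy as np

def expected_counts(infections, delay_pmf, logit_ascertainment):
    """Expected reported counts under a time-varying ascertainment rate.

    All names here are hypothetical. `delay_pmf[d]` is the assumed
    probability that an infection is reported d days later.
    """
    infections = np.asarray(infections, dtype=float)
    delay_pmf = np.asarray(delay_pmf, dtype=float)
    # Shift infections to the reporting date (convolution truncated to the series length).
    delayed = np.convolve(infections, delay_pmf)[: len(infections)]
    # Map the unconstrained series to a rate in (0, 1) via the inverse logit.
    ascertainment = 1.0 / (1.0 + np.exp(-np.asarray(logit_ascertainment, dtype=float)))
    return ascertainment * delayed
```

In a full model the `logit_ascertainment` series would presumably get a smoothing prior (e.g. a random walk), and the n-series version would repeat this per count stream.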

adrian-lison commented 1 month ago

I wonder if a "simple" version would do any good. If we don't account for right truncation, it would only be okay for date-of-report data. Maybe the question is also how much potential there is for merging the model used in EpiSewer --> epinowcast in the long-term...

General downside I see is that while it would mostly be a separate module, having the case data increases the complexity in terms of what signals influence the estimated infection trajectory. Not a problem per se, but can make it harder to detect and diagnose problems.

seabbs commented 1 month ago

Maybe the question is also how much potential there is for merging the model used in EpiSewer --> epinowcast in the long-term...

Well yes precisely this would be some action along those lines.

If we don't account for right truncation, it would only be okay for date-of-report data.

I agree, but in the first instance I think we can just support missing data, with partial reports set to NA.
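As a rough sketch of that idea (Python for illustration only, with hypothetical names), dates with partial reports are set to NaN and simply dropped from the likelihood. A Poisson likelihood is assumed here only to keep the example short; an overdispersed one would be the more likely choice in practice.

```python
import math
import numpy as np

def poisson_loglik_ignore_missing(observed, expected):
    """Poisson log-likelihood that skips missing (NaN) observations."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Keep only the dates with a usable report.
    mask = ~np.isnan(observed)
    y = observed[mask]
    mu = expected[mask]
    # log(y!) computed via the log-gamma function.
    log_factorial = np.array([math.lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(mu) - mu - log_factorial))
```

This does not correct right truncation; it only prevents incomplete recent counts from pulling the estimate down.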

having the case data increases the complexity in terms of what signals influence the estimated infection trajectory. Not a problem per se, but can make it harder to detect and diagnose problems.

Yes, agree. I think the way this is set up it makes sense to declare WW the ground truth and try and capture as much of the difference between underlying infections and the count target as ascertainment.

I don't have a direct use for this, but it could be interesting as a deployable forecasting model, e.g. in the US FluSight, to compare to the increasingly mechanistic approach @kaitejohnson and co are taking.

kaitejohnson commented 1 month ago

I think for this to work for a deployable forecasting model, what you would probably need is support for n wastewater data streams rather than n count data streams (this being a maybe US biased assumption that most of the time forecasting targets are larger geographic granularities than wastewater catchment areas).

Yes, agree. I think the way this is set up it makes sense to declare WW the ground truth and try and capture as much of the difference between underlying infections and the count target as ascertainment.

Interesting, so here you would propose something where the ascertainment over time is weak enough that the R(t) time series is largely driven by the trend in wastewater. I think it would be interesting to tune that parameter based on forecast evaluation, in part because my intuition is trends in wastewater are more variable than trends in ascertainment.

I think in the first pass, it would still be really useful to be able to support one count time series, one wastewater time series, assuming the same source population.

seabbs commented 1 month ago

what you would probably need is support for n wastewater data streams rather than n count data streams

See #22

this being a maybe US biased assumption that most of the time forecasting targets are larger geographic granularities than wastewater catchment areas

I think this is probably a bit context-specific. Here, for example, we might have one WW site but multiple NHS hospitals, all with data streams.

Interesting, so here you would propose something where the ascertainment over time is weak enough that the R(t) time series is largely driven by the trend in wastewater.

Yes, as this model is really a ground-truth WW model IMO.

I think in the first pass, it would still be really useful to be able to support one count time series, one wastewater time series, assuming the same source population.

I agree, though I would push back on any attempt to do much with the idea of an overlapping population, as I think it is better not to enforce that mechanism here.

kaitejohnson commented 1 month ago

I agree though I would push back on any attempt to do much with the idea of an overlapping population as I think it is better not to enforce that mechanism here.

Wouldn't generating count data from the infection time series enforce that assumption?

seabbs commented 1 month ago

Not if you allow it to be a subset

adrian-lison commented 1 month ago

Not if you allow it to be a subset

Yes, what is needed anyway is basic support for letting the catchment population be a subset of the infection population (currently they are assumed to be identical). The same logic would apply for case ascertainment.

adrian-lison commented 1 month ago

Interesting, so here you would propose something where the ascertainment over time is weak enough that the R(t) time series is largely driven by the trend in wastewater. I think it would be interesting to tune that parameter based on forecast evaluation, in part because my intuition is trends in wastewater are more variable than trends in ascertainment.

@kaitejohnson Interesting point. By parameter, do you mean something like putting a strong prior on the variance of observations, or something more complex like a time-varying reporting error process that serves as an "isolation layer" separating the case data from the Rt trend?

I was also wondering if there would be any value in having a stepwise procedure which first estimates Rt and the infection trajectory solely from wastewater, and then estimates ascertainment parameters from the case data and predicts future cases using a fixed infection trajectory (e.g. the median trajectory, but it is also possible to do this over several posterior samples via likelihood averaging).
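To make the stepwise idea concrete, here is a simplified sketch (Python for illustration, hypothetical names throughout). It takes infection trajectories that were already estimated from wastewater, fits a constant ascertainment rate to the case data separately for each posterior draw, and projects cases forward. This is a per-draw plug-in estimate, which is simpler than likelihood averaging, and it assumes the trajectories are already shifted to the reporting date.

```python
import numpy as np

def two_stage_forecast(infection_draws, cases, horizon):
    """Stage 2 of a stepwise procedure, given stage-1 infection draws.

    infection_draws: shape (n_draws, n_obs + horizon), from the wastewater-only fit.
    cases: shape (n_obs,), observed counts; NaN marks missing reports.
    """
    infection_draws = np.asarray(infection_draws, dtype=float)
    cases = np.asarray(cases, dtype=float)
    n_obs = len(cases)
    observed = ~np.isnan(cases)
    # Poisson maximum-likelihood estimate of a constant rate, one per draw:
    # total observed cases divided by total infections on the same dates.
    totals = infection_draws[:, :n_obs][:, observed].sum(axis=1)
    rates = cases[observed].sum() / totals
    # Expected future cases per draw, holding the infection trajectory fixed.
    forecasts = rates[:, None] * infection_draws[:, n_obs:n_obs + horizon]
    return rates, forecasts
```

Because the infection draws are never updated by the case data, the case signal cannot feed back into the estimated Rt trend.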

kaitejohnson commented 1 month ago

I think I mean the latter: you would allow a time-varying ascertainment rate that was essentially able to vary widely day to day and had very few constraints on its magnitude, which would allow any infection trend to generate the observed count data... I don't think this really makes sense though, because why not then just estimate it post hoc (after estimating the infection trend from WW in the current model)?

I was also wondering if there would be any value in having a stepwise procedure which first estimates Rt and the infection trajectory solely from wastewater, and then estimates ascertainment parameters from the case data and predicts future cases using a fixed infection trajectory (e.g. the median trajectory, but it is also possible to do this over several posterior samples via likelihood averaging).

Right, I think this would make more sense than doing something that looks like joint inference but, as programmed with the priors, is not. We could predict future cases by propagating the computed variability in the ascertainment rate or something (though what you're suggesting sounds more sophisticated).
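One possible reading of "propagating the computed variability", sketched in Python with hypothetical names: compute the empirical ascertainment ratio post hoc, measure its day-to-day variability on the log scale, and simulate it forward as a random walk. The random-walk form is an assumption, and the sketch needs strictly positive, non-missing cases and at least three observations.

```python
import numpy as np

def forecast_cases_post_hoc(infections, cases, horizon, n_sims=1000, seed=0):
    """Simulate future cases from a fixed infection trajectory.

    infections: length n_obs + horizon (estimate plus projection from wastewater).
    cases: length n_obs, strictly positive with no missing values.
    """
    infections = np.asarray(infections, dtype=float)
    cases = np.asarray(cases, dtype=float)
    n_obs = len(cases)
    # Empirical log ascertainment ratio on the observed dates.
    log_ratio = np.log(cases / infections[:n_obs])
    # Day-to-day variability of the ratio on the log scale.
    step_sd = np.std(np.diff(log_ratio), ddof=1)
    rng = np.random.default_rng(seed)
    # Random walk forward from the last observed log ratio.
    steps = rng.normal(0.0, step_sd, size=(n_sims, horizon))
    future_log_ratio = log_ratio[-1] + np.cumsum(steps, axis=1)
    return np.exp(future_log_ratio) * infections[n_obs:n_obs + horizon]
```

The result has one row per simulation and one column per forecast day, so quantiles across rows give a predictive interval that widens with the horizon.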

adrian-lison / EpiSewer

Add support for count data? #23