kaitejohnson opened 1 month ago
> Could then consider eventually excluding the lab-sites with very low correlations historically in the fit used to make the forecast, since this is likely going to degrade the signal rather than improve it.
I think this might be the easiest approach if the goal is optimizing forecast skill, which is implicitly the approach at the moment.
However, it's worth considering other options where we stay generative for the whole dataset, e.g. having a bias parameter, etc.
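For concreteness, a minimal sketch of what a bias-parameter formulation might look like, assuming a per-lab-site additive bias on the log-concentration scale (all symbols here are illustrative, not the model's actual parameterization):

$$\log c^{\text{obs}}_{i,t} = \log c^{\text{true}}_{t} + \beta_i + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} \sim \mathcal{N}(0, \sigma^2_i)$$

where $\beta_i$ is a lab-site-level bias (which could get a hierarchical prior), so poorly correlated sites stay in the likelihood rather than being dropped outright.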
Thoughts @dylanhmorris ?
Btw, it might turn out that different site:lab combinations have different lag times. We should watch out for that.
Yeah, I would think it makes the most sense to apply the filter at the same lag for all sites, since the model can only learn one lag?
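To check whether the lags really do differ across site:lab combinations, something like the following could scan candidate lags and report the best one per series. This is a rough sketch, assuming daily growth rates as pandas Series indexed by date; `best_lag` and the 14-day window are placeholders, not anything in the repo:

```python
import pandas as pd

def best_lag(ww_growth: pd.Series, hosp_growth: pd.Series, max_lag: int = 14):
    """Scan lags 0..max_lag (wastewater leading admissions) and return the
    lag maximizing the Pearson correlation between the two growth-rate
    series, along with the full lag -> correlation mapping."""
    corr_by_lag = {}
    for lag in range(max_lag + 1):
        # Shift admissions back by `lag` days so wastewater at time t is
        # paired with admissions at time t + lag.
        pairs = pd.concat(
            [ww_growth, hosp_growth.shift(-lag)], axis=1, keys=["ww", "hosp"]
        ).dropna()
        if len(pairs) > 2:
            corr_by_lag[lag] = pairs["ww"].corr(pairs["hosp"])
    return max(corr_by_lag, key=corr_by_lag.get), corr_by_lag
```

If the best lag varies a lot across lab-sites, that would support the concern above; if it is roughly constant, using a single shared lag as suggested seems safe.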
Hey @kaitejohnson !
Is this analysis blocked by anything (other than lots of other jobs)? I might be able to help.
It isn't blocked, no, I just haven't had a chance to do it! All I've done so far is compute them all and make a bunch of figures, which I'm hoping we can discuss at today's meeting!
Goal
We currently have plots of the daily or weekly growth rates in the admissions and wastewater data, with the goal of identifying wastewater sites where there is a very poor mapping/correlation between trends in wastewater concentration and trends in hospital admissions. This is currently applied to the raw data with no lags, and for now we are just visualizing them. We could additionally:
1. Compute an R-squared for the correlation between the growth rates for each lab-site and the hospital admissions.
2. Compute an R-squared for the correlation with various lag shifts in the data.
We could then consider eventually excluding the lab-sites with very low correlations historically from the fit used to make the forecast, since including them is likely to degrade the signal rather than improve it (a rough sketch of such a filter is below).
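One way that filter could look, assuming a DataFrame of per-lab-site daily growth rates and a single shared lag (per the discussion above); `flag_low_r2_sites`, the 7-day lag, and the 0.1 threshold are all hypothetical placeholders:

```python
import pandas as pd

def flag_low_r2_sites(ww_growth: pd.DataFrame, hosp_growth: pd.Series,
                      lag: int = 7, threshold: float = 0.1):
    """Compute R^2 between each lab-site growth-rate column and
    hospital-admissions growth at one shared lag, and list the sites
    whose R^2 falls below `threshold` (candidates for exclusion)."""
    hosp_shifted = hosp_growth.shift(-lag)  # wastewater leads admissions
    r2 = ww_growth.apply(lambda s: s.corr(hosp_shifted) ** 2)
    return r2, r2[r2 < threshold].index.tolist()
```

Any threshold would need to be sanity-checked against the figures before actually dropping sites from the fit.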
See https://github.com/CDCgov/wastewater-informed-covid-forecasting/pull/167#issuecomment-2368741471 for context