Add estimates for a complete time series in each country/subnational region

epiforecasts / covid-rt-estimates

National and subnational estimates of the time-varying reproduction number for Covid-19

https://epiforecasts.io/covid/

MIT License


Add estimates for a complete time series in each country/subnational region #15

Open seabbs opened 4 years ago

seabbs commented 4 years ago

Due to computational and storage constraints, it is not feasible to run the complete time series every day with our current resources. For this reason, we have shifted our daily updates to focus on a rolling window of the last 3 months of data. Many users may be interested in the complete time series - please respond to this issue in order for us to assess the priority of this requirement.

We are considering two solutions to this issue:

- Piecing together daily estimates from rolling model fits.
- Running less frequent complete time series runs and linking these to our real-time estimates.
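As a rough illustration of the first option, here is a minimal sketch of how estimates from overlapping rolling-window runs might be pieced together. It is not how this repo does it: the function name `stitch_rolling_estimates`, the dictionary layout, and the rule "keep the value from the most recent run that covers each date" are all assumptions made for the example.

```python
from datetime import date


def stitch_rolling_estimates(runs):
    """Combine per-run Rt estimates into one series (hypothetical helper).

    `runs` maps the date a model was run to a dict of
    {estimate date: Rt value} covering that run's rolling window.
    Assumed rule: for each estimate date, keep the value from the
    most recent run that covers it.
    """
    stitched = {}
    for run_date in sorted(runs):  # oldest first, so later runs overwrite
        stitched.update(runs[run_date])
    # Return the series ordered by estimate date.
    return dict(sorted(stitched.items()))


# Two overlapping runs with made-up values, purely for illustration.
runs = {
    date(2020, 6, 1): {date(2020, 3, 5): 2.1, date(2020, 3, 6): 2.0},
    date(2020, 6, 2): {date(2020, 3, 6): 1.9, date(2020, 3, 7): 1.8},
}
series = stitch_rolling_estimates(runs)
```

Dates that have dropped out of the current window keep the last value estimated for them, which is why storage rather than compute becomes the main cost of this approach.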

pearsonca commented 4 years ago

I'm one of the full time series folks.

I'm using this for global projection models on a by-country basis. We estimate an intervention time, a pre-intervention R (which is way back in basically every time series) and a post-intervention R (again, fairly early in most countries' time series) and use those two values as references to calibrate the models.

In terms of the storage challenges, I only need a few aggregate R values, though the intervention break points aren't exactly generalizable. Same goes for compute - presumably updates with the most recent cases have little impact on the R estimates so far in the past.

seabbs commented 4 years ago

Hi Carl,

Thanks for pinging this. For your specific use case it sounds like it's worth generating your own Rt values rather than piggybacking off this repo (as discussed on Friday). I've knocked up a gist here of my thinking around how that can be nicely done: https://gist.github.com/seabbs/07035fa4019a0c0117e28f9037593f28

joeHickson commented 4 years ago

I think we can look to do this once the new infrastructure is in place. It will need a task to allow some datasets to run less frequently (as I don't think we would want the full history daily), but we could add an infrequent job that does this - clone all the existing datasets and run them with a wider window. This would also be blocked by #102.

joeHickson commented 4 years ago

I have raised a task in our private infra repo to move control over what runs, and its schedule, into there. That way we can add as many datasets to this repo as people can imagine and maintain, while LSHTM / Met Office keep control over what is run when on our infrastructure and the associated compute cost (epiforecasts/covid-rt-estimates-infra#9 - for internal reference).

MrinankSharma commented 4 years ago

Hi all,

I would also be interested in Rt estimates for historical periods (ideally the full time-series) - are these available somewhere at the moment, or is it recommended that I run this locally?

Cheers

joeHickson commented 4 years ago

Hi @MrinankSharma - the latest is still as per #103. We are a few tasks away from having the compute space to run them centrally. If you want to run it locally you will need to change the rolling window function (one of the tasks that needs changing - #102).

sbfnk commented 3 years ago

Can we revisit this now that we're running in batch mode? Seems like steps required would be:

- duplicating everything in R/lists/datasets.r with a different name, pointing to a different directory and using a different # of weeks looking in the past (e.g. the number of weeks since 1 Jan 2020)
- adding them to the batch list as running e.g. weekly
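To make those two steps concrete, here is a small sketch in Python rather than the repo's R. Everything in it is hypothetical: the `Dataset` fields, the `-full` suffix, and the example values do not reflect the actual structure of R/lists/datasets.r or the batch list. It only shows the idea of cloning a dataset entry with a window that reaches back to 1 Jan 2020.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Dataset:
    # Hypothetical fields; the real dataset definitions may differ.
    name: str
    target_dir: str
    weeks_back: int
    schedule: str


def weeks_since(start, today):
    """Number of whole weeks needed to cover start..today (rounded up)."""
    days = (today - start).days
    return -(-days // 7)  # integer ceiling division


def full_history_copy(ds, today, start=date(2020, 1, 1), schedule="weekly"):
    """Clone a dataset entry with a new name, directory, window and schedule."""
    return replace(
        ds,
        name=f"{ds.name}-full",
        target_dir=f"{ds.target_dir}-full",
        weeks_back=weeks_since(start, today),
        schedule=schedule,
    )


# Example values are made up for illustration.
daily = Dataset("united-kingdom", "subnational/united-kingdom", 12, "daily")
full = full_history_copy(daily, today=date(2021, 1, 6))
```

Because the window is computed from the run date, the full-history copy grows by itself over time and the original daily entry is left untouched.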

joeHickson commented 3 years ago

Suggested frequency is monthly

sbfnk commented 3 years ago

Reopening as still work in progress.