This package provides tooling to forecast the time-varying reproduction
number and use this to forecast reported case counts via a branching
process. It supports a range of time series modelling packages including
bsts
, forecast
, and fable
. It also supports ensembles via stackr
and forecastHyrbid
. Forecasts can be assessed by iteractively fitting
and then using proper scoring rules (via scoringutils
and
scoringRules
) to compare to both observed case counts and estimated
reproduction numbers.
Install the stable development version of the package with:
install.packages("EpiSoon", repos = "https://epiforecasts.r-universe.dev")
Install the unstable development version of the package with (few users should need to do this):
remotes::install_github("epiforecasts/EpiSoon")
bsts
and fable
for models, ggplot2
for
plotting, and cowplot
for theming)library(EpiSoon)
library(bsts)
library(fable)
library(future)
library(cowplot)
library(dplyr)
EpiSoon::example_obs_rts
and
EpiSoon::example_obs_cases
as starting data sets). When generating
timeseries with EpiNow
use get_timeseries
to extract the
required data.obs_rts <- EpiSoon::example_obs_rts %>%
dplyr::mutate(timeseries = "Region 1") %>%
dplyr::bind_rows(EpiSoon::example_obs_rts %>%
dplyr::mutate(timeseries = "Region 2"))
obs_cases <- EpiSoon::example_obs_cases %>%
dplyr::mutate(timeseries = "Region 1") %>%
dplyr::bind_rows(EpiSoon::example_obs_cases %>%
dplyr::mutate(timeseries = "Region 2"))
models <- list(
"AR 3" =
function(...) {
EpiSoon::bsts_model(
model =
function(ss, y) {
bsts::AddAr(ss, y = y, lags = 3)
}, ...
)
},
"Semi-local linear trend" =
function(...) {
EpiSoon::bsts_model(
model =
function(ss, y) {
bsts::AddSemilocalLinearTrend(ss, y = y)
}, ...
)
},
"ARIMA" =
function(...) {
EpiSoon::fable_model(model = fable::ARIMA(y ~ time), ...)
}
)
future::plan
to do
this in parallel).future::plan("sequential")
## Compare models
forecasts <- EpiSoon::compare_timeseries(obs_rts, obs_cases, models,
horizon = 7, samples = 10,
serial_interval = EpiSoon::example_serial_interval
)
#> Warning: There were 40 warnings in `dplyr::mutate()`.
#> The first warning was:
#> ℹ In argument: `eval = furrr::future_pmap(...)`.
#> Caused by warning:
#> ! Unknown or uninitialised column: `.distribution`.
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 39 remaining warnings.
forecasts
#> $forecast_rts
#> # A tibble: 511 × 12
#> timeseries model foreca…¹ date horizon median mean sd bottom lower
#> <chr> <chr> <chr> <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Region 1 AR 3 2020-03… 2020-03-05 1 2.26 2.25 0.0504 2.17 2.25
#> 2 Region 1 AR 3 2020-03… 2020-03-06 2 2.21 2.19 0.0729 2.09 2.19
#> 3 Region 1 AR 3 2020-03… 2020-03-07 3 2.15 2.12 0.0957 1.95 2.13
#> 4 Region 1 AR 3 2020-03… 2020-03-08 4 2.07 2.05 0.128 1.86 2.06
#> 5 Region 1 AR 3 2020-03… 2020-03-09 5 2.00 1.98 0.165 1.69 2.00
#> 6 Region 1 AR 3 2020-03… 2020-03-10 6 1.95 1.93 0.173 1.61 1.93
#> 7 Region 1 AR 3 2020-03… 2020-03-11 7 1.90 1.86 0.199 1.52 1.86
#> 8 Region 1 AR 3 2020-03… 2020-03-07 1 2.11 2.11 0.0389 2.02 2.10
#> 9 Region 1 AR 3 2020-03… 2020-03-08 2 2.04 2.03 0.0449 1.94 2.02
#> 10 Region 1 AR 3 2020-03… 2020-03-09 3 1.97 1.94 0.0713 1.80 1.96
#> # … with 501 more rows, 2 more variables: upper <dbl>, top <dbl>, and
#> # abbreviated variable name ¹forecast_date
#>
#> $rt_scores
#> # A tibble: 399 × 12
#> timeseries model forec…¹ date horizon mad bias dss crps log_s…²
#> <chr> <chr> <chr> <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Region 1 AR 3 2020-0… 2020-03-05 1 0.0514 -0.2 -5.98 0.0110 -1.96
#> 2 Region 1 AR 3 2020-0… 2020-03-06 2 0.0630 0.4 -5.34 0.0176 -1.55
#> 3 Region 1 AR 3 2020-0… 2020-03-07 3 0.0879 0 -4.72 0.0232 -1.31
#> 4 Region 1 AR 3 2020-0… 2020-03-08 4 0.129 -0.2 -4.01 0.0345 -0.961
#> 5 Region 1 AR 3 2020-0… 2020-03-09 5 0.165 -0.4 -3.44 0.0514 -0.764
#> 6 Region 1 AR 3 2020-0… 2020-03-10 6 0.170 -0.4 -3.25 0.0578 -0.713
#> 7 Region 1 AR 3 2020-0… 2020-03-11 7 0.186 -0.6 -2.67 0.0902 -0.483
#> 8 Region 1 AR 3 2020-0… 2020-03-07 1 0.0283 -0.8 -5.53 0.0207 -2.05
#> 9 Region 1 AR 3 2020-0… 2020-03-08 2 0.0273 -1 -3.32 0.0506 -0.712
#> 10 Region 1 AR 3 2020-0… 2020-03-09 3 0.0398 -1 -2.13 0.0864 -0.162
#> # … with 389 more rows, 2 more variables: ae_median <dbl>, se_mean <dbl>, and
#> # abbreviated variable names ¹forecast_date, ²log_score
#>
#> $forecast_cases
#> # A tibble: 399 × 12
#> timeseries model forecas…¹ date horizon median mean sd bottom lower
#> <chr> <chr> <chr> <date> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Region 1 AR 3 2020-03-… 2020-03-05 1 68.5 67.5 5.91 59 61
#> 2 Region 1 AR 3 2020-03-… 2020-03-06 2 81.5 80.5 11.8 61 72
#> 3 Region 1 AR 3 2020-03-… 2020-03-07 3 97 94.4 15.7 68 94
#> 4 Region 1 AR 3 2020-03-… 2020-03-08 4 114 116. 13.9 102 102
#> 5 Region 1 AR 3 2020-03-… 2020-03-09 5 139 131. 24.0 98 132
#> 6 Region 1 AR 3 2020-03-… 2020-03-10 6 150. 156. 23.9 117 141
#> 7 Region 1 AR 3 2020-03-… 2020-03-11 7 186. 182. 41.1 122 155
#> 8 Region 1 AR 3 2020-03-… 2020-03-07 1 94.5 92.7 8.30 80 94
#> 9 Region 1 AR 3 2020-03-… 2020-03-08 2 108 107. 6.34 95 104
#> 10 Region 1 AR 3 2020-03-… 2020-03-09 3 122. 121. 13.8 94 115
#> # … with 389 more rows, 2 more variables: upper <dbl>, top <dbl>, and
#> # abbreviated variable name ¹forecast_date
#>
#> $case_scores
#> # A tibble: 399 × 12
#> timeseries model sample forecast…¹ date horizon mad bias dss crps
#> <chr> <chr> <chr> <chr> <date> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 Region 1 AR 3 1 2020-03-04 2020-03-05 1 6.67 0.4 4.09 2.75
#> 2 Region 1 AR 3 1 2020-03-04 2020-03-06 2 11.1 0.4 5.28 4.77
#> 3 Region 1 AR 3 1 2020-03-04 2020-03-07 3 13.3 0.4 5.58 6.3
#> 4 Region 1 AR 3 1 2020-03-04 2020-03-08 4 14.1 0.6 6.30 6.85
#> 5 Region 1 AR 3 1 2020-03-04 2020-03-09 5 29.7 0.2 6.70 11.1
#> 6 Region 1 AR 3 1 2020-03-04 2020-03-10 6 14.1 0.6 6.68 7.7
#> 7 Region 1 AR 3 1 2020-03-04 2020-03-11 7 42.3 0.2 7.47 13.1
#> 8 Region 1 AR 3 1 2020-03-06 2020-03-07 1 8.90 0.4 4.48 3.65
#> 9 Region 1 AR 3 1 2020-03-06 2020-03-08 2 5.19 0.6 4.23 3.68
#> 10 Region 1 AR 3 1 2020-03-06 2020-03-09 3 13.3 0.2 5.26 4.1
#> # … with 389 more rows, 2 more variables: ae_median <dbl>, se_mean <dbl>, and
#> # abbreviated variable name ¹forecast_date
EpiSoon::plot_forecast_evaluation(forecasts$forecast_rts, obs_rts, c(7)) +
ggplot2::facet_grid(model ~ timeseries) +
cowplot::panel_border()
EpiSoon::plot_forecast_evaluation(forecasts$forecast_cases, obs_cases, c(7)) +
ggplot2::facet_grid(model ~ timeseries, scales = "free") +
cowplot::panel_border()
EpiSoon::summarise_scores(forecasts$case_scores)
#> # A tibble: 12 × 9
#> score model bottom lower median mean upper top sd
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ae_median AR 3 0.85 9 24.5 5.69e+1 91 2.27e2 6.28e+1
#> 2 ae_median Semi-local linea… 0.613 8.5 24.2 5.11e+1 92.4 1.88e2 5.86e+1
#> 3 bias AR 3 -1 0 0.6 4.47e-1 1 1 e0 6.13e-1
#> 4 bias Semi-local linea… -1 0.2 0.6 4.81e-1 1 1 e0 5.90e-1
#> 5 crps AR 3 2.94 6.57 17.8 4.69e+1 79.3 2.00e2 5.59e+1
#> 6 crps Semi-local linea… 2.80 6.82 15.6 4.18e+1 69.5 1.60e2 5.12e+1
#> 7 dss AR 3 4.53 6.09 8.33 1.29e+1 15.9 4.67e1 1.09e+1
#> 8 dss Semi-local linea… 4.56 6.20 7.74 1.28e+1 12.7 5.07e1 1.35e+1
#> 9 mad AR 3 7.93 13.3 21.5 2.61e+1 34.1 8.51e1 1.76e+1
#> 10 mad Semi-local linea… 6.10 12.8 19.3 2.46e+1 31.1 6.58e1 1.70e+1
#> 11 se_mean AR 3 1.15 81 784 7.29e+3 8855. 4.72e4 1.56e+4
#> 12 se_mean Semi-local linea… 0.524 79.2 595. 6.26e+3 8663. 4.16e4 1.31e+4
File an issue here if you have identified an issue with the package. Please note that due to operational constraints priority will be given to users informing government policy or offering methodological insights. We welcome all contributions, in particular those that improve the approach or the robustness of the code base.