Possible scenario comparison functionality

pratikunterwegs commented 6 months ago

This issue logs discussions and prototypes for proposed scenario comparison functionality. Please feel free to add to the conversation below.

Scenario comparison requirements

User-friendly; should be possible to compare scenarios in 1 - 2 functions and < 10 lines of code;
Inputs and outputs should be inter-operable with non-{epidemics} outputs; may have better functionality for {epidemics} outputs (?);
Should be make like-for-like comparisons (matching parameter sets and replicates across scenarios);
Flexibly select outcomes: recoveries or cases, and deaths where available (?);
Return raw differences or a summary of differences.

Discussion points

Should there be a way to prevent outputs from different model structures from being compared?
How should scenarios be identified? e.g. as a simple numeric identifier based on position in a list, or is a unique identifier necessary?

A small reprex shows my thinking on this so far, please feel free to suggest changes.

library(epidemics)

population_size <- 14e6

# prepare initial conditions as proportions
initial_conditions <- c(
  S = population_size - 11, E = 10, I = 1, H = 0, F = 0, R = 0
) / population_size

# prepare a <population> object
guinea_population <- population(
  name = "Guinea",
  contact_matrix = matrix(1), # note dummy value
  demography_vector = 14e6, # 14 million, no age groups
  initial_conditions = matrix(
    initial_conditions,
    nrow = 1
  )
)

population = guinea_population

replicates = 50

# generate set of intervention + baseline scenarios to compare
intervention_set <- list(
  baseline = NULL,
  response = list(
    transmission_rate = intervention(
      "reduce_beta", "rate", 50, 100, 0.5
    )
  ),
  response2 = list(
    transmission_rate = intervention(
      "reduce_beta", "rate", 1, 100, 0.5
    )
  )
)

data = model_ebola(
  population,
  # transmission_rate = transmission_rate,
  intervention = intervention_set,
  replicates = replicates
)

outcomes_averted = function(baseline,
                            scenarios,
                            summarise = TRUE,
                            by_group = TRUE,
                            ...) {

  # collect arguments to `epidemic_size()`
  args = list(...) # not used currently

  # get epidemic size for baseline and response scenarios
  baseline_outcomes = epidemic_size(baseline, simplify = FALSE, by_group = by_group)
  scenario_outcomes = lapply(scenarios, epidemic_size, simplify = FALSE, by_group = by_group)

  # get differences between response and baseline in terms of epidemic size
  averted = vapply(scenario_outcomes, function(df) {
    df$value - baseline_outcomes$value
  }, FUN.VALUE = numeric(nrow(baseline_outcomes)))

  # return summarised values or raw differences
  if (summarise) {
    averted_summary = apply(averted, 2, function(x) {
      averted_median = mean(x)
      averted_lims = quantile(x, probs = c(0.05, 0.95))
      names(averted_lims) = sprintf("quantile_%s", c("05", "95"))
      c(median = averted_median, averted_lims)
    })

    averted_summary = as.data.frame(t(averted_summary))
    averted_summary$scenario = seq_len(nrow(averted_summary))

    # return difference summary
    averted_summary
  } else {
    averted = as.data.frame(averted)
    colnames(averted) = sprintf("scenario_%s", seq_along(averted))
    if (nrow(averted) > 1) {
      averted$replicate = seq_len(nrow(averted))
    }

    # return raw difference data
    averted
  }
}

outcomes_averted(
  baseline = data[scenario == 1]$data[[1]],
  scenarios = data[scenario != 1]$data
)
#>    median quantile_05 quantile_95 scenario
#> 1 -228.78     -436.95       47.95        1
#> 2 -359.30     -527.05     -151.70        2

^{Created on 2024-04-24 with reprex v2.0.2}

adamkucharski commented 6 months ago

Thanks for sharing. Some quick initial thoughts:

Reflecting on past projects comparing outputs, the big bottlenecks where a lot of code is wasted have been book-keeping (e.g. iterating over scenarios and keeping track of which one is which), storage and retreival (e.g. recording relevant model outputs so they can be analysed later) and trajectory matching and summarising (e.g. pulling the case incidence out of each stored object then comparing total cases averted – and in the case of a stochastic model, or simulations from a posterior with parameter uncertainty, making sure that the code is doing like-for-like comparisons of the same parameter/seed in each sample)
There may be a lot of different things a user wants to compare (too many to predict), so although outcomes_averted() is a useful starting point, making data[scenario == 1]$data[[1]] etc. much easier to handle would be useful, e.g. maybe with something intuitive like get_outcome(data,"cases",1)

pratikunterwegs commented 6 months ago

Thanks, would definitely be good to discuss this more before implementing something as it might take some restructuring of inputs etc.

(e.g. iterating over scenarios and keeping track of which one is which)

What would users expect in terms of keeping track of scenarios? The model output currently makes scenarios from the input NPI + vax combinations automatically. Is there a call for changing how that's done - e.g. passing a full intervention set (NPIs + vaccinations) as a single scenario?

making sure that the code is doing like-for-like comparisons of the same parameter/seed in each sample

A question here is whether discrete values of infection parameters, e.g. $R_0$ = 1.5, 2.0, 2.5, should count as separate scenarios, because currently they are treated similar to parameter uncertainty around single values. Might need to pass the parameters, and some way of specifying uncertainty around them, to the 'scenario' object from above.

pratikunterwegs commented 5 months ago

Partially addressed in the v0.4.0 release. Moving to Discussions page.

epiverse-trace / epidemics

Possible scenario comparison functionality #221

Scenario comparison requirements

Discussion points