epiverse-trace / serofoi

Estimates the Force-of-Infection of a given pathogen from population based sero-prevalence studies
https://epiverse-trace.github.io/serofoi/
Other
17 stars 4 forks source link

Add age-varying and time-varying data simulation including seroreversion #158

Closed ntorresd closed 7 months ago

ntorresd commented 8 months ago

This PR modifies get_sim_probability to allow for data simulation from age-varying FoI with seroreversion and includes seroreversion to the time-varying FoI option (@ben18785 please check if I'm computing this correctly).

This PR closes #74 (I'll move the discussion about the age-varying models to a separate issue).

codecov-commenter commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.32%. Comparing base (ff8b90e) to head (be7828b).

:exclamation: Current head be7828b differs from pull request most recent head aefa582. Consider uploading reports for the commit aefa582 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #158 +/- ## ========================================== + Coverage 69.81% 70.32% +0.50% ========================================== Files 10 10 Lines 1746 1776 +30 ========================================== + Hits 1219 1249 +30 Misses 527 527 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ekamau commented 8 months ago

@ntorresd - in this line: https://github.com/epiverse-trace/serofoi/blob/bc1902e2cda3fcc69666627eebecd2c70d22c8d9/R/seroprevalence_data.R#L145

we can just use 'age' instead of 'exposure_age' which might imply the age at which you got exposed to the infection ...

ekamau commented 8 months ago

On this: https://github.com/epiverse-trace/serofoi/blob/bc1902e2cda3fcc69666627eebecd2c70d22c8d9/R/seroprevalence_data.R#L224

@ntorresd - I could simulate data with seroreversion rate being > 0, i.e., assuming there's seroreversion. Have you been unable to do this?

The solution you have in the code works both when mu = 0 and when mu != 0

ekamau commented 8 months ago

@ntorresd, @ben18785 - for this probability function: https://github.com/epiverse-trace/serofoi/blob/bc1902e2cda3fcc69666627eebecd2c70d22c8d9/R/seroprevalence_data.R#L151

you can check or test it against the solution that assumes constant FOI with seroreversion:

constant_FOI_solution <- function(age, foi, mu) {
  #foi * (1 - exp(-a * (foi + mu))) / (foi + mu) ## this line works the same as below ..
  (foi/(foi + mu)) * (1 - exp(-a * (foi + mu))) 
}

I have checked, and the constant FOI function works with and w/out seroreversion, i.e., mu == 0 and mu != 0, and gives the same output/simulations as the age FOI function when foi is a single fixed value ...

let me know if this makes sense - assuming that's what we were discussing ...

ekamau commented 8 months ago

Adding the R code I was using to test the function(s) - might be helpful:

library(tidyverse)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = 4)

simulate_foi_age_exact <- function(age, fois, chunks, mu) {
  I <- 0
  # solves ODE exactly within pieces
  for(i in 1:age) {
    lambda <- fois[chunks[i]]
    I <- (1 / (lambda + mu)) * (lambda + exp(-(lambda + mu)) * (I * (lambda + mu) - lambda))
  }
  I
}

simulate_foi_age_exact2 <- function(age, fois, chunks, mu) {
  I <- 0
  # solves ODE exactly within pieces
  for(i in 1:age) {
    lambda <- fois[chunks[i]]
    I <- (1 / (lambda + mu)) * exp(-(lambda + mu)) * (lambda * (exp(lambda + mu) - 1) + I * (lambda + mu))
  }
  I
}

# the above two functions seem to work the same - give the same output!

ages <- seq(1, 40, 1)

#-----------------------------------------------------------------------------------------------
# (1) Model without seroreversion
# generate synthetic data assuming we have data for all individuals aged between 1-n
mu <- 0.0 # no seroreversion
fois <- c(0.01, 0.05, 0.1, 0.03)
chunks <- unlist(map(seq(1, 4, 1), ~rep(., 10))) # how many FOIs to estimate
prob_infected_simulated <- map_dbl(ages, ~simulate_foi_age_exact(., fois, chunks, mu))
prob_infected_simulated2 <- map_dbl(ages, ~simulate_foi_age_exact2(., fois, chunks, mu))

tibble(age=ages, R_function=prob_infected_simulated, R_function2=prob_infected_simulated2) %>%
  pivot_longer(-age) %>%
  ggplot(aes(x=age, y=value, colour=name)) +
  geom_line() + geom_point() + labs(y = "Probability infected") + guides(colour = "none") + 
  facet_wrap(.~ name)

# checking with constant FOI
constant_solution <- function(a, foi, mu) {
  #foi * (1 - exp(-a * (foi + mu))) / (foi + mu) # Ben's solution ..
  (foi/(foi + mu)) * (1 - exp(-a * (foi + mu))) # what's usually published
}

prob_infected_cst <- map_dbl(ages, ~constant_solution(., fois[1], mu))

fois <- rep(fois[1], length(ages)) # constant - for each age!

prob_infected_simulated <- map_dbl(ages, ~simulate_foi_age_exact(., fois, chunks, mu))
prob_infected_simulated2 <- map_dbl(ages, ~simulate_foi_age_exact2(., fois, chunks, mu))

tibble(age=ages, R_function=prob_infected_simulated, R_function2=prob_infected_simulated2,
       cst_function=prob_infected_cst) %>%
  pivot_longer(-age) %>%
  ggplot(aes(x=age, y=value, colour=name)) +
  geom_line() + geom_point() + labs(y = "Probability infected") + guides(colour = "none") + 
  facet_wrap(.~ name)

#-----------------------------------------------------------------------------------------------
#2) With seroreversion:
mu <- 0.1 # seroreversion
fois <- c(0.01, 0.05, 0.1, 0.03)
chunks <- unlist(map(seq(1, 4, 1), ~rep(., 10))) # how many FOIs to estimate

prob_infected_simulated <- map_dbl(ages, ~simulate_foi_age_exact(., fois, chunks, mu))
prob_infected_simulated2 <- map_dbl(ages, ~simulate_foi_age_exact2(., fois, chunks, mu))

tibble(age=ages, R_function=prob_infected_simulated, R_function2=prob_infected_simulated2) %>%
  pivot_longer(-age) %>%
  ggplot(aes(x=age, y=value, colour=name)) +
  geom_line() + geom_point() + labs(y = "Probability infected") + guides(colour = "none") + 
  facet_wrap(.~ name)

# checking with constant FOI
constant_solution <- function(a, foi, mu) {
  #foi * (1 - exp(-a * (foi + mu))) / (foi + mu) # Ben's solution ..
  (foi/(foi + mu)) * (1 - exp(-a * (foi + mu))) # what's usually published
}

prob_infected_cst <- map_dbl(ages, ~constant_solution(., fois[1], mu))

fois <- rep(fois[1], length(ages)) # constant - for each age!

prob_infected_simulated <- map_dbl(ages, ~simulate_foi_age_exact(., fois, chunks, mu))
prob_infected_simulated2 <- map_dbl(ages, ~simulate_foi_age_exact2(., fois, chunks, mu))

tibble(age=ages, R_function=prob_infected_simulated, R_function2=prob_infected_simulated2,
       cst_function=prob_infected_cst) %>%
  pivot_longer(-age) %>%
  ggplot(aes(x=age, y=value, colour=name)) +
  geom_line() + geom_point() + labs(y = "Probability infected") + guides(colour = "none") + 
  facet_wrap(.~ name)
ntorresd commented 8 months ago

@ntorresd, @ben18785 - for this probability function:

https://github.com/epiverse-trace/serofoi/blob/bc1902e2cda3fcc69666627eebecd2c70d22c8d9/R/seroprevalence_data.R#L151

you can check or test it against the solution that assumes constant FOI with seroreversion:

constant_FOI_solution <- function(age, foi, mu) {
  #foi * (1 - exp(-a * (foi + mu))) / (foi + mu) ## this line works the same as below ..
  (foi/(foi + mu)) * (1 - exp(-a * (foi + mu))) 
}

I have checked, and the constant FOI function works with and w/out seroreversion, i.e., mu == 0 and mu != 0, and gives the same output/simulations as the age FOI function when foi is a single fixed value ...

let me know if this makes sense - assuming that's what we were discussing ...

You're right, the functions work pretty well!

I added a valued based test based on your R code for large sample sizes. I also corrected the way we were testing all the data simulation functions, so I recommend you take a look at https://github.com/epiverse-trace/serofoi/pull/158/commits/5c72282c26de2397c5359dc43ad790decfd738e8 @ben18785 and @ekamau.

Right now I'm choosing the tolerance to compare the simulated seroprevalence values with the exact probabilities in a rather arbitrary way... I tried doing something similar to what I did for the modelling module (see https://github.com/epiverse-trace/serofoi/pull/153/commits/22244db2e6c5b7873112f33aab6bd3a2997f1c76), where I use as a tolerance half the size of the credible interval; I tried to use here the binomial confidence interval as a criteria, but weirdly enough the tests always passed for arbitrary sample size when I do it (which is nice, but is not very useful for testing), so I chose a value (tolerance = 10^-3) that gives good results for a large sample size (10^7) and fails for small sample size (~100).

ekamau commented 8 months ago

@ntorresd - thanks for making the changes. it looks good; no further comments at this stage.