canmod / macpan2

Rebuilding https://github.com/mac-theobio/McMasterPandemic/
https://canmod.github.io/macpan2/
GNU General Public License v3.0

Consider implementing the model in "Designing ecologically-optimised pneumococcal vaccines using population genomics" #136

Closed by stevencarlislewalker 5 months ago

stevencarlislewalker commented 8 months ago
jfree-man commented 6 months ago

This is my first attempt at summarizing what I have learned.

Terminology

Streptococcus pneumoniae (pneumococcus) can cause invasive pneumococcal disease (IPD).

carriage - "occupation of microbial species in the respiratory tract" (Coughtrie et al. 2019). https://doi.org/10.1099/jmm.0.001046

A serotype is a group of organisms within a species that share the same surface antigens (for pneumococcus, the polysaccharide capsule). There are $\approx 100$ pneumococcal serotypes.

Protein–polysaccharide conjugate vaccines (PCVs) target specific serotypes, removing them from the carriage population. The remaining circulating serotypes then expand, so overall prevalence stays approximately constant.

A serotype contains multiple genotypes (strains), each describing a distinct genetic makeup of the organism.

Serotypes are characterized by their invasiveness, the "rate at which serotype progresses from carriage to cause IPD" (Colijn et al. 2020). Invasiveness scores are estimated from a meta-analysis.

The Problem

Current PCVs were designed to target serotypes in the pre-vaccine carriage population, with little consideration of their effect on the serotype replacement process in the post-vaccine population. This replacement process can allow serotypes that are not included in the vaccine, and that have high invasiveness and/or antibiotic resistance, to become more prevalent. The goal is to design better PCVs that incorporate invasiveness and antibiotic resistance information.

Model

A multi-locus negative frequency-dependent selection (NFDS) model is used, implemented as a deterministic first-order ODE (error term $\varepsilon$ TBD), to describe the dynamics of the pneumococcal carriage population in response to different vaccine strategies. "The simulated dynamics are initially driven by vaccination perturbing the population through imposing a fitness cost on those serotypes included in the proposed formulation, followed by a return to an equilibrium under NFDS" (Colijn et al. 2020).

Interpretation as a Compartmental Model

$N$ = number of pneumococci (organisms) in a host carriage population at time $t$. Based on a few quick experiments, $N$ seems to stay close to the carrying capacity $\kappa = 10^5$.

states: $\text{state}_i$ - prevalence of genotype $i$ in the population. Total number of states $\approx 600$

flow rates: This is still fuzzy to me. Without digging in further at the moment, my understanding is that the prevalence of each genotype changes over time according to its fitness (computed at each time step). The fitness function takes the serotype's invasiveness as an input.
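To make the idea of fitness-driven flows concrete, here is a minimal base R sketch of a generic NFDS-style update. It is not the set of equations from Colijn et al.: the density term `log(kappa / N)`, the genotype-by-locus matrix `G`, the equilibrium locus frequencies `e`, the selection strength `sigma_f`, the `vaccine` cost vector, and every numeric value other than the carrying capacity are assumptions chosen for illustration.

```r
# Toy NFDS sketch. All names and values are hypothetical except kappa,
# which is the carrying capacity quoted in this thread.
kappa   <- 1e5
G       <- matrix(c(1, 0,
                    0, 1,
                    1, 1), nrow = 3, byrow = TRUE)  # genotype x locus presence (0/1)
e       <- c(0.5, 0.5)        # assumed equilibrium locus frequencies
sigma_f <- 0.1                # assumed strength of NFDS
vaccine <- c(0.05, 0, 0)      # assumed fitness cost on the targeted genotype
y       <- c(4e4, 3e4, 3e4)   # initial genotype abundances
dt      <- 0.1                # Euler step size

nfds_step <- function(y) {
  N <- sum(y)
  f <- as.vector(t(G) %*% y) / N   # current locus frequencies
  # Fitness is recomputed at every step from the current state:
  # density dependence, minus vaccine cost, plus a bonus for rare loci.
  fitness <- log(kappa / N) - vaccine + sigma_f * as.vector(G %*% (e - f))
  y + dt * y * fitness             # forward Euler update
}

for (step in 1:2000) y <- nfds_step(y)
N_final <- sum(y)
```

In this toy version, each genotype's per-capita rate of change is its fitness, so the "flow" into or out of a state is `y * fitness` rather than a transfer between two named compartments. The density term pulls the total back toward the carrying capacity whenever it drifts away.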

Optimization

They use three different continuous objective functions (each representing a vaccine design strategy) with some constraints, e.g. restrictions on the number of serotypes included in the vaccine being designed (its valency). Optimization is performed using bayesopt and, in some cases, ga in Matlab.
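The shape of this optimization problem (choose a set of serotypes, subject to a valency constraint, that minimizes an objective) can be illustrated with a brute-force search in base R. This is only a sketch: exhaustive enumeration stands in for bayesopt and ga, the serotype names, prevalences, and invasiveness scores are made up, and the objective below (invasiveness-weighted prevalence after simple proportional replacement) is a hypothetical stand-in, not one of the three objectives from the paper.

```r
# Hypothetical serotype data, for illustration only.
prevalence   <- c(A = 0.40, B = 0.25, C = 0.20, D = 0.10, E = 0.05)
invasiveness <- c(A = 0.2,  B = 0.9,  C = 0.1,  D = 0.8,  E = 0.5)
valency      <- 2   # number of serotypes allowed in the vaccine

# Toy objective: remove vaccine serotypes, rescale the rest so total
# prevalence is unchanged, then sum prevalence weighted by invasiveness.
objective <- function(in_vaccine) {
  remaining <- prevalence[setdiff(names(prevalence), in_vaccine)]
  remaining <- remaining / sum(remaining) * sum(prevalence)
  sum(remaining * invasiveness[names(remaining)])
}

# Enumerate every vaccine formulation with the requested valency.
candidates <- combn(names(prevalence), valency, simplify = FALSE)
scores     <- vapply(candidates, objective, numeric(1))
best       <- candidates[[which.min(scores)]]
```

Enumeration is feasible here only because the toy has 10 candidate formulations. With roughly 100 serotypes the search space grows combinatorially, which is why a heuristic or model-based optimizer is needed in practice.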

jfree-man commented 5 months ago

Quick summary: we implemented the model from Colijn et al. in macpan2 using the TMB engine, and it is included in the model library.

mp_tmb_library("starter_models", "nfds", package = "macpan2")

Initial concerns about the wall-clock time needed to create a simulator object were alleviated by removing standard matrix multiplication formulas and replacing them with the TMB engine function group_sums evaluated at matrix indices. This removed the need to pass a large (603 by 1090) matrix; instead, only two column vectors of integer indices are required.
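The equivalence behind this change can be checked in base R. The sketch below assumes the matrix is binary (0/1), so a matrix-vector product reduces to summing selected vector entries within each row. It uses base R's `rowsum` as a stand-in for the grouped sum and does not reproduce the signature of macpan2's `group_sums`. The small matrix `G` and vector `y` are made-up examples.

```r
# Toy binary matrix (3 x 4); the real one in this thread is 603 by 1090.
G <- matrix(c(1, 0, 1, 0,
              0, 1, 0, 0,
              1, 1, 0, 1), nrow = 3, byrow = TRUE)
y <- c(10, 20, 30, 40)

# Dense approach: full matrix-vector product.
dense <- as.vector(G %*% y)

# Index approach: keep only the positions of the non-zero entries,
# as two integer vectors (row index and column index).
idx    <- which(G == 1, arr.ind = TRUE)
row_id <- idx[, "row"]
col_id <- idx[, "col"]

# Sum the selected entries of y within each row group.
sums    <- rowsum(y[col_id], group = row_id)
grouped <- numeric(nrow(G))                 # rows with no entries stay 0
grouped[as.integer(rownames(sums))] <- sums[, 1]

stopifnot(isTRUE(all.equal(dense, grouped)))
```

Only `row_id` and `col_id` need to be stored, and their length equals the number of non-zero entries rather than the full number of matrix cells.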

The model implementation was validated with simulated data from the Colijn et al. model implementation in Matlab (see here). Validation results reveal some differences between the two, but I suspect they are not biologically significant.

No optimization was performed in the macpan2 implementation, as the discrete variable being optimized is not well suited for the TMB engine. A possible future direction for optimization could be rBayesianOptimization.