CDCgov / multisignal-epi-inference

Python package for statistical inference and forecast of epi models using multiple signals
https://cdcgov.github.io/multisignal-epi-inference/

Exponential growth model for initial infections #116

Closed gvegayon closed 3 weeks ago

gvegayon commented 1 month ago

Goal

Implement the initialization of the infections using an exponential growth model as in the wastewater model (see here):

This process is initialized by estimating an initial exponential growth[^EpiNow2] of infections for 50 days prior to the calibration start time $t_0$:

$$ I(t) = I_0\exp(rt) $$

where $I_0$ is the initial per capita incident infections and $r$ is the exponential growth rate.
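As a rough illustration of the formula above, the seeded trajectory can be computed directly from $I_0$ and $r$. This is only a sketch: the function name `exponential_seed` is hypothetical, and it assumes $t$ is counted in whole days from the start of the 50-day window, so that $I_0$ is the value on the first seeded day. The wastewater model may anchor $t$ differently.

```python
import math


def exponential_seed(i0: float, r: float, n_days: int = 50) -> list[float]:
    """Return I(t) = i0 * exp(r * t) for t = 0, ..., n_days - 1.

    Assumption: t is measured in days from the start of the seeding window.
    """
    return [i0 * math.exp(r * t) for t in range(n_days)]


# Illustrative values only; they are not taken from the wastewater model.
seed = exponential_seed(i0=1e-4, r=0.05)
```

With a positive `r` each day is larger than the previous one by the constant factor `exp(r)`; a negative `r` gives exponential decay instead.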

Context

During discussions of feature prioritization, this particular component was not listed.

Required features

The feature should be implemented as a RandomVariable object, as part of the latent submodule.

Specifications

These specs may be moved to a different issue involving other components of the replication of the hospital admissions model in the wastewater project.

Out of scope

gvegayon commented 1 month ago

Hey @damonbayer, as we discussed, this could be implemented via a latent process, either by extending the current implementation of the infections module or by adding a new one called something like latent.InfectionsWithExpGrowth(RandomVariable).

damonbayer commented 1 month ago

From my understanding, there are three key times related to this implementation:

There is no reason that $t_\text{obs}$ couldn't come before $t_\text{seeded}$, but in the current hospitalization model tutorial (example-with-datasets), we have $t_0 < t_\text{seeded} < t_\text{obs}$.

This issue relates to generating $I(t)$ for $t \in [t_0, t_{\text{seeded}}]$. In the current model, we have $I(t) = I_0$ for $t \in [t_0, t_{\text{seeded}}]$ and we only save $I(t)$ for $t \geq t_{\text{seeded}}$.

When we have variation in $I(t)$ for $t \in [t_0, t_{\text{seeded}}]$, I assume we will want to keep track of all of these values. Is that correct?

@gvegayon @dylanhmorris

gvegayon commented 1 month ago

> When we have variation in $I(t)$ for $t \in [t_0, t_{\text{seeded}}]$, I assume we will want to keep track of all of these values. Is that correct?

Yes. Since the only sampled quantities in the equation are $I_0$ and $r$, the rest of the sequence is fully deterministic. You can use the numpyro.deterministic function to tag that and ask the model to keep track of it, e.g.,

https://github.com/CDCgov/multisignal-epi-inference/blob/a7d6d2d6cf453cfbfbb89c0389468e7d120bde8a/model/src/pyrenew/process/rtrandomwalk.py#L123

That said, I am not sure I'm following why we need $t_\text{seeded}$ and not only $t_0$ and $t_\text{obs}$. What am I missing?

damonbayer commented 1 month ago

$t_\text{seeded} \neq t_\text{obs}$ corresponds to the idea of padding in the current hospitalization model tutorial (example-with-datasets).

In that model, we have $I(t) = I_0$ for $t \in [t_0, t_{\text{seeded}}]$. Then $I(t)$ follows the renewal process for $t > t_\text{seeded}$. The observations don't start until some time later, $t_\text{obs} = t_\text{seeded} +$ padding.

You could also imagine a scenario where $t_\text{obs} < t_\text{seeded}$, which would be like negative padding. This points to the idea that the $I(t)$ vector should contain $I(t)$ for $t > t_0$. In the current implementation it only contains $I(t)$ for $t > t_\text{seeded}$.

damonbayer commented 1 month ago

So, I am thinking latent/i0.py should be made more generic to contain the concept of $I(t)$ for $t \in [t_0, t_{\text{seeded}}]$. Then that would be used with sample_infections_rt to get $I(t)$ for $t > t_0$, which forms the basis for the observations.
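To make the proposal concrete, here is a rough pure-Python sketch of a single vector that holds both the seeded values and the renewal-process values. The function `seed_then_renew` and its arguments are hypothetical and do not reflect the signature of sample_infections_rt. The sketch assumes exponential seeding for `n_seed` days followed by a discrete renewal step in which `gen_int[k]` weights infections from `k + 1` days earlier.

```python
import math


def seed_then_renew(i0, r, n_seed, rt, gen_int):
    """Exponential seeding for n_seed days, then a discrete renewal process.

    rt: R(t) for each day after the seeding window.
    gen_int: generation interval weights; gen_int[k] applies to lag k + 1.
    """
    infections = [i0 * math.exp(r * t) for t in range(n_seed)]
    for r_t in rt:
        # Reverse so the most recent day comes first and lines up with lag 1.
        recent = infections[::-1][: len(gen_int)]
        infections.append(r_t * sum(w * i for w, i in zip(gen_int, recent)))
    return infections


# Flat seeding (r = 0) with R(t) = 1 keeps infections constant.
full = seed_then_renew(
    i0=10.0, r=0.0, n_seed=5, rt=[1.0] * 3, gen_int=[0.5, 0.3, 0.2]
)
```

Because the returned list starts at the first seeded day, observations can begin at any index, which covers both the positive and the negative padding cases discussed above.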

dylanhmorris commented 1 month ago

I agree with @damonbayer; my only minor quibble is notational. Proposed default: let $t_0 = 0$ denote the first timepoint at which $I(t)$ is generated by the renewal process according to some $\mathcal{R}(t_0)$ (versus "seeded").

In other words, seeding times are by default negative: $t_\mathrm{firstseed} < t_0 = 0$.