seabbs opened this issue 4 months ago
@SamuelBrand1 I am going to have a go at this soon. Any thoughts on naming? I'm leaning towards EpiDistributions, or possibly EpiAwareDistributions, although I think that is maybe too anchored.
My reasoning is that what we are fundamentally doing in the three examples you quote is operating on distributions to create (in some sense) good discrete distributions for epidemiological modelling, either because we are enforcing discrete-time dynamics or reasoning about observation censoring (or both).
I think a more verbose name that makes an implicit link to Distributions.jl fits typical Julia module naming conventions.
I am expecting this to also hold all the nonparametric hazard components that epinowcast deals with, so for that reason I think it might make sense to use a more general name?
That includes discretisations of things like Survival.jl, which is from the JuliaStats org (as is Distributions.jl)... Maybe EpiStats?
More left field: the non-hazard models are really special cases of a hazard model, so EpiHazards.jl?
I feel like EpiHazard might lose people (and not everything would need a hazard approach). Maybe it's two interacting submodels that have the same types (and so can be composed together...)?
E.g. something like EpiDists and EpiSurvival?
Yes, but they would both need to have the same abstract parent type so that they hook into, e.g., the latent delay in the same way. I think it would probably be EpiHazard though, as EpiSurvival would include dists (i.e. distributional survival modelling).
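For illustration, a rough sketch of the kind of shared abstract parent type being discussed (all type and field names here are hypothetical, not part of EpiAware):

```julia
# Hypothetical sketch only: both delay representations share an abstract parent
# so downstream components (e.g. a latent delay model) can accept either.
abstract type AbstractDelay end

struct DistributionalDelay{D} <: AbstractDelay
    dist::D                    # a continuous delay distribution, e.g. LogNormal
end

struct NonParametricHazardDelay <: AbstractDelay
    hazard::Vector{Float64}    # discrete-time hazard probabilities
end
```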
Some proposed models (note these are really pseudocode) to go in this module in some form:
```julia
@model function censoring_delay_model(no_at_delay)
    # Assumes `using Turing, Distributions` and that `censored_pmf` (from
    # EpiAwareUtils) is in scope.
    # Parameters
    meanlog ~ Normal(0, 10)
    sdlog ~ truncated(Normal(0, 10), 0, Inf)
    # Double interval censored delay PMF, truncated at the maximum observed delay
    pmf = censored_pmf(LogNormal(meanlog, sdlog); D = length(no_at_delay))
    for i in eachindex(pmf)
        if no_at_delay[i] != 0
            @addlogprob! no_at_delay[i] * log(pmf[i])
        end
    end
end
```
```julia
@model function censored_truncated_delay_model(no_at_delay_by_obs_time)
    # Parameters
    meanlog ~ Normal(0, 10)
    sdlog ~ truncated(Normal(0, 10), 0, Inf)
    # One truncated, censored PMF per unique observation time
    pmfs = map(no_at_delay_by_obs_time) do i
        censored_pmf(LogNormal(meanlog, sdlog); D = i.obs_time)
    end

    for i in eachindex(pmfs)
        pmf = pmfs[i]
        no_at_delay = no_at_delay_by_obs_time[i].no_at_delay
        for j in eachindex(pmf)
            if no_at_delay[j] != 0
                @addlogprob! no_at_delay[j] * log(pmf[j])
            end
        end
    end
end
```
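For context, a hypothetical usage sketch of how the second model might be fitted with Turing; the data layout (a vector of NamedTuples with obs_time and no_at_delay fields) is an assumption read off the pseudocode above:

```julia
using Turing

# Counts of events in each delay bin for two observation-time cohorts
# (each count vector has length equal to obs_time, matching the PMF length).
data = [
    (obs_time = 10, no_at_delay = [0, 3, 7, 5, 2, 1, 0, 0, 0, 0]),
    (obs_time = 6,  no_at_delay = [1, 4, 6, 3, 1, 0]),
]

chain = sample(censored_truncated_delay_model(data), NUTS(), 1_000)
```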
These both make use of our censored_pmf function (https://github.com/CDCgov/Rt-without-renewal/blob/main/EpiAware/src/EpiAwareUtils/censored_pmf.jl), which uses numerical integration over the first interval to create a primary event censoring adjusted CDF, and then takes differences of this CDF to create a double interval censored PMF.
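To make the mechanics concrete, here is a minimal sketch of that construction (not the actual EpiAware implementation; the helper name, the uniform daily primary event window, and the normalisation convention are all assumptions):

```julia
using Distributions, QuadGK

function censored_pmf_sketch(dist; D)
    # Primary event censoring adjusted CDF at t = 1, ..., D: average the continuous
    # CDF over a uniform primary event time in [0, 1) via numerical integration.
    adj_cdf = [first(quadgk(p -> cdf(dist, t - p), 0.0, 1.0)) for t in 1:D]
    # Secondary (daily) interval censoring: difference the adjusted CDF.
    pmf = diff(vcat(0.0, adj_cdf))
    # Right-truncate at D and renormalise (a convention, not necessarily EpiAware's).
    return pmf ./ sum(pmf)
end

censored_pmf_sketch(LogNormal(1.5, 0.5); D = 14)
```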
These implementations are not generative, but they could fairly easily be made so using switches.
There are more efficient ways you could code this, but as it's a cohort approach it scales with D and the number of unique observation times rather than with the number of observations, and so should perform much better in high data settings. The daily event window limitation is really just a software engineering limitation (you need a higher dimensional grid). This is also currently hard coded to assume a uniform primary event window (the secondary event distribution doesn't matter), but this too could be changed with engineering rather than any methodological work.
For interest, @natemcintosh, @kgostic. I don't think there is anything limiting us from putting similar approaches in epinowcast/epidist fairly soon once we have the package infra locked down. If the code isn't very helpful I can write some maths down to summarise this.
In Stan land, this is some useful context for what is going on: https://mc-stan.org/docs/stan-users-guide/efficiency-tuning.html#bernoulli-sufficient-statistics
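Restating that sufficient statistics trick for this setting: for $N$ observed delays $d_1, \dots, d_N$ within a cohort,

$$
\sum_{i=1}^{N} \log \mathrm{pmf}(d_i) = \sum_{d} n_d \, \log \mathrm{pmf}(d),
$$

where $n_d$ is the number of observations with delay $d$ (the no_at_delay counts), so the likelihood cost scales with $D$ and the number of unique observation times rather than with $N$.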
Ideally we would do some evaluation of this model, and I think this could be done by someone other than me in a way that would be very useful from several angles.
> The daily event window limitation is really just a software engineering limitation (you need a higher dimensional grid).
Just to add, censored_pmf does already have this by default: if no D is provided, the length of the discretised distribution is chosen to be the smallest for which 99% of the continuous distribution's mass is covered.
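That default rule is roughly (a hypothetical one-liner, not the actual implementation):

```julia
using Distributions

# Smallest integer length D such that at least 99% of the continuous mass lies below D.
default_D(dist) = ceil(Int, quantile(dist, 0.99))

default_D(LogNormal(1.5, 0.5))
```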
The issue with this approach is that you need to approximate obs_time (which is really a combination of some known time and the primary event time) using the lower bound of the primary event time. This means that for longer primary event windows you will start to have small amounts of truncation bias. The solution, I think, is to include the observation time in the integral, which would be worth exploring once this has been implemented, but it may also not really turn out to be needed.
> Just to add, censored_pmf does already have this by default: if no D is provided, the length of the discretised distribution is chosen to be the smallest for which 99% of the continuous distribution's mass is covered.
The issue I was referring to here is the lookup between your PMFs and the model. You could also encode this so that the PMF is calculated for each independent d cohort, and then this wouldn't be a problem, but you would have a slightly less efficient model as you could be solving each CDF up to twice (two PMFs might share the same bound, i.e. lower for one and upper for the other).
Noting this package https://github.com/epinowcast/primarycensoreddist
We currently use delays in several places and could be doing so via a submodule. This could be called several things.
I see it being used to:
To start (i.e. to resolve this issue), we would need to move our current fixed delay functionality into a submodule. We would then need to make issues to implement our censored_pmf approach as an in-model fitted delay and to move our generation time model to be a model input.