SamuelBrand1 opened 2 weeks ago
PS I didn't have the same stats vocabulary, but it's the underlying modelling approach @MattKeeling and I used for temperature-driven event modelling for biting midges: https://royalsocietypublishing.org/doi/full/10.1098/rsif.2016.0481
The code is fairly horrible MATLAB, but the underlying computational idea is that for any given model parameters you can do one sweep of $H(t)$ relative to some start time before the data, cache this as an array, and then get the relevant values of $H(t)$ for specific delays as differences between points of the array.
e.g.
The accumulated hazard between $t_1$ and $t_2$:

$$H(t_1, t_2) = \int_{t_1}^{t_2} \theta_s \, ds = H(0, t_2) - H(0, t_1).$$
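To make the cache-and-difference idea concrete, here's a minimal numpy sketch (the hazard function `theta` and the grid bounds are made up for illustration): one sweep builds the cumulative hazard array, and any $H(t_1, t_2)$ is then just a difference of interpolated values.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical time-varying hazard theta(t); any nonnegative function works.
def theta(t):
    return 0.5 + 0.3 * np.sin(t)

# One sweep: cache the cumulative hazard H(0, t) on a fine grid.
ts = np.linspace(0.0, 20.0, 2001)
dt = ts[1] - ts[0]
# Trapezoidal cumulative integral, with H[0] = 0.
H = np.concatenate(([0.0], np.cumsum((theta(ts[1:]) + theta(ts[:-1])) * dt / 2)))

def H_between(t1, t2):
    """H(t1, t2) = H(0, t2) - H(0, t1), read off the cached array."""
    return np.interp(t2, ts, H) - np.interp(t1, ts, H)

# Cross-check the cached difference against direct quadrature over [3, 7].
direct, _ = quad(theta, 3.0, 7.0)
cached = H_between(3.0, 7.0)
```

The point is that after the single $O(n)$ sweep, every delay-specific accumulated hazard is an $O(1)$ lookup rather than a fresh integral.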
My first-pass take is that expressing the primary censored distributions as discrete hazards seems very in scope here, if we first do some thinking about whether we expect it to be more numerically efficient/stable than going log PMF -> log inverse-logit hazard (which is the scale that epinowcast and other Günther-style nowcast approaches operate on when being careful).
I assume that it will be, since we need the CCDF anyway (see https://github.com/epinowcast/epinowcast/blob/12c24bf509704c4384902408f36dff86e73fef5b/inst/stan/functions/discretised_logit_hazard.stan#L195).
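To make the PMF-to-discrete-hazard direction concrete (and why the CCDF shows up), here's a small sketch with a made-up delay PMF; the discrete hazard is the probability of the event on day $d$ conditional on it not having happened before $d$, and the mapping round-trips exactly.

```python
import numpy as np

# Hypothetical discrete delay PMF on days 0..5 (values made up for illustration).
pmf = np.array([0.35, 0.25, 0.18, 0.12, 0.07, 0.03])
pmf = pmf / pmf.sum()

# CCDF just before day d: P(delay >= d) = 1 - CDF(d - 1).
ccdf = 1.0 - np.concatenate(([0.0], np.cumsum(pmf)[:-1]))

# Discrete hazard: h(d) = P(delay = d | delay >= d) = pmf(d) / ccdf(d).
hazard = pmf / ccdf

# Round trip: recover the PMF from the hazards via the survival product.
surv = np.concatenate(([1.0], np.cumprod(1.0 - hazard)[:-1]))
pmf_back = hazard * surv
```

Note the last hazard is always 1 on a finite support (the event must happen by the final day), which is one of the normalisation details the epinowcast Stan code handles on the logit scale.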
> Non-parametric reference date effect
> Non-parametric report date effect
I'm less convinced that these ideas live here, and/or that approaching it this way provides substantial benefits over the discrete hazard approach taken in epinowcast, though I'm open to being convinced. Even if convinced, I would push for hazard tooling that doesn't work directly on the distributions to be out of scope here and supported downstream or in another module.
The approach in the epinowcast documentation is a proportional hazards approach, i.e. it has a regression directly on the logit(hazard) that represents shifting the hazard by a log odds-ratio.
The infrastructure you have built in primarycensoreddist lends itself towards an AFT approach, because we have the CCDF functions directly. In terms of compute, in either approach we need to calculate the hazard/time-acceleration effect sizes. In the epinowcast approach we also need to handle the normalisation (e.g. we can't use a sparse data trick to calculate only parts of the log-PMF, because the normalisation factor also depends on covariates); with AFT we don't need that.
Proportional hazards and AFT aren't equivalent, and it's a bit of a design decision which to use (e.g. see this discussion: https://stats.stackexchange.com/questions/621041/choice-between-proportional-hazard-and-accelerated-failure-time-survival-models). The historical preference for proportional hazards survival analysis is, at least partly, driven by the ability to scale out the baseline hazard rate and concentrate on the covariate effect sizes. I'm not sure that's such a relevant consideration here.
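A quick numeric illustration of the non-equivalence (baseline distribution and effect size made up): with the same covariate effect, proportional hazards raises the baseline survival to a power, while AFT rescales the time axis, and for a log-normal baseline these give visibly different survival curves.

```python
import numpy as np
from scipy.stats import lognorm

# Log-normal baseline survival S0(t) (made-up parameters).
mu, sigma = 1.0, 0.5
base = lognorm(s=sigma, scale=np.exp(mu))
t = np.linspace(0.01, 15.0, 500)
S0 = base.sf(t)

beta_x = np.log(2.0)  # a single covariate effect

# Proportional hazards: multiplying the hazard by exp(beta_x)
# is equivalent to S(t) = S0(t) ** exp(beta_x).
S_ph = S0 ** np.exp(beta_x)

# AFT: rescale time, S(t) = S0(exp(beta_x) * t).
S_aft = base.sf(np.exp(beta_x) * t)

max_gap = np.max(np.abs(S_ph - S_aft))
```

(The Weibull family is the one special case where the two model classes coincide up to reparameterisation; for other baselines they genuinely differ, as `max_gap` shows here.)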
Chapter 5 of Cox's own book is a useful reference: https://estadisticafciencias.wordpress.com/wp-content/uploads/2019/08/cox_d._r_analysis_of_survival_dataz-lib.org_.pdf
I think in epinowcast you can implement non-proportional hazards using the non-parametric component of the reference module, but I could be wrong.
> (e.g. we can't use a sparse data trick to calculate only parts of the log-PMF, because the normalisation factor also depends on covariates); with AFT we don't need that.

This could potentially be a big advantage.
## What is this issue?
This issue is about discussing exposing accelerated failure time models in primarycensordist. The goal is to be as flexible as the generalised model of Günther et al (https://package.epinowcast.org/articles/model.html#generalised-model-1), but to integrate directly with the work done on survival functions here.

## Accelerated failure time (AFT) models
The basic idea of AFT survival models is that we start with a baseline delay model, with hazard function $\lambda_0(t)$ and survival function $S_0(t)$ for time since primary event $t$. Then we treat covariate effects as either accelerating (i.e. the secondary event arrives relatively sooner) or decelerating (later) the time until the secondary event. For fixed effects $\beta$ with covariates $x$, in standard notation,

$$S(t \mid x) = S_0\big(e^{\beta^\top x}\, t\big),$$

so $e^{\beta^\top x} > 1$ accelerates the delay and $e^{\beta^\top x} < 1$ decelerates it.
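To check the fixed-effect AFT interpretation numerically, here's a sketch (baseline distribution and effect size are made up): under AFT, accelerated event times are baseline times divided by the acceleration factor, and their empirical survival matches the time-rescaled baseline survival.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(42)
base = weibull_min(c=1.5, scale=3.0)  # made-up baseline delay distribution

beta_x = 0.7           # acceleration: exp(0.7) ~ 2, events arrive ~2x sooner
accel = np.exp(beta_x)

# AFT says T = T0 / accel, equivalently S(t) = S0(accel * t).
t0 = base.rvs(size=200_000, random_state=rng)
t_acc = t0 / accel

t_check = 1.5
emp = np.mean(t_acc > t_check)      # empirical survival of accelerated times
theory = base.sf(accel * t_check)   # S0(accel * t)
```

The two readings of the model (rescale the event times, or rescale the time argument of $S_0$) are the same thing, which is what makes the CCDF-based infrastructure a natural fit.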
The fixed effect on the hazard can be extended to time-varying effects (e.g. a particular time period has a decelerated hazard in reporting); cf. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10346080/pdf/kxac009.pdf. Rewriting the linked paper in approximately common notation, the survival function becomes

$$S(t \mid x) = S_0\left(\int_0^t e^{\beta^\top x(s)}\, ds\right) = S_0(H(t)), \qquad H(t) = \int_0^t e^{\beta^\top x(s)}\, ds,$$

i.e. $H(t)$ is an "operational time" that accrues faster or slower than clock time depending on the current covariate values.
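A sketch of the time-varying case (baseline distribution, effect size, and the "holiday" period are all made up): operational time is accumulated on a grid exactly as in the cache-one-sweep idea above, and survival is the baseline survival evaluated on that accelerated clock.

```python
import numpy as np
from scipy.stats import gamma

# Made-up baseline delay distribution for S0.
base = gamma(a=2.0, scale=1.5)

# Piecewise-constant time-varying effect: reporting is decelerated
# (factor exp(-0.8) < 1) during a hypothetical "holiday" period [4, 6).
def accel(s):
    return np.where((s >= 4.0) & (s < 6.0), np.exp(-0.8), 1.0)

# Operational time H(t) = integral of accel(s) ds, cached on a grid.
ts = np.linspace(0.0, 12.0, 1201)
dt = ts[1] - ts[0]
H = np.concatenate(([0.0], np.cumsum((accel(ts[1:]) + accel(ts[:-1])) * dt / 2)))

def surv(t):
    """S(t) = S0(H(t)): baseline survival evaluated on the accelerated clock."""
    return base.sf(np.interp(t, ts, H))

# During the decelerated period operational time accrues more slowly,
# so survival at t = 6 should exceed the baseline S0(6).
```

Outside the decelerated window the clock runs at rate 1, so `surv` coincides with the baseline there.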
## Possible usage in primarycensordist
We have access to various analytic expressions for the survival function of censored delays (e.g. $S_0$ in notation above). Therefore, given a calculation of the cumulative hazard $H(t)$, the probability of a secondary time in $[T, T+ w_S]$ in the AFT framework is $S_0(H(T)) - S_0(H(T+w_S))$, which is potentially not that hard to calculate (maybe?) and avoids having to renormalize as per the Gunther discrete time model.
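A sketch of the window-probability calculation (the baseline survival here is a made-up stand-in for the package's analytic censored-delay survival functions, and the constant acceleration factor is illustrative): because consecutive window probabilities telescope, they sum to (essentially) one over a partition with no explicit renormalisation step.

```python
import numpy as np
from scipy.stats import lognorm

# Stand-in for the analytic censored-delay survival function S0
# (made-up parameters; in practice this would come from the package).
S0 = lognorm(s=0.6, scale=np.exp(0.5)).sf

# Cumulative acceleration H(t); a constant effect exp(beta_x) for simplicity.
beta_x = 0.4
H = lambda t: np.exp(beta_x) * t

def prob_in_window(T, w):
    """P(secondary event in [T, T + w]) = S0(H(T)) - S0(H(T + w))."""
    return S0(H(T)) - S0(H(T + w))

# Window probabilities over a partition telescope to S0(H(0)) - S0(H(60)),
# which is ~1 once the upper bound is far into the tail.
edges = np.arange(0.0, 60.0, 1.0)
total = sum(prob_in_window(T, 1.0) for T in edges)
```

This is the contrast with the discrete-time Günther setup: there the normalisation factor depends on the covariates, whereas here normalisation is automatic by construction.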
The three hazard aspects considered here (https://package.epinowcast.org/articles/model.html#generalised-model-1) seem to be addressable pretty well in this framework.
This is not very different from the Günther model, but it brings things into a common continuous-time framework and potentially eliminates normalisation. It might be a bit more efficient (I'm not sure though).