epinowcast / primarycensored

Primary event censored distributions in R and Stan.
https://primarycensored.epinowcast.org/
Other
7 stars 1 forks source link

A why it works maths vignette #37

Closed seabbs closed 3 weeks ago

seabbs commented 1 month ago

We have some mathematical justification in the getting started vignette but I think we could do with more in another maths only (i.e no code) vignette. What do you think @SamuelBrand1?

SamuelBrand1 commented 1 month ago

Well I'm a fan of this idea.

SamuelBrand1 commented 1 month ago

Following on from f2f discussion with @seabbs I'm not sure https://primarycensoreddist.epinowcast.org/dev/articles/primarycensoreddist.html#compute-the-primary-event-censored-cumulative-distribution-function-cdf-for-delays-with-pprimarycensoreddist is quite correct.

We can characterise the delay distribution with primary event censoring as the sum of two random variables:

T= X + P. 

$T$ is the random secondary event time relative to the left-point of the censoring window for the censored primary time, which is the typical way to measure (other measures can be easily rewritten in this form).

Where $X \sim f_\theta$ is a delay from some parametric positive distribution, which might be right-truncated with truncation parameter $D$, and $P \sim g_r$ is the primary time distribution within the primary censoring window, which might also be parametric (e.g. tilted by exp growth rate #6 ).

Assuming independence, the cumulative distribution function for $T$ is:

F_T(t) = \mathbb{E}_{P,r} \Big[ F_{X,\theta}(t - P)\Big] = \int_0^{pwindow} F_{X,\theta}(t - p) q_r(p) dp.

Using standard probability identity and arrives at first equation of the linked section.

For the right-truncated version $X{\tau}$ of some untruncated delay random variable $X$, with truncation parameter $D$, then the CDF of $X\tau$: $F{X\tau}$ relates to the cdf $F_X$

F_{X_\tau,\theta, D}(t) = {F_{X,\theta}(t \wedge D) \over F_{X,\theta}(D)}.

The cumulative dist. identity for $T = X + P$ is not affected by the truncation, or otherwise, of $X$ therefore inserting the truncation definition gives:

F_T(t) = \int_0^{pwindow}  {F_{X,\theta}((t-p) \wedge D) \over F_{X,\theta}(D)} q_r(p) dp = { \int_0^{pwindow}  F_{X,\theta}((t-p) \wedge D) q_r(p) dp \over F_{X,\theta}(D)}.

This differs from the second equation in the linked section, which I think might be incorrect.

If we wish to right-truncate $T$, rather than $X$, at $D$ then the correct cum. dist. function is:

F_{T_\tau}(t) = {F_T(t\wedge D) \over F_T(D)} = {\int_0^{pwindow} F_{X,\theta}((t\wedge D) - p) q_r(p) dp \over \int_0^{pwindow} F_{X,\theta}(D - p) q_r(p) dp}.

Which is again different to the second equation since its the ratio of two integrals rather than the integral of a ratio.

seabbs commented 1 month ago

If we wish to right-truncate 𝑇, rather than 𝑋

This is what we want I think and what is implemented in the currently vectorised code. See the similarity to equation 21 in Park et al. 2024 (i.e double extra boring) I have also been looking at this in the numerical stability branch (changing this again makes no impact on numerical stability but does localise the issue the D). We should do some numerical testing by expanding these tests to include testing about random sampled emprical versions: https://github.com/epinowcast/primarycensoreddist/blob/2d8f339821aa3c975cfa667c8c738107d178153d/tests/testthat/test-stan-rpd-primarycensoreddist.R#L183

seabbs commented 1 month ago

Also the difference between intregrals must be very small if the analytical test here is correct: https://github.com/epinowcast/primarycensoreddist/blob/74c2c8c83ff230edbaf5643d1da7351a9714ee7e/tests/testthat/test-rpd-primarycensoreddist.R#L100

SamuelBrand1 commented 1 month ago

I think we're straying on this issue away from the maths vignette (which has a small error in the equation) towards the stability of sampling?

seabbs commented 1 month ago

Yes agree. I do think this makes clear a nice verbose maths vignette would be helpful sooner rather than later

seabbs commented 1 month ago

@SamuelBrand1 just a light bump that this would be a great addition