chjackson / flexsurv

The flexsurv R package for flexible parametric survival and multi-state modelling
http://chjackson.github.io/flexsurv/

numerical instability in pgenf #68

Closed: fabian-s closed this issue 4 years ago

fabian-s commented 4 years ago

I have some data where the CDF of the Generalized F has a weird jump, leading to funny-looking hazard rate estimates:

library(flexsurv)
#> Loading required package: survival

## Parameter values that trigger the instability
mu <- 7.495875
sigma <- 0.35362
Q <- 0.4572124
P <- 16.68415

## Hazard, density, and CDF of the Generalized F at these parameters
layout(t(1:3))
curve(hgenf(x = x, mu, sigma, Q, P), from = 0, to = 2000)
curve(dgenf(x = x, mu, sigma, Q, P), from = 0, to = 2000)
curve(pgenf(q = x, mu, sigma, Q, P), from = 0, to = 2000)

## Reference CDF from numerically integrating the density; overlaying it
## on the pgenf() curve shows the two disagree below t = 200 or so
grid <- seq(0, 500, by = 20)
pgenf_grid <-
  sapply(grid, function(t) {
    integrate(function(x) dgenf(x, mu, sigma, Q, P),
              lower = 1e-12, upper = t)$value
  })
lines(grid, pgenf_grid, col = 2)

Created on 2019-11-04 by the reprex package (v0.3.0)

EDIT:

following up on this: https://github.com/chjackson/flexsurv-dev/blob/223ec2c24586a20455f2aa1f90a9bbae67350bbd/src/genf.cpp#L99

I first thought base R's Beta CDF (pbeta) was to blame:

## Reconstruct the intermediate quantities computed in genf.cpp
tmp <- Q * Q + 2 * P
delta <- sqrt(tmp)
s1 <- 2 / (tmp + Q * delta)
s2 <- 2 / (tmp - Q * delta)
expw <- grid^(delta / sigma) * exp(-mu * delta / sigma)
cbind(grid, pbeta(s2 / (s2 + s1 * expw), s2, s1))
# [1,]    0 1.0000000
# [2,]   20 1.0000000
# [3,]   40 1.0000000
# [4,]   60 1.0000000
# [5,]   80 1.0000000
# [6,]  100 1.0000000
# [7,]  120 1.0000000
# [8,]  140 1.0000000
# [9,]  160 1.0000000
# [10,]  180 1.0000000    #!!
# [11,]  200 0.9258521    #!!
# [12,]  220 0.9199545
# [13,]  240 0.9131591
# [14,]  260 0.9066346
# [15,]  280 0.9001463
# [16,]  300 0.8937271
# [17,]  320 0.8873412
# [18,]  340 0.8809894
# [19,]  360 0.8746735
# [20,]  380 0.8683911
# [21,]  400 0.8621400
# [22,]  420 0.8559188
# [23,]  440 0.8497257
# [24,]  460 0.8435594
# [25,]  480 0.8374187
# [26,]  500 0.8313023

but it turns out that:

s2/(s2 + s1*expw) - 1 
# [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
# [7]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00 -2.220446e-16 -8.881784e-16
# [13] -3.885781e-15 -1.443290e-14 -4.873879e-14 -1.506573e-13 -4.335421e-13 -1.170841e-12
# [19] -2.987277e-12 -7.245093e-12 -1.679124e-11 -3.735090e-11 -8.005041e-11 -1.658471e-10
# [25] -3.331089e-10 -6.502774e-10
## because:
expw
# [1] 0.000000e+00 9.428105e-33 8.077073e-28 6.205466e-25 6.919643e-23 2.679786e-21 5.316233e-20
# [8] 6.647035e-19 5.928070e-18 4.084364e-17 2.295777e-16 1.094489e-15 4.554426e-15 1.690687e-14
# [15] 5.694526e-14 1.763803e-13 5.078588e-13 1.371457e-12 3.499081e-12 8.486471e-12 1.966796e-11
# [22] 4.374999e-11 9.376502e-11 1.942607e-10 3.901785e-10 7.616858e-10

So (maybe) rescaling the time axis, to make sure that this kind of underflow does not happen, would help...?
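(For reference, the grid points where the beta argument collapses to exactly 1 can be flagged directly. Doubles just below 1 are spaced .Machine$double.eps/2 apart, so the relative size of the lost term only has to drop to around 1e-16 for this to happen.)

## reuses s1, s2, expw from above; TRUE for exactly the grid points
## where the naive beta argument rounds to 1.0
cbind(grid, rounds_to_1 = s2 / (s2 + s1 * expw) == 1)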

This is flexsurv 1.1.1 under R version 3.6.1 (2019-07-05) on Linux Mint 19.1.

chjackson commented 4 years ago

Thanks for the report! I've just discovered that we can rewrite s2/(s2 + s1*expw) - 1 as -(s1*expw) / (s2 + s1*expw), which doesn't underflow. Though I'm not sure how that gives us a clue about how to rewrite the pbeta statement in a way that doesn't underflow - any ideas?
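A quick numerical check of that identity, reusing s1, s2, expw from the reprex above; the rewritten form keeps the tiny term that the subtraction rounds away:

## the naive form is exactly 0 for the first 10 grid points, while the
## rewritten form retains values all the way down to ~1e-32
cbind(naive = s2 / (s2 + s1 * expw) - 1,
      rewritten = -(s1 * expw) / (s2 + s1 * expw))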

fabian-s commented 4 years ago

Aah, nice!

Not sure about the proper terminology, but "flipping the axis" when the values are too close to 1 seems to work:

## flag arguments so close to 1 that precision is lost
too_close_to_1 <- (s1 * expw) / (s2 + s1 * expw) < 1e-16

cbind(grid, pbeta(s2 / (s2 + s1 * expw), s2, s1),
      ifelse(too_close_to_1,
             # evaluate the "flipped" Beta CDF near 0 instead
             pbeta((s1 * expw) / (s2 + s1 * expw), s1, s2, lower.tail = FALSE),
             pbeta(s2 / (s2 + s1 * expw), s2, s1)))
     grid                    
 [1,]    0 1.0000000 1.0000000
 [2,]   20 1.0000000 0.9908285
 [3,]   40 1.0000000 0.9828299
 [4,]   60 1.0000000 0.9752214
 [5,]   80 1.0000000 0.9678556
 [6,]  100 1.0000000 0.9606653
 [7,]  120 1.0000000 0.9536117
 [8,]  140 1.0000000 0.9466698
 [9,]  160 1.0000000 0.9398222
[10,]  180 1.0000000 0.9330560
[11,]  200 0.9258521 0.9258521
[12,]  220 0.9199545 0.9199545
[13,]  240 0.9131591 0.9131591
[14,]  260 0.9066346 0.9066346
[15,]  280 0.9001463 0.9001463
[16,]  300 0.8937271 0.8937271
[17,]  320 0.8873412 0.8873412
[18,]  340 0.8809894 0.8809894
[19,]  360 0.8746735 0.8746735
[20,]  380 0.8683911 0.8683911
[21,]  400 0.8621400 0.8621400
[22,]  420 0.8559188 0.8559188
[23,]  440 0.8497257 0.8497257
[24,]  460 0.8435594 0.8435594
[25,]  480 0.8374187 0.8374187
[26,]  500 0.8313023 0.8313023

... ?
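(The flip works because if X ~ Beta(a, b), then 1 - X ~ Beta(b, a), so pbeta(x, a, b) equals pbeta(1 - x, b, a, lower.tail = FALSE). A standalone check with arbitrary shape values:)

## reflection property of the Beta distribution
all.equal(pbeta(0.3, 2, 5),
          pbeta(0.7, 5, 2, lower.tail = FALSE))
#> [1] TRUE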

chjackson commented 4 years ago

That looks good, though I'm wondering why not use the "flipped" version all the time? Will there ever be situations where it has its own underflow problems? Though perhaps it makes for clearer code to do as you suggest: have the standard version as the default and the flipped version as the exception.

fabian-s commented 4 years ago

¯\_(ツ)_/¯

Not sure.

chjackson commented 4 years ago

It turned out that the flipped version also underflowed, when s2/(s2 + s1*expw) was very small. So I did as you suggested and switched between the two versions.
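For the record, a minimal R sketch of that switching logic (the actual fix lives in flexsurv's C++ genf.cpp; the function name here is illustrative, and it assumes, as in the snippets above, that the pbeta expression is the survivor function):

pgenf_surv <- function(q, mu, sigma, Q, P) {
  ## intermediate quantities, as in genf.cpp
  tmp <- Q * Q + 2 * P
  delta <- sqrt(tmp)
  s1 <- 2 / (tmp + Q * delta)
  s2 <- 2 / (tmp - Q * delta)
  expw <- q^(delta / sigma) * exp(-mu * delta / sigma)
  ## both complementary beta arguments, each computed directly
  ## (never as 1 minus the other), so the small one keeps full precision
  p_hi <- s2 / (s2 + s1 * expw)
  p_lo <- (s1 * expw) / (s2 + s1 * expw)
  ## evaluate whichever argument is far from 1, flipping tails as needed
  ifelse(p_lo < p_hi,
         pbeta(p_lo, s1, s2, lower.tail = FALSE),
         pbeta(p_hi, s2, s1))
}

## smooth survivor curve, with no jump around t = 200;
## pgenf() itself would return 1 minus this
plot(grid, pgenf_surv(grid, mu, sigma, Q, P), type = "l")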