Closed fabian-s closed 4 years ago
Thanks for the report! I've just discovered that we can rewrite s2/(s2 + s1*expw) - 1 as -(s1*expw) / (s2 + s1*expw), which doesn't underflow. Though I'm not sure whether that gives us a clue for rewriting the pbeta statement in a way that doesn't underflow - any idea?
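To make the rewrite concrete, here is a minimal sketch of the two algebraically equal expressions evaluated in double precision. The values of s1, s2 and expw are illustrative assumptions, not taken from the reported data; they are chosen so that s1*expw is far smaller than s2, which is the situation the thread calls underflow.

```r
# Illustrative values only (assumed, not from the original report)
s1 <- 1
s2 <- 1
expw <- 1e-20

# s2 + s1*expw rounds to s2, so the ratio is exactly 1 and the
# subtraction wipes out all information about expw
naive <- s2 / (s2 + s1 * expw) - 1

# Same quantity algebraically, but the small term stays in the numerator
stable <- -(s1 * expw) / (s2 + s1 * expw)

print(c(naive = naive, stable = stable))
```

The first form returns exactly 0 here, while the second keeps a negative value of the expected magnitude.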
Aah, nice!
Not sure about the proper terminology, but "flipping the axis" if the values are too close to 1 seems to work:
too_close_to_1 <- (s1*expw) / (s2 + s1*expw) < 1e-16
cbind(grid, pbeta(s2/(s2 + s1*expw), s2, s1),
      ifelse(too_close_to_1,
             # evaluate "flipped" Beta distribution close to 0
             pbeta((s1*expw) / (s2 + s1*expw), s1, s2, lower.tail = FALSE),
             pbeta(s2/(s2 + s1*expw), s2, s1)))
grid
[1,] 0 1.0000000 1.0000000
[2,] 20 1.0000000 0.9908285
[3,] 40 1.0000000 0.9828299
[4,] 60 1.0000000 0.9752214
[5,] 80 1.0000000 0.9678556
[6,] 100 1.0000000 0.9606653
[7,] 120 1.0000000 0.9536117
[8,] 140 1.0000000 0.9466698
[9,] 160 1.0000000 0.9398222
[10,] 180 1.0000000 0.9330560
[11,] 200 0.9258521 0.9258521
[12,] 220 0.9199545 0.9199545
[13,] 240 0.9131591 0.9131591
[14,] 260 0.9066346 0.9066346
[15,] 280 0.9001463 0.9001463
[16,] 300 0.8937271 0.8937271
[17,] 320 0.8873412 0.8873412
[18,] 340 0.8809894 0.8809894
[19,] 360 0.8746735 0.8746735
[20,] 380 0.8683911 0.8683911
[21,] 400 0.8621400 0.8621400
[22,] 420 0.8559188 0.8559188
[23,] 440 0.8497257 0.8497257
[24,] 460 0.8435594 0.8435594
[25,] 480 0.8374187 0.8374187
[26,] 500 0.8313023 0.8313023
... ?
That looks good, though I'm wondering why not use the "flipped" version all the time? Will there ever be situations where it has its own underflow problems? Though perhaps it makes for clearer code to do as you suggest, and have the standard version as the default and the flipped version as the exception.
¯\_(ツ)_/¯
not sure.
It turned out that the flipped version also underflowed when s2/(s2 + s1*expw) was very low. So I did as you suggested and switched between the two versions.
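A minimal sketch of what switching between the two versions could look like follows. The helper name pbeta_switch and the rule "use whichever argument is smaller" are assumptions made for illustration; this is not the code that went into flexsurv, and the 1e-16 threshold used earlier in the thread would be another reasonable choice. It relies on the identity that if X follows Beta(s2, s1), then 1 - X follows Beta(s1, s2).

```r
# Hypothetical helper (not flexsurv's implementation): evaluates
# pbeta(s2 / (s2 + s1*expw), s2, s1), choosing the tail whose
# argument is small and therefore represented accurately.
pbeta_switch <- function(expw, s1, s2) {
  a <- s1 * expw
  x <- s2 / (s2 + a)   # standard argument; rounds to 1 when a is tiny
  y <- a / (s2 + a)    # complement 1 - x, computed without cancellation
  use_flipped <- y < x # assumed switching rule: prefer the smaller argument
  ifelse(use_flipped,
         pbeta(y, s1, s2, lower.tail = FALSE),  # "flipped" version
         pbeta(x, s2, s1))                      # standard version
}

# Example calls with illustrative parameter values
pbeta_switch(1e-20, s1 = 0.05, s2 = 1)  # x rounds to 1, flipped branch is used
pbeta_switch(1e20,  s1 = 1, s2 = 0.05)  # x is tiny, standard branch is used
```

This sketch does not handle s1*expw overflowing to Inf, where both ratios become NaN.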
I have some data where the CDF of the generalized F distribution has a weird jump, leading to funny-looking hazard rate estimates:
Created on 2019-11-04 by the reprex package (v0.3.0)
EDIT:
following up on this: https://github.com/chjackson/flexsurv-dev/blob/223ec2c24586a20455f2aa1f90a9bbae67350bbd/src/genf.cpp#L99
I first thought base's Beta CDF was to blame:
but it turns out that:
so (maybe) rescaling the time axis to make sure that this kind of underflow does not happen would help....?
This is flexsurv 1.1.1 running under R version 3.6.1 (2019-07-05), Linux Mint 19.1.