Closed EliasRas closed 6 months ago
Once the values of the regularized incomplete gamma functions (gammainc
and gammaincc
used in the CDFs of gamma distribution and inverse gamma distribution, respectively) get close to 1, the step between neighbouring values of float64
starts to matter. When calculating the differences between the CDF values of gamma or inverse gamma distribution, the differences will eventually get smaller than float64
's resolution. This leads to zero differences and hence log densities that are negative infinity.
The issue with values close to 1 can be fixed by using the equality gammainc(a,b) = 1 - gammaincc(a,b)
and calculating e.g. gammaincc(alpha,beta * (x - 2)) - gammaincc(alpha,beta * (x + 3))
instead of gammainc(alpha,beta * (x + 3)) - gammaincc(alpha,beta * (x - 2))
. Since the values of gammaincc
will be close to 0, the step between neighbouring values of float64
will be significantly smaller. For example the closest value to 1 $\approx 1-10^{-16}$ and the closest value to 0 $\approx 10^{-324}$. These can be determined using math.nextafter
.
This motivates an implementation which uses gammainc
when its values are "far" from 1 and gammaincc
when its values are "far" from 1. The actual point where the used function should change depends on the parameters of (inverse) gamma distribution but the accuracy is good with both functions when x
is close to the mean of the distribution. Hence the mean is a good heuristic for changing the used function.
The implementations used after b7e84c5 should be numerically stable and behave (almost) identically to pymc
's implementation. This, however, comes with a performance cost.
Built in pymc.Gamma.logcdf
starts to fail when alpha=3
, beta=3/28
and value>404
and does not provide accurate results at all after value>420
.
Since the numerical inaccuracy starts at extreme value
s, the stabler (and slower) implementation is not always needed. 7719b23 implemented a rough check for whether the stability is needed or not.
The lowest value
which necessitate a stable implementation can be seen below for various gamma distribution parameters.
With inverse gamma distribution the situation is similar, although there might be issues also with small values due to the CDF being a function of 1/value
.
The custom likelihoods (and floored distributions)
kyykkaanalysis.model.modeling
might be unstable with extreme parameter values. The calculation of_podium_invgamma_logp
is already fixed to some extend but it might be useful to check if the guess that mean of the distribution is a good choice for the switch statement is a good one.