The custom likelihoods might be numerically unstable

EliasRas commented 6 months ago

The custom likelihoods (and floored distributions) kyykkaanalysis.model.modeling might be unstable with extreme parameter values. The calculation of _podium_invgamma_logp is already fixed to some extend but it might be useful to check if the guess that mean of the distribution is a good choice for the switch statement is a good one.

EliasRas commented 6 months ago

Once the values of the regularized incomplete gamma functions (gammainc and gammaincc used in the CDFs of gamma distribution and inverse gamma distribution, respectively) get close to 1, the step between neighbouring values of float64 starts to matter. When calculating the differences between the CDF values of gamma or inverse gamma distribution, the differences will eventually get smaller than float64's resolution. This leads to zero differences and hence log densities that are negative infinity.

The issue with values close to 1 can be fixed by using the equality gammainc(a,b) = 1 - gammaincc(a,b) and calculating e.g. gammaincc(alpha,beta * (x - 2)) - gammaincc(alpha,beta * (x + 3)) instead of gammainc(alpha,beta * (x + 3)) - gammaincc(alpha,beta * (x - 2)). Since the values of gammaincc will be close to 0, the step between neighbouring values of float64 will be significantly smaller. For example the closest value to 1 $\approx 1-10^{-16}$ and the closest value to 0 $\approx 10^{-324}$. These can be determined using math.nextafter.

This motivates an implementation which uses gammainc when its values are "far" from 1 and gammaincc when its values are "far" from 1. The actual point where the used function should change depends on the parameters of (inverse) gamma distribution but the accuracy is good with both functions when x is close to the mean of the distribution. Hence the mean is a good heuristic for changing the used function.

The implementations used after b7e84c5 should be numerically stable and behave (almost) identically to pymc's implementation. This, however, comes with a performance cost.

EliasRas commented 6 months ago

Built in pymc.Gamma.logcdf starts to fail when alpha=3, beta=3/28 and value>404 and does not provide accurate results at all after value>420. newplot

EliasRas commented 6 months ago

Since the numerical inaccuracy starts at extreme values, the stabler (and slower) implementation is not always needed. 7719b23 implemented a rough check for whether the stability is needed or not.

The lowest value which necessitate a stable implementation can be seen below for various gamma distribution parameters. newplot

EliasRas commented 6 months ago

With inverse gamma distribution the situation is similar, although there might be issues also with small values due to the CDF being a function of 1/value.

newplot(1)

EliasRas / KyykkaAnalysis

The custom likelihoods might be numerically unstable #2