JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.
Other
1.11k stars 416 forks source link

evaluating the pdf of the Skellam distribution results in an overflow error #808

Open kehlert opened 5 years ago

kehlert commented 5 years ago

This code produces an error

d = Skellam(500, 600); pdf(d, 10)

ERROR: AmosException with id 2: overflow. Stacktrace: [1] _besseli(::Float64, ::Complex{Float64}, ::Int32) at /Users/kehlert/.julia/packages/SpecialFunctions/fvheQ/src/bessel.jl:205 [2] besseli(::Float64, ::Complex{Float64}) at /Users/kehlert/.julia/packages/SpecialFunctions/fvheQ/src/bessel.jl:310 [3] besseli at /Users/kehlert/.julia/packages/SpecialFunctions/fvheQ/src/bessel.jl:396 [inlined] [4] logpdf(::Skellam{Float64}, ::Int64) at /Users/kehlert/.julia/packages/Distributions/WHjOk/src/univariate/discrete/skellam.jl:68 [5] pdf(::Skellam{Float64}, ::Int64) at /Users/kehlert/.julia/packages/Distributions/WHjOk/src/univariate/discrete/skellam.jl:71 [6] top-level scope at none:0

I think what's going on is that the Bessel function is evaluated first, and it gives enormous values. The besselix function in SpecialFunctions.jl is more stable and almost does what we need for the Skellam distribution. In fact, besselix(k, 2*lambda) gives the correct probability for a Skellam(lambda, lambda) distribution, and it works for large values of lambda. But it doesn't work for when the parameters are unequal. The code would need to be modified.

I am very new to Julia, so I don't have a simple fix ready to go, but I thought I'd bring up the issue.

matbesancon commented 5 years ago

@Sumegh-git if you feel like tackling this one too

matbesancon commented 5 years ago

this feels like a SpecialFunctions problem? cc @simonbyrne @Sumegh-git

simonbyrne commented 5 years ago

It sounds like it should be possible to do it here using the SpecialFunctions.besselix functions?

matbesancon commented 5 years ago

The bessel functions from SF is what seems to be used here

kehlert commented 5 years ago

I took another look at the problem and I have a solution (or at least a starting point for one). Here is some code that uses the Distribution package and compares it to the answer I get using the besselix function.

using Distributions
using SpecialFunctions

k = 10
mu1 = 10
mu2 = 20

d = Skellam(mu1, mu2)
dist_ans = pdf(d, k)

working_ans = besselix(k, 2*sqrt(mu1*mu2)) * exp(-(sqrt(mu2) - sqrt(mu1))^2 + k/2 * log(mu1/mu2))

println("dist_ans: $dist_ans")
println("working_ans ans: $working_ans")

I'm more probabilist than numerical analyst, but the working_ans code I provided is definitely more stable and seems reasonable to me. I tried extremely large values of mu1 and mu2 (values for which the Distribution package has an error) and the above code for working_ans was fine. I think all of the multiplications, subtractions, etc. should have operands of around the same order of magnitude. I believe the issue with the distribution package is that it was computing a huge value for the bessel function, and then multiplying it by a tiny value to get the probability back into the range of [0,1]. In contrast, the above modification does not do that. Things are roughly the same order.

Here are values for which there is an error:

k = -100
mu1 = 500
mu2 = 600

Note that -100 is actually the expected value of the random variable, and has a fairly high probability of occurring (around 0.012), so this isn't an unreasonable set of values.

Another thought: the working_ans code could definitely be improved. We could still be taking the exponential of a really large (positive or negative) number, which isn't ideal, but it's better than the current state of things.