(Sidebar: The PDF function will actually accept non-integer values, returning -Inf if they're not within one eps of an integer, or the value at the corresponding integer if they are.)
As part of an analysis at the day job, I found a pathological case where we currently get an InexactError:
julia> using BenchmarkTools
julia> using Distributions
# values from a fitted model
julia> y, μ, wt, ϕ = 0.6376811594202898, 0.8492925285671102, 69.0, NaN # phi isn't used here
(0.6376811594202898, 0.8492925285671102, 69.0, NaN)
julia> # definition from GLM.jl
loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y*wt))
loglik_obs (generic function with 1 method)
julia> loglik_obs(Binomial(), y, μ, wt, ϕ) # error
ERROR: InexactError: Int64(43.99999999999999)
Stacktrace:
[1] Int64
@ ./float.jl:788 [inlined]
[2] loglik_obs(#unused#::Binomial{Float64}, y::Float64, μ::Float64, wt::Float64, ϕ::Float64)
@ Main ./REPL[76]:2
[3] top-level scope
@ REPL[77]:1
If we look at the numbers creating the problem, we have
julia> y * wt
43.99999999999999
julia> nextfloat(y * wt)
44.0
julia> 44 / wt == y
true
julia> 44 / y == wt
true
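To make the one-ulp nature of the miss explicit (this check is mine, not part of the original session):

```julia
y, wt = 0.6376811594202898, 69.0

# the product is exactly one ulp below 44.0 ...
@assert y * wt == prevfloat(44.0)

# ... so the gap is exactly eps at that magnitude, which is what
# motivates using eps(x) as the tolerance in the fix
@assert 44.0 - y * wt == eps(y * wt)
```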
So this really is just a floating-point issue. We can fix it by defining a safer_int that accepts "close enough" values:
function safer_int(x::T) where {T<:Base.IEEEFloat}
    r = round(Int, x)
    abs(x - r) <= eps(x) && return r
    throw(InexactError(:safer_int, T, x))
end
loglik_obs_safer(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), safer_int(y*wt))
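As a quick sanity check (my own, not from the original session; safer_int is repeated so the snippet runs standalone): values within one ulp of an integer round, anything farther away still throws.

```julia
function safer_int(x::T) where {T<:Base.IEEEFloat}
    r = round(Int, x)
    abs(x - r) <= eps(x) && return r
    throw(InexactError(:safer_int, T, x))
end

@assert safer_int(prevfloat(44.0)) == 44  # one ulp under 44 is accepted
@assert safer_int(44.0) == 44             # exact integers pass through
@assert try safer_int(43.9); false catch e; e isa InexactError end  # genuinely non-integer values still throw
```

Note that eps(x) scales with x, so the tolerance is always about one ulp regardless of magnitude.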
and then everything works:
julia> loglik_obs_safer(Binomial(), y, μ, wt, ϕ) # works
-11.628163156011077
What happens to performance, though, for a non-pathological case? Let's take a look.
Currently, loglik_obs(::Binomial) forces the second argument to logpdf(::Binomial, x) to be an integer: https://github.com/JuliaStats/GLM.jl/blob/14597370eac4da245216455a91f5c0c7ccfae4a2/src/glmtools.jl#L515
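The benchmark code isn't reproduced here, but a minimal sketch of how the comparison might be run with BenchmarkTools (the inputs are my own non-pathological choice, where y * wt is exactly an integer):

```julia
using BenchmarkTools, Distributions

# same definitions as above
loglik_obs(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), Int(y * wt))

function safer_int(x::T) where {T<:Base.IEEEFloat}
    r = round(Int, x)
    abs(x - r) <= eps(x) && return r
    throw(InexactError(:safer_int, T, x))
end

loglik_obs_safer(::Binomial, y, μ, wt, ϕ) = logpdf(Binomial(Int(wt), μ), safer_int(y * wt))

# a non-pathological case: y * wt == 5.0 exactly
y, μ, wt, ϕ = 0.5, 0.5, 10.0, NaN

@btime loglik_obs(Binomial(), $y, $μ, $wt, $ϕ)
@btime loglik_obs_safer(Binomial(), $y, $μ, $wt, $ϕ)
```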
So only about 2-3 ns worse (tested on Apple silicon).
@nalimilan if you have no strong objections, I would just go ahead and open a PR to add this.
Hat-tip @ararslan who rubber-ducked this with me and @haberdashPI who found the pathological case.