Open Martin-exp-z2 opened 1 month ago
Thanks for reporting this. The issue is actually in StatsFuns
so I will move the issue.
julia> binominvcdf(5000.0, 0.998575, 0.0005)
5000.0
However, for the quantile function, we actually rely on Rmath so it is a bit surprising and might be an issue with the version of Rmath that we use.
Looked a bit into this and it appears that it is actually a bug in R that was fixed relatively recently, see https://bugs.r-project.org/show_bug.cgi?id=18711. So we should upgrade Rmath-julia
and Rmath_jll
. I'm initiating the process.
Once https://github.com/JuliaPackaging/Yggdrasil/pull/9413 has been merged, this issue should be resolved. It would be good to add a test case, though. @Martin-exp-z2 would you be able to prepare a PR with a test?
I can confirm that this now works
(StatsFuns) pkg> up
Updating registry at `~/.julia/registries/General.toml`
Installed Rmath ───── v0.8.0
Installed Rmath_jll ─ v0.5.1+0
Downloaded artifact: Rmath
Updating `~/.julia/dev/StatsFuns/Project.toml`
[79098fc4] ↑ Rmath v0.7.1 ⇒ v0.8.0
Updating `~/.julia/dev/StatsFuns/Manifest.toml`
[79098fc4] ↑ Rmath v0.7.1 ⇒ v0.8.0
[f50d1b31] ↑ Rmath_jll v0.4.3+0 ⇒ v0.5.1+0
Precompiling project...
3 dependencies successfully precompiled in 4 seconds. 12 already precompiled.
julia> using StatsFuns
julia> binominvcdf(5000.0, 0.998575, 0.0005)
4983.0
@andreasnoack , thank you very much indeed for fixing the issue! I was not aware that the computation of quantiles in Julia actually relies on R routines. In future, it might be a good idea to write all the computation in Julia itself: I presume that this would make it even more efficient. However, I am aware that this requires considerable effort, which might be better spent elsewhere.
I am ready to prepare a program which would test the correctness of the computation of quantiles of instances of the binomial distribution suitably chosen at random (both parameters as well as quantile levels). Would that be OK? However, I am quite busy in general. Should the program not be ready in a few days, please remind me.
In future, it might be a good idea to write all the computation in Julia itself
We are working towards that, see https://github.com/JuliaStats/StatsFuns.jl/issues/37, but indeed it takes time to implement these functions. We try to compose these out if functions from SpecialFunctions.jl
and HypergeometricFunctions.jl
to avoid doing too much of the heavy lifting here. For the binomial distribution, we have everything except for quantile functions (plural because of the complementary and log versions). Whenever, we don't have native implementations, we fall back to using Rmath, see e.g. https://github.com/JuliaStats/StatsFuns.jl/blob/5d0bf33f370ffcab335b46683903acd51ad08135/src/distrs/binom.jl#L4-L14
Regarding the testing then we mostly rely on testing against values from R since R is generally considered battle tested and we test a relatively small number of combinations. See e.g. https://github.com/JuliaStats/StatsFuns.jl/blob/5d0bf33f370ffcab335b46683903acd51ad08135/test/rmath.jl#L219-L231. Hence, my suggestion was actually just to append the values I linked to with the values that you used when reporting the issue. A more elaborate test setup would also be useful but it could easily take some time to complete so I think it is fine with the smaller test for now.
@andreasnoack , could you please help me update to the newest version of Rmath
? I get:
(@v1.10) pkg> st Rmath
Status `~/.julia/environments/v1.10/Project.toml`
⌅ [79098fc4] Rmath v0.7.1
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`
(@v1.10) pkg> status --outdated
Status `~/.julia/environments/v1.10/Project.toml`
⌅ [79098fc4] Rmath v0.7.1 (<v0.8.0): StatsFuns
(@v1.10) pkg> up
Updating registry at `~/.julia/registries/General.toml`
No Changes to `~/.julia/environments/v1.10/Project.toml`
No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Many thanks!
I just realized that a new version of StatsFuns is needed here because of the minor version bump of Rmath. Once https://github.com/JuliaRegistries/General/pull/115053 is available, you should be able to upgrade.
@andreasnoack , great! Now it works on my computer, too! Please find a testing program below (I was unable to attach it as a file, but maybe it is meant to be this way). After testing 2M randomly selected instances of the binomial distribution, no error so far.
#!/usr/bin/env julia
#=
Tests the correctness of quantiles of randomly selected instances of the
binomial distribution.
The number of trials is chosen according to the geometric distribution.
For each selected number of trials, two success probabilities are selected:
one according the uniform distribution and the other one according to a
beta distribution concentrated near a randomly selected endpoint 0 or 1.
Martin Raič (GitHub: Martin-exp-z2), September 2024
=#
using Distributions
using StatsFuns
using Printf
### SETTINGS
# Choice of instances of the binomial family
"2 N instances are being tested."
N = 1000000
"Expected number of trials"
n_e = 5000
"""
Typically, the expected distance of the success probability to the boundary
multiplied by the number of trials.
Refers to the success probability chosen according to the beta distribution.
"""
p_e = 20
"Testing is roughly restricted to levels between tol/2 and 1 - tol/2"
tol = 1e-6
"The function which actually computes quantiles of the binomial distrubutions"
# q_binom(n::Integer, p::Real, l::Real) = quantile(Binomial(n, p), l)
q_binom = binominvcdf
# Display settings
"""
Possible values:
:DETAILED : each instance of the binomial distribution being tested is displayed
:PROGRESS : progress is displayed
"""
output_mode = :PROGRESS
"Determines how often the progress is refreshed."
progress_perc_span = 1
progress_perc_format(p::Real) = @sprintf("%.0f", p)
### MAIN FUNCTIONS
"""
Tests a quantile of Binomial(n, p) at a random level,
so chosen that the lower quantile at this level equals k.
For X ~ Binomial(n, p), there should be
cp0 = P(X < k) and cp1 = P(X <= k).
Thus, the latter two parameters are meant to be pre-computed.
The level l is chosen so that cp0 < l <= cp1.
"""
function q_binom_test(n::Integer, p::Real, k::Integer, cp0::Real, cp1::Real)
l = cp1 - rand()*(cp1 - cp0)
q = q_binom(n, p, l)
if q != k
println("""
The quantile of binomial($n, $p) at level $l does not match:
the returned value is $q, while the correct value is $k.
""")
exit(-1)
end
end
"""
Tests quantiles of Binomial(n, p).
For each value k with P(X <= k), P(X >= k) >= tol/2,
q_binom_test(n, p, k, cp0, cp1) is called with appropriate cp0 and cp1.
"""
function q_binom_test(n::Integer, p::Real)
if output_mode == :DETAILED
print("Testing quantiles of Binomial($n, $p) ... ")
end
kk = round(Int, n*p)
cp = cdf(Binomial(n, p), kk)
k = kk
cp1 = cp
while cp1 >= tol/2
cp0 = cdf(Binomial(n, p), k - 1)
q_binom_test(n, p, k, cp0, cp1)
k -= 1
cp1 = cp0
end
ccp0 = 1 - cp
k = kk + 1
while ccp0 >= tol/2
ccp1 = ccdf(Binomial(n, p), k)
q_binom_test(n, p, k, 1 - ccp0, 1 - ccp1)
k += 1
ccp0 = ccp1
end
if output_mode == :DETAILED
println("OK")
end
end
### PROGRESS DISPLAYING UTILITIES
let
global display_progress
len = 0
last_span = -1
function display_progress(it::Integer)
if output_mode == :PROGRESS
perc = 100*it/N
span = round(Int, perc/progress_perc_span)
if span > last_span
last_span = span
update = true
else
update = (it == N)
end
if update
perc_s = progress_perc_format(perc)
len = max(len, length(perc_s))
print("Progress: $(rpad(perc_s, len)) %\r")
end
end
end
end
### MAIN PART
println("Testing quantiles of the binomial distribution ...")
display_progress(0)
for it in range(1, N)
n = rand(Geometric(1/n_e)) + 1
q_binom_test(n, rand())
b = max(n/p_e - 1, 1) # Typically, n/p_e - 1 .
q_binom_test(n, rand(rand((false, true)) ? Beta(b, 1) : Beta(1, b)))
display_progress(it)
end
print("\nAll tested quantiles match!\n")
Hello everyone,
I have noticed that the call
quantile(Binomial(5000, 0.998575), 0.0005)
returns5000
, which is wrong. Consider:which return correct values (virtually the same as the corresponding calls of
scipy.stats.binom.cdf
in Python). Thus, the above-mentioned call ofquantile()
should return4983
.In addition, in my opinion, it would be more natural and useful if the function
cquantile()
returned the upper quantile. In particular, this would allow the user to get the middle median by calling(quantile(dist, 0.5) + cquantile(dist, 0.5))/2
.I am using Distributions v0.25.111 in Julia v1.10.5.
Thank you very much in advance for considering the issue!