Closed: devmotion closed this pull request 3 years ago
Merging #126 (7f08d4f) into master (e3fe89d) will increase coverage by 0.14%. The diff coverage is 100.00%.
@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage   37.41%   37.55%   +0.14%
==========================================
  Files          12       12
  Lines         417      418       +1
==========================================
+ Hits          156      157       +1
  Misses        261      261
Impacted Files | Coverage Δ |
---|---|
src/distrs/binom.jl | 100.00% <100.00%> (ø) |
src/distrs/tdist.jl | 100.00% <100.00%> (ø) |
I'm not sure why those examples would give nonzero values. Aren't the two examples both zero?
julia> Distributions.betalogpdf(1, 6, 0.0) - log(6)
0.0
julia> Distributions.betalogpdf(6, 1, 1.0) - log(6)
0.0
This is what one gets with StatsFuns 0.9.10, in which betalogpdf uses the Rmath implementation. With the Julia implementation in StatsFuns 0.9.11, one gets the approx. 4e-16 I mentioned above:
julia> Distributions.betalogpdf(1, 6, 0.0) - log(6)
4.440892098500626e-16
julia> Distributions.betalogpdf(6, 1, 1.0) - log(6)
4.440892098500626e-16
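Incidentally, the offset of 4.440892098500626e-16 is exactly twice the Float64 machine epsilon, i.e. two units in the last place for values in [1, 2):

```julia
# The offset above equals 2 * eps(Float64) = 2^-51,
# two ulps at the magnitude of log(6) ≈ 1.79:
2 * eps(Float64) == 4.440892098500626e-16  # true
```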
Apparently the Rmath implementation of the density of the Beta distribution handles the cases x = 0 and x = 1 explicitly and hence returns exactly log(6) for the logpdf: https://github.com/JuliaStats/Rmath-julia/blob/5c5dfd6baca358103fbb47cc03dc0ecee04fb1ff/src/dbeta.c#L71 and https://github.com/JuliaStats/Rmath-julia/blob/5c5dfd6baca358103fbb47cc03dc0ecee04fb1ff/src/dbeta.c#L76
Whereas the new Julia implementation returns -SpecialFunctions.logbeta(1, 6) and -SpecialFunctions.logbeta(6, 1) (the other terms are exactly zero), which is slightly different from log(6):
julia> -SpecialFunctions.logbeta(1, 6)
1.7917594692280554
julia> -SpecialFunctions.logbeta(6, 1)
1.7917594692280554
julia> log(6)
1.791759469228055
I guess the more general (and better?) fix would be to handle this special case (one argument equal to 1) in the implementation of SpecialFunctions.logbeta.
I opened a PR with the more general fix of logabsbeta (and beta): https://github.com/JuliaMath/SpecialFunctions.jl/pull/349
I've been back and forth on this tonight. I'm not sure it's appropriate to add the extra branches in logabsbeta since it doesn't really improve the precision for that function in isolation. Hence, I think the better solution might be the fix in this PR, which ensures that probabilities are at most one.
Any idea why the PoissonBinomial tests are failing in Distributions? The tolerance there does look a bit tight.
I'm not sure it's appropriate to add the extra branches in logabsbeta since it doesn't really improve the precision for that function in isolation.
To me it seems it does improve the precision. Currently, we have e.g.
julia> beta(1.0, 200.0)
0.005000000000000002
julia> logabsbeta(1.0, 4.0)
(-1.3862943611198908, 1)
julia> -log(4.0)
-1.3862943611198906
whereas with the PR to SpecialFunctions one gets
julia> beta(1.0, 200.0)
0.005
julia> logabsbeta(1.0, 4.0)
(-1.3862943611198906, 1)
julia> -log(4.0)
-1.3862943611198906
However, I think the min(0, ...) fix in this PR is useful in addition, since it guarantees that the resulting (log) probabilities are in the correct domain for other values as well, even if any other floating point issues occur.
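The capping itself is a one-liner (hypothetical helper name, for illustration only):

```julia
# Cap a computed log-probability at zero so that exp(logp) can never exceed 1,
# regardless of round-off in upstream special functions (hypothetical helper):
cap_logprob(logp::Real) = min(zero(logp), logp)

cap_logprob(4.440892098500626e-16)  # 0.0: the spurious positive value is clamped
cap_logprob(-1.5)                   # -1.5: genuine log-probabilities pass through
```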
Any idea why the PoissonBinomial tests are failing in Distributions? The tolerance there does look a bit tight.
I think it is just caused by slightly different values from the Julia implementation of binomlogpdf and/or floating point errors. The tests would pass with atol=1e-14, so I think we should just adjust the tolerances in Distributions.
This PR to Distributions fixes the remaining test errors: https://github.com/JuliaStats/Distributions.jl/pull/1398
To me it seems it does improve the precision.
My point is that it's just one ULP, so less than the general precision of the function, and therefore not worth the branches. Those branches then happen to give exact zeros in a downstream function, but I don't think that is really the right evaluation metric for these kinds of functions. Here, the simpler approach is to just cap the log-probabilities. Alternatively, we could add branches in the binomial pdf for the degenerate cases.
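The degenerate-case branches mentioned as an alternative could look roughly like this (hypothetical sketch, not the StatsFuns implementation): Binomial(n, 0) is a point mass at k = 0 and Binomial(n, 1) a point mass at k = n, so the logpdf is exactly 0 at the mass point and -Inf elsewhere.

```julia
# Hypothetical sketch of degenerate-case branches in the binomial logpdf:
# for p == 0 all mass sits at k = 0, for p == 1 all mass sits at k = n.
function binomlogpdf_degenerate(n::Integer, p::Real, k::Integer)
    p == 0 && return k == 0 ? 0.0 : -Inf
    p == 1 && return k == n ? 0.0 : -Inf
    error("non-degenerate case delegated to the existing implementation")
end

binomlogpdf_degenerate(5, 0.0, 0)  # exactly 0.0
binomlogpdf_degenerate(5, 1.0, 5)  # exactly 0.0
```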
An additional advantage is that it is much faster for this special case but does not impact performance significantly in the other cases. With SpecialFunctions#master I get
julia> using SpecialFunctions, BenchmarkTools
julia> @btime beta(1.0, 200.0);
47.907 ns (0 allocations: 0 bytes)
julia> @btime logabsbeta(1.0, 4.0);
60.584 ns (0 allocations: 0 bytes)
julia> @btime beta(3.4, 200.0);
66.080 ns (0 allocations: 0 bytes)
julia> @btime logabsbeta(3.4, 4.0);
78.036 ns (0 allocations: 0 bytes)
whereas with the PR I get
julia> using SpecialFunctions, BenchmarkTools
julia> @btime beta(1.0, 200.0); # seems it is compiled away completely...
0.019 ns (0 allocations: 0 bytes)
julia> @btime logabsbeta(1.0, 4.0);
1.451 ns (0 allocations: 0 bytes)
julia> @btime beta(3.4, 200.0);
67.098 ns (0 allocations: 0 bytes)
julia> @btime logabsbeta(3.4, 4.0);
79.259 ns (0 allocations: 0 bytes)
The timings are consistent with
julia> @btime inv(200.0);
0.018 ns (0 allocations: 0 bytes)
julia> @btime -log(4.0);
1.450 ns (0 allocations: 0 bytes)
We also handle special cases in other functions, e.g., https://github.com/JuliaMath/SpecialFunctions.jl/blob/0c4181254cf664923c1504de599c5d1f2c243831/src/gamma.jl#L234-L235, hence to me it does not seem completely unusual to exploit this mathematical property and handle this case separately.
It seems that https://github.com/JuliaStats/StatsFuns.jl/pull/125 broke some tests in Distributions (e.g. https://github.com/JuliaStats/Distributions.jl/pull/1387/checks?check_run_id=3732360108) since binomlogpdf(5, 0.0, 0) and binomlogpdf(5, 1.0, 5) return values that are slightly above zero (around 4e-16). The PR sets an upper bound for these values and adds integration tests with Distributions.jl to ensure that we avoid such breakages in the future.
Edit: Additionally, the PR fixes the (log)pdf of a t distribution with infinite parameter, which Rmath handles and Distributions tests for.