JuliaStats / StatsFuns.jl

Mathematical functions related to statistics.

Ensure that `binomlogpdf` returns non-positive values, t distributions with infinite parameter are supported, and add integration tests #126

Closed · devmotion closed this 3 years ago

devmotion commented 3 years ago

It seems that https://github.com/JuliaStats/StatsFuns.jl/pull/125 broke some tests in Distributions (e.g. https://github.com/JuliaStats/Distributions.jl/pull/1387/checks?check_run_id=3732360108) since binomlogpdf(5, 0.0, 0) and binomlogpdf(5, 1.0, 5) return values that are slightly above zero (around 4e-16). The PR sets an upper bound for these values and adds integration tests with Distributions.jl to ensure that we avoid such breakages in the future.
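The capping idea behind the PR can be sketched as follows (Python for illustration; the helper name `cap_logpdf` is hypothetical and not part of StatsFuns):

```python
import math

def cap_logpdf(logp):
    """Clamp a computed log-probability into its valid range.

    A log-pdf of a discrete distribution must satisfy logp <= 0;
    tiny positive overshoots from floating-point rounding
    (e.g. the ~4e-16 seen above) are capped at exactly 0.
    """
    return min(0.0, logp)
```

Negative values pass through unchanged, so the cap only affects the rounding overshoots.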

Edit: Additionally, the PR fixes the (log)pdf of a t distribution with infinite parameter which Rmath handles and Distributions tests for.
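The t-distribution fix amounts to handling infinite degrees of freedom explicitly, where the density degenerates to the standard normal. A minimal Python sketch of that idea (the function name is hypothetical; this is not the StatsFuns implementation):

```python
import math

def tdist_logpdf(nu, x):
    # A t distribution with infinite degrees of freedom is the
    # standard normal, so branch on nu == inf instead of evaluating
    # the general formula (which would otherwise produce NaN).
    if math.isinf(nu):
        return -0.5 * (x * x + math.log(2 * math.pi))
    # General case: log of Gamma((nu+1)/2) / (Gamma(nu/2) sqrt(nu*pi))
    # times (1 + x^2/nu)^(-(nu+1)/2).
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi)
            - (nu + 1) / 2 * math.log1p(x * x / nu))
```

For nu = 1 this reduces to the Cauchy log-density, which is a convenient sanity check.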

codecov-commenter commented 3 years ago

Codecov Report

Merging #126 (7f08d4f) into master (e3fe89d) will increase coverage by 0.14%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
+ Coverage   37.41%   37.55%   +0.14%     
==========================================
  Files          12       12              
  Lines         417      418       +1     
==========================================
+ Hits          156      157       +1     
  Misses        261      261              
Impacted Files        Coverage Δ
src/distrs/binom.jl   100.00% <100.00%> (ø)
src/distrs/tdist.jl   100.00% <100.00%> (ø)



andreasnoack commented 3 years ago

I'm not sure why those examples would give nonzero values. Aren't the two examples equivalent to the following, which both give exactly zero?

julia> Distributions.betalogpdf(1, 6, 0.0) - log(6)
0.0

julia> Distributions.betalogpdf(6, 1, 1.0) - log(6)
0.0

devmotion commented 3 years ago

This is what one gets with StatsFuns 0.9.10, in which betalogpdf uses the Rmath implementation. With the Julia implementation in StatsFuns 0.9.11 one gets the ≈4e-16 deviation I mentioned above:

julia> Distributions.betalogpdf(1, 6, 0.0) - log(6)
4.440892098500626e-16

julia> Distributions.betalogpdf(6, 1, 1.0) - log(6)
4.440892098500626e-16

Apparently the Rmath implementation of the density of the Beta distribution handles the cases x = 0 and x = 1 explicitly and hence returns exactly log(6) for the logpdf: https://github.com/JuliaStats/Rmath-julia/blob/5c5dfd6baca358103fbb47cc03dc0ecee04fb1ff/src/dbeta.c#L71 and https://github.com/JuliaStats/Rmath-julia/blob/5c5dfd6baca358103fbb47cc03dc0ecee04fb1ff/src/dbeta.c#L76

The new Julia implementation, by contrast, returns -SpecialFunctions.logbeta(1, 6) and -SpecialFunctions.logbeta(6, 1) (the other terms are exactly zero), which is slightly different from log(6):

julia> -SpecialFunctions.logbeta(1, 6)
1.7917594692280554

julia> -SpecialFunctions.logbeta(6, 1)
1.7917594692280554

julia> log(6)
1.791759469228055

I guess the more general (and better?) fix would be to handle this special case (one argument equal to 1) in the implementation of SpecialFunctions.logbeta.
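That special-casing could look roughly like this (a Python sketch using math.lgamma; the actual proposal is against logabsbeta in SpecialFunctions.jl):

```python
import math

def logbeta(a, b):
    # Sketch of the proposed special case: B(1, b) = 1/b and
    # B(a, 1) = 1/a exactly, so return -log of the other argument
    # directly instead of going through three lgamma evaluations,
    # which introduces the 1-ulp discrepancy discussed above.
    if a == 1.0:
        return -math.log(b)
    if b == 1.0:
        return -math.log(a)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
```

With this branch, logbeta(1.0, 6.0) is -log(6) bit-for-bit, which is what the Rmath comparison above relies on.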

devmotion commented 3 years ago

I opened a PR with the more general fix of logabsbeta (and beta): https://github.com/JuliaMath/SpecialFunctions.jl/pull/349

andreasnoack commented 3 years ago

I've been back and forth on this tonight. I'm not sure it's appropriate to add the extra branches in logabsbeta since it doesn't really improve the precision for that function in isolation. Hence, I think the better approach might be the fix in this PR, which ensures that probabilities are at most one.

Any idea why the PoissonBinomial tests are failing in Distributions? The tolerance there does look a bit tight.

devmotion commented 3 years ago

I'm not sure it's appropriate to add the extra branches in logabsbeta since it doesn't really improve the precision for that function in isolation.

To me it seems it does improve the precision. Currently, we have e.g.

julia> beta(1.0, 200.0)
0.005000000000000002

julia> logabsbeta(1.0, 4.0)
(-1.3862943611198908, 1)

julia> -log(4.0)
-1.3862943611198906

whereas with the PR to SpecialFunctions one gets

julia> beta(1.0, 200.0)
0.005

julia> logabsbeta(1.0, 4.0)
(-1.3862943611198906, 1)

julia> -log(4.0)
-1.3862943611198906

However, I think the min(0, ...) fix in this PR is useful in addition, since it guarantees that the resulting (log) probabilities stay in the correct domain for other values as well, even if other floating-point issues occur.

devmotion commented 3 years ago

Any idea why the PoissonBinomial tests are failing in Distributions? The tolerance there does look a bit tight.

I think it is just caused by slightly different values from the Julia implementation of binomlogpdf and/or floating point errors. The tests would pass with atol=1e-14, so I think we should just adjust the tolerances in Distributions.

devmotion commented 3 years ago

This PR to Distributions fixes the remaining test errors: https://github.com/JuliaStats/Distributions.jl/pull/1398

andreasnoack commented 3 years ago

To me it seems it does improve the precision.

My point is that it's just one ulp, i.e., less than the general precision of the function, and therefore not worth the branches. Those branches happen to give exact zeros in a downstream function, but I don't think that is really the right evaluation metric for these kinds of functions. Here, the simpler approach is to just cap the log-probabilities. Alternatively, we could add branches in the binomial pdf for the degenerate cases.
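The alternative mentioned here, branching on the degenerate success probabilities, could be sketched as follows (Python for illustration; not the StatsFuns code):

```python
import math

def binom_logpdf(n, p, k):
    # Branch on the degenerate cases so the endpoints are exact:
    # with p = 0 all mass sits at k = 0, with p = 1 at k = n.
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    # General case: log of C(n, k) * p^k * (1-p)^(n-k).
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))
```

This makes binom_logpdf(5, 0.0, 0) and binom_logpdf(5, 1.0, 5) exactly zero without touching logabsbeta.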

devmotion commented 3 years ago

An additional advantage is that it is much faster for this special case but does not impact performance significantly in the other cases. With SpecialFunctions#master I get

julia> using SpecialFunctions, BenchmarkTools

julia> @btime beta(1.0, 200.0);
  47.907 ns (0 allocations: 0 bytes)

julia> @btime logabsbeta(1.0, 4.0);
  60.584 ns (0 allocations: 0 bytes)

julia> @btime beta(3.4, 200.0);
  66.080 ns (0 allocations: 0 bytes)

julia> @btime logabsbeta(3.4, 4.0);
  78.036 ns (0 allocations: 0 bytes)

whereas with the PR I get

julia> using SpecialFunctions, BenchmarkTools

julia> @btime beta(1.0, 200.0); # seems it is compiled away completely...
  0.019 ns (0 allocations: 0 bytes)

julia> @btime logabsbeta(1.0, 4.0);
  1.451 ns (0 allocations: 0 bytes)

julia> @btime beta(3.4, 200.0);
  67.098 ns (0 allocations: 0 bytes)

julia> @btime logabsbeta(3.4, 4.0);
  79.259 ns (0 allocations: 0 bytes)

The timings are consistent with:

julia> @btime inv(200.0);
  0.018 ns (0 allocations: 0 bytes)

julia> @btime -log(4.0);
  1.450 ns (0 allocations: 0 bytes)

We also handle special cases in other functions, e.g., https://github.com/JuliaMath/SpecialFunctions.jl/blob/0c4181254cf664923c1504de599c5d1f2c243831/src/gamma.jl#L234-L235, so it does not seem unusual to me to exploit this mathematical property and handle this case separately.