JuliaStats / Statistics.jl

The Statistics stdlib that ships with Julia.
https://juliastats.org/Statistics.jl/dev/
Other
71 stars 40 forks source link

Fix overflows in `quantile` #145

Closed nalimilan closed 1 year ago

nalimilan commented 1 year ago

The a + γ*(b-a) introduced by https://github.com/JuliaLang/julia/pull/16572 has the advantage that it increases with γ even when a and b are very close, but it has the drawback that it is not robust to overflow. This is likely to happen in practice with small integer and floating point types.

Conversely, the (1-γ)*a + γ*b which is currently used only for non-finite quantities is robust to overflow but may not always increase with γ as when a and b are very close or (more frequently) equal since precision loss can give a slightly smaller value for a larger γ. This can be problematic as it breaks an expected invariant.

So keep using the a + γ*(b-a) formula when a ≈ b, in which case it's almost like returning either a or b but less arbitrary.

Fixes https://github.com/JuliaStats/Statistics.jl/issues/144.

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01% :tada:

Comparison is base (bb7063d) 96.98% compared to head (000d4c1) 96.99%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #145 +/- ## ========================================== + Coverage 96.98% 96.99% +0.01% ========================================== Files 1 1 Lines 431 433 +2 ========================================== + Hits 418 420 +2 Misses 13 13 ``` | [Files Changed](https://app.codecov.io/gh/JuliaStats/Statistics.jl/pull/145?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats) | Coverage Δ | | |---|---|---| | [src/Statistics.jl](https://app.codecov.io/gh/JuliaStats/Statistics.jl/pull/145?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaStats#diff-c3JjL1N0YXRpc3RpY3Muamw=) | `96.99% <100.00%> (+0.01%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

nalimilan commented 1 year ago

It required also testing if the function is non-decreasing if we increase b and switch the formula, but I tested it and it holds.

It's already covered by the test added a long time ago by https://github.com/JuliaLang/julia/pull/16572. That's how I realized the problem. ;-)

EDIT: You mean γ, not b?

bkamins commented 1 year ago

In general I mean that it should be monotonic in a, b and γ and I checked all. The tests did not cover all cases fully. The reason is that e.g. you need to test case when float(a) ≈ float(b) but !(float(a) ≈ nextfloat(float(b))) (and similarly !(prevfloat(float(a)) ≈ float(b)) for monotonicity for various γ; I think it is not covered but I checked it).

nalimilan commented 1 year ago

OK. So you mean two tests like this are needed?

    @test issorted(quantile([1.0, 1.0+eps(), 1.0+2eps(), 1.0+3eps()], range(0, 1, length=100)))
    @test issorted(quantile([1.0, 1.0+2eps(), 1.0+4eps(), 1.0+6eps()], range(0, 1, length=100)))
bkamins commented 1 year ago

Yes - something like this (this is not strictly needed 😄, but I run such tests and they were OK).