JuliaStats / Statistics.jl

The Statistics stdlib that ships with Julia.
https://juliastats.org/Statistics.jl/dev/

`mean` incorrectly computes means of ranges #120

Open yurivish opened 2 years ago

yurivish commented 2 years ago

For example, it incorrectly computes the mean of the single-element range containing the number 123 as -5 if the element type is Int8:

julia> using Statistics

julia> mean(Int8(123):Int8(123))
-5.0

As another example, the mean of the range 126:127 is computed as -1.5 rather than the true mean, which is 126.5:

julia> mean(Int8(126):Int8(127))
-1.5

Because median delegates to mean, the median is also wrong:

julia> median(Int8(123):Int8(123))
-5.0

This is due to a “performance-optimized” mean implementation:

https://github.com/JuliaStats/Statistics.jl/blob/0588f2cf9e43f9f72af5802feaf0af4b652c3257/src/Statistics.jl#L185-L188

The code mishandles integer overflow, affecting all standard integer types (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, and UInt64):

julia> mean(typemax(Int):typemax(Int))
-1.0

julia> mean(UInt8(255):UInt8(255))
127.0
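
All of the integer results above are what you would get from averaging the two endpoints in the range's element type, where the sum wraps around before the division. The sketch below reproduces them; `naive_range_mean` is a hypothetical name, and the formula is an assumption inferred from the outputs rather than a copy of the linked lines.

```julia
# Hypothetical stand-in for the optimized method (assumed shape:
# average the first and last element of the range).
naive_range_mean(r) = (first(r) + last(r)) / 2

# Integer addition in Julia wraps silently within the element type:
Int8(123) + Int8(123)                      # -10, because 246 does not fit in Int8

# The wrapped sum is then halved, giving the wrong means reported above.
naive_range_mean(Int8(123):Int8(123))      # -5.0
naive_range_mean(Int8(126):Int8(127))      # -1.5
naive_range_mean(typemax(Int):typemax(Int))  # -1.0
naive_range_mean(UInt8(255):UInt8(255))    # 127.0
```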

Since it also mishandles floating-point overflow, this affects all standard float types (Float16, Float32, and Float64):

julia> mean(Float16(12345):Float16(54321))
Inf16
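
One possible way to avoid both kinds of overflow is to halve each endpoint before adding, so the intermediate value never exceeds the larger endpoint in magnitude. This is only a sketch: `midpoint_mean` is a hypothetical name, it assumes a nonempty range, and it is not necessarily the approach taken by the pull requests mentioned in this thread.

```julia
# Hypothetical overflow-avoiding variant: divide first, then add.
# `/` on integers returns a floating-point value, so no integer sum is formed.
# Assumes `r` is nonempty.
midpoint_mean(r) = first(r) / 2 + last(r) / 2

midpoint_mean(Int8(123):Int8(123))        # 123.0
midpoint_mean(Int8(126):Int8(127))        # 126.5
midpoint_mean(UInt8(255):UInt8(255))      # 255.0

# Float16 endpoints no longer overflow to Inf16, since each half is
# well below floatmax(Float16).
isfinite(midpoint_mean(Float16(12345):Float16(54321)))  # true
```

For floating-point ranges the trade-off is that halving a subnormal endpoint can lose its last bit, so a real fix might pick between the two formulas depending on magnitude.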

The mean is computed incorrectly for 25% of all signed integer ranges and 50% of all unsigned integer ranges.
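
These fractions can be checked by brute force for the 8-bit types. The sketch below counts every nonempty unit range `a:b` whose endpoint sum wraps around. `overflow_fraction` is a hypothetical helper, and restricting the count to unit ranges and to endpoint-sum overflow is an assumption about how the percentages were obtained.

```julia
# Fraction of nonempty unit ranges a:b over type T for which a + b wraps.
function overflow_fraction(::Type{T}) where {T<:Integer}
    total = 0
    wrapped = 0
    for a in typemin(T):typemax(T), b in a:typemax(T)
        total += 1
        # Compare the sum computed in T with the exact sum computed in Int.
        wrapped += (Int(a + b) != Int(a) + Int(b))
    end
    return wrapped / total
end

overflow_fraction(Int8)    # 8256 / 32896, about 0.251
overflow_fraction(UInt8)   # 16384 / 32896, about 0.498
```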

jishnub commented 2 years ago

This should be handled by https://github.com/JuliaStats/Statistics.jl/pull/115 I think

mbauman commented 1 week ago

I think all these cases were fixed by #150.