JuliaStats / Statistics.jl

The Statistics stdlib that ships with Julia.
https://juliastats.org/Statistics.jl/dev/
Other
71 stars 40 forks source link

Inference failure on `mean` with a subarray #160

Open pdeffebach opened 9 months ago

pdeffebach commented 9 months ago

I am working on a PR to Missings.jl where I construct a subarray manually. See the PR here

I've been trying to debug inference issues using the mean function. I thought that my implementation was wrong. But now I see that mean is actually type-unstable in this instance!

Is there anything that I should do to recover type stability? Is that I'm doing kosher at all?

julia> x = [1, 2, missing]
3-element Vector{Union{Missing, Int64}}:
 1
 2
  missing

julia> function nomissing_subarray(a::AbstractVector, nonmissinginds::AbstractVector)
           T = nonmissingtype(eltype(a)) # Element type
           N = 1 # Dimension of view
           P = typeof(a) # Type of parent array
           I = Tuple{typeof(nonmissinginds)} # Type of the non-missing indices
           L = Base.IndexStyle(a) === IndexLinear # If the type supports fast linear indexing
           SubArray{T, N, P, I, L}(a, (nonmissinginds,), 0, 1)
       end;

julia> t = nomissing_subarray(x, [1, 2])
2-element view(::Vector{Union{Missing, Int64}}, [1, 2]) with eltype Int64:
 1
 2

julia> using Statistics

julia> @code_warntype mean(t)
MethodInstance for Statistics.mean(::SubArray{Int64, 1, Vector{Union{Missing, Int64}}, Tuple{Vector{Int64}}, false})
  from mean(A::AbstractArray; dims) @ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:164
Arguments
  #self#::Core.Const(Statistics.mean)
  A::SubArray{Int64, 1, Vector{Union{Missing, Int64}}, Tuple{Vector{Int64}}, false}
Body::Any
1 ─ %1 = Statistics.:(var"#mean#2")(Statistics.:(:), #self#, A)::Any
└──      return %1

julia> @which mean(t)
mean(A::AbstractArray; dims)
     @ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:164
pdeffebach commented 9 months ago

Update, the inference failure happens because x is a vector which may contain missing. A simpler MWE:

julia> x = Union{Int, Missing}[1, 2, 3]
3-element Vector{Union{Missing, Int64}}:
 1
 2
 3

julia> @code_warntype mean(x)
MethodInstance for Statistics.mean(::Vector{Union{Missing, Int64}})
  from mean(A::AbstractArray; dims) @ Statistics ~/.julia/juliaup/julia-1.9.0+0.x64.linux.gnu/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:164
Arguments
  #self#::Core.Const(Statistics.mean)
  A::Vector{Union{Missing, Int64}}
Body::Any
1 ─ %1 = Statistics.:(var"#mean#2")(Statistics.:(:), #self#, A)::Any
└──      return %1

I'm bummed

  1. that the example above doesn't give Union{Float64, Missing}. That would help inference a lot
  2. That my SubArray trick doesn't work. I thought telling the compiler the eltype of the SubArray was Int would be enough for it realize there is no type instability. But that is not the case.
nalimilan commented 8 months ago

The SubArray issue seems to be due to the fact that nothing ensures that a value of type T is returned: getindex just calls getindex on the parent array, without a type assertion. I wonder whether it would be acceptable to add ::T in Base, but I'm afraid it would have a performance impact in the most common case where the eltype is the same as the parent.

mean being inferred as Any more generally is annoying and probably related to the promotion mechanism that we now use. This needs investigation to see whether we can find a good solution.

Alexander-Barth commented 8 months ago

This also affects issue also affects the function var and std.

julia> @code_warntype std([1.,2.,missing])
MethodInstance for Statistics.std(::Vector{Union{Missing, Float64}})
  from std(A::AbstractArray; corrected, mean, dims) @ Statistics /mnt/data1/abarth/opt/julia-1.9.0/share/julia/stdlib/v1.9/Statistics/src/Statistics.jl:449
Arguments
  #self#::Core.Const(Statistics.std)
  A::Vector{Union{Missing, Float64}}
Body::Any
1 ─ %1 = Statistics.:(var"#std#17")(true, Statistics.nothing, Statistics.:(:), #self#, A)::Any
└──      return %1