JuliaLang / julia

Large regression with StaticArrays in v1.11 #56296

Open kaipartmann opened 2 hours ago

kaipartmann commented 2 hours ago

As also described in https://github.com/JuliaArrays/StaticArrays.jl/issues/1282, there is a large performance regression with v1.11 when using StaticArrays:

using LinearAlgebra, StaticArrays, BenchmarkTools

function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * X * X'
    end
    return K
end

@btime mwe1(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);

❯ julia +1.10.5 --project -t 6 mwe.jl
  1.070 ms (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl 
  129.890 ms (7000000 allocations: 289.92 MiB)

Interestingly, the problem can be solved by changing k * X * X' to k * (X * X'):

function mwe3(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K += k * (X * X')
    end
    return K
end

@btime mwe3(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000);

❯ julia +1.10.5 --project -t 6 mwe.jl
  805.458 μs (0 allocations: 0 bytes)

❯ julia +1.11.1 --project -t 6 mwe.jl
  708.208 μs (0 allocations: 0 bytes)
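
One way to see why the parentheses matter is to look at how the two expressions parse. The sketch below uses only `Meta.parse` from Base, so it says nothing about which StaticArrays or LinearAlgebra method is selected afterwards; it only shows that the chained form is a single three-operand call to `*`, while the parenthesized form is two nested two-operand calls.

```julia
# `k * X * X'` parses as one n-ary call *(k, X, X'),
# while `k * (X * X')` parses as *(k, *(X, X')).
chained = Meta.parse("k * X * X'")
grouped = Meta.parse("k * (X * X')")

# args[1] is the operator; the remaining entries are the operands.
println(chained.args)
println(grouped.args)

println(length(chained.args) - 1)  # 3 operands
println(length(grouped.args) - 1)  # 2 operands
```

The two forms therefore dispatch to different methods of `*`, and the compiler can treat them differently even though they compute the same value.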

ronisbr commented 2 hours ago

I can reproduce those results on macOS. I found some interesting scenarios based on the proposed MWE:

If I use this version, I see all the allocations:

function mwe2(a, X, n)
    local K
    for i in 1:n
        k = a * i
        K = k * X * X'
    end
    return K
end

julia> @btime mwe2(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000)
  163.540 ms (7000000 allocations: 289.92 MiB)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 10.0  10.0  10.0
 10.0  10.0  10.0
 10.0  10.0  10.0

However, if I eliminate the local variable k, everything works:

function mwe3(a, X, n)
    local K
    for i in 1:n
        K = a * i * X * X'
    end
    return K
end

julia> @btime mwe3(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000)
  1.916 ns (0 allocations: 0 bytes)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
 10.0  10.0  10.0
 10.0  10.0  10.0
 10.0  10.0  10.0

gbaraldi commented 2 hours ago

This is an inlining change. The call

│ %31 = invoke LinearAlgebra.broadcast(LinearAlgebra.:*::typeof(*), %29::Float64, X::SVector{3, Float64}, %30::Vararg{Any})::SMatrix{3, 3, Float64, 9}

no longer gets inlined, and we allocate because of it. Changing the code to this

function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K1 = k * X
        K += @inline K1 * X'
    end
    return K
end

fixes it, and it's actually better.
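
A minimal sketch of how this could be checked locally, assuming StaticArrays is installed and Julia 1.8 or newer (needed for the call-site `@inline`). The names `mwe_inline` and `allocated_bytes` are made up for this illustration. The result depends on the Julia version, so no expected output is given.

```julia
using StaticArrays, InteractiveUtils

# Same loop as above, with the call-site @inline on the second product.
function mwe_inline(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K1 = k * X
        K += @inline K1 * X'
    end
    return K
end

# Measure inside a function so non-constant globals do not add allocations.
function allocated_bytes(a, X, n)
    mwe_inline(a, X, n)                  # warm up: compile before measuring
    return @allocated mwe_inline(a, X, n)
end

X = SVector{3}(1.0, 1.0, 1.0)
println(allocated_bytes(1e-5, X, 1_000_000))

# Print the optimized typed IR; a remaining `invoke` inside the loop body
# would indicate a call that was not inlined.
display(@code_typed mwe_inline(1e-5, X, 10))
```

Running the same script under both `julia +1.10.5` and `julia +1.11.1` makes the difference in allocations and in the typed IR directly comparable.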