Open kaipartmann opened 2 hours ago
I can reproduce those results on macOS. I found some interesting scenarios based on the proposed MWE:
If I use this version, I see all the allocations:
function mwe2(a, X, n)
    local K
    for i in 1:n
        k = a * i
        K = k * X * X'
    end
    return K
end
julia> @btime mwe2(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000)
163.540 ms (7000000 allocations: 289.92 MiB)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
10.0 10.0 10.0
10.0 10.0 10.0
10.0 10.0 10.0
However, if I eliminate the local variable k, the allocations disappear:
function mwe3(a, X, n)
    local K
    for i in 1:n
        K = a * i * X * X'
    end
    return K
end
julia> @btime mwe3(1e-5, SVector{3}(1.0, 1.0, 1.0), 1_000_000)
1.916 ns (0 allocations: 0 bytes)
3×3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)×SOneTo(3):
10.0 10.0 10.0
10.0 10.0 10.0
10.0 10.0 10.0
This is caused by an inlining change. The call
│ %31 = invoke LinearAlgebra.broadcast(LinearAlgebra.:*::typeof(*), %29::Float64, X::SVector{3, Float64}, %30::Vararg{Any})::SMatrix{3, 3, Float64, 9}
is no longer inlined, and we allocate because of it.
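For anyone who wants to check this locally, here is a minimal sketch of how the typed IR can be inspected. It reuses mwe2 from above and assumes StaticArrays is installed; the exact IR text and statement numbers will likely differ between Julia versions.

```julia
using StaticArrays
using InteractiveUtils  # provides @code_typed outside the REPL

# Same function as mwe2 above, repeated so the snippet is self-contained.
function mwe2(a, X, n)
    local K
    for i in 1:n
        k = a * i
        K = k * X * X'
    end
    return K
end

X = SVector{3}(1.0, 1.0, 1.0)

# Show the optimized, typed IR. Calls that appear as `invoke ...`
# were not inlined by the compiler.
display(@code_typed mwe2(1e-5, X, 10))
```

If the broadcast call still shows up as an invoke in the loop body, that matches the behavior described above.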
Changing the code to this
function mwe1(a, X, n)
    K = zeros(SMatrix{3,3})
    for i in 1:n
        k = a * i
        K1 = k * X
        K += @inline K1 * X'
    end
    return K
end
fixes it, and it's actually better.
As also described in https://github.com/JuliaArrays/StaticArrays.jl/issues/1282, there is a large performance regression with v1.11 when using StaticArrays.
Interestingly, the problem can be solved by changing k * X * X' to k * (X * X').
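As a sketch of that workaround applied to the loop from this thread (the name mwe4 is made up here for illustration, and this has not been benchmarked as part of this write-up):

```julia
using StaticArrays

# Hypothetical variant of the MWE: the outer product is parenthesized so the
# expression is evaluated as two binary products instead of one three-argument `*`.
function mwe4(a, X, n)
    local K
    for i in 1:n
        k = a * i
        K = k * (X * X')
    end
    return K
end

K = mwe4(0.5, SVector{3}(1.0, 1.0, 1.0), 4)
```

Presumably this helps because k * X * X' parses as a single three-argument call to *, which is the call that ends up in the non-inlined broadcast shown in the IR, whereas the parenthesized form never reaches that method. Comparing both versions with @btime on v1.10 and v1.11 should confirm whether that holds on a given machine.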