JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.59k stars 5.48k forks source link

1.9-rc1 regression - `StaticArray` allocates `Core.SimpleVector` objects, runtime dispatch #49145

Open BioTurboNick opened 1 year ago

BioTurboNick commented 1 year ago

1.9-rc1: Time: image

Allocations: image

1.8.5: Time: image

Allocations: image

With StaticArrays v1.5.19 on each

Julia Version 1.9.0-rc1
Commit 3b2e0d8fbc1 (2023-03-07 07:51 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, icelake-server)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_NUM_THREADS = auto
  JULIA_EDITOR = code
KristofferC commented 1 year ago

Probably gonna need some code as well.

BioTurboNick commented 1 year ago

Found a minimal example. It appears to be when the length of an array is passed into a type parameter.

using BenchmarkTools
using StaticArrays
const xx = [1; 2]
@noinline function e(x)
    k = length(x)
    SVector{k}(x)
end

@btime e(xx) 
# 1.9-rc1: 27.898 μs (31 allocations: 2.66 KiB)
# 1.8.5: 762.239 ns (10 allocations: 544 bytes)
N5N3 commented 1 year ago

Looks like #45062 ?

BioTurboNick commented 1 year ago

Linking my comment here: https://github.com/JuliaLang/julia/issues/48612#issuecomment-1484974814

TL;DR: If a performance regression is determined to be unavoidable for a particular release, can it be well-documented, users warned if affected pattern found, and/or could a workaround be provided; or is some hacky patch possible to improve specific situations?

An aim being to reduce surprise and effort in troubleshooting if and when users hit it, and enable decision making when it comes to upgrading from 1.8 to 1.9.

N5N3 commented 1 year ago

An easy fix on StaticArrays.jl's side is replacing

@propagate_inbounds (::Type{SA})(a::AbstractArray) where {SA <: StaticArray} = convert(SA, a)

with

@propagate_inbounds (T::Type{<:StaticArray})(a::AbstractArray) = convert(T, a)

This make sure we can skip the unneeded runtime subtyping.

KristofferC commented 1 year ago

The workaround here seems to be @noinline SVector{k}(x).

I guess the tradeoff here is whether to hit a dynamic dispatch when calling the uninferrable SVector{k} constructor, or inline it but then risk the body that is inlined to be more expensive than the dynamic dispatch?

It seems the choice made here is not the best.

vtjnash commented 8 months ago

Yeah, we may need a heuristics fix for https://github.com/JuliaLang/julia/pull/45062, since right now it is likely pretty far off in the cost model