JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Dispatch performance regression #55807

Closed charleskawczynski closed 1 month ago

charleskawczynski commented 1 month ago

I think I found a regression in compiling methods as the amount of type information grows (cc @gbaraldi). Here is a reproducer:

Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(na, nb, n::Int) = nest_val(na, nb, Val(n))
nest_val(n) = nest_val(1, 1, n)
foo(t::Nested) = 1
for i in 1:4:25
    let i=i
        NV = nest_val(i)
        @time begin
            foo(NV)
        end
    end
end

Julia 1.8 🚀
  0.000002 seconds
  0.000057 seconds (355 allocations: 24.688 KiB, 76.66% compilation time)
  0.000055 seconds (355 allocations: 24.688 KiB, 77.15% compilation time)
  0.000060 seconds (355 allocations: 24.688 KiB, 76.81% compilation time)
  0.000061 seconds (355 allocations: 24.688 KiB, 76.18% compilation time)
  0.000121 seconds (355 allocations: 24.688 KiB, 86.99% compilation time)
  0.000063 seconds (355 allocations: 24.688 KiB, 74.36% compilation time)
Julia 1.9 🐢
  0.000002 seconds
  0.000087 seconds (323 allocations: 23.266 KiB, 86.27% compilation time)
  0.000102 seconds (323 allocations: 23.266 KiB, 83.09% compilation time)
  0.000180 seconds (323 allocations: 23.266 KiB, 41.63% compilation time)
  0.001830 seconds (323 allocations: 23.266 KiB, 10.49% compilation time)
  0.024255 seconds (323 allocations: 23.266 KiB, 0.37% compilation time)
  0.391652 seconds (323 allocations: 23.266 KiB, 0.02% compilation time)
Julia 1.10 🐢
  0.000002 seconds
  0.000106 seconds (312 allocations: 23.031 KiB, 71.90% compilation time)
  0.000112 seconds (312 allocations: 23.031 KiB, 80.39% compilation time)
  0.000233 seconds (312 allocations: 23.031 KiB, 57.58% compilation time)
  0.001319 seconds (312 allocations: 23.031 KiB, 4.36% compilation time)
  0.019645 seconds (313 allocations: 23.125 KiB, 0.24% compilation time)
  0.318507 seconds (312 allocations: 23.031 KiB, 0.03% compilation time)
Julia 1.11.0-rc3 🐢
  0.000004 seconds
  0.000043 seconds (105 allocations: 4.828 KiB, 56.51% compilation time)
  0.000043 seconds (105 allocations: 4.828 KiB, 50.47% compilation time)
  0.000125 seconds (105 allocations: 4.828 KiB, 18.22% compilation time)
  0.001327 seconds (105 allocations: 4.828 KiB, 1.66% compilation time)
  0.022224 seconds (106 allocations: 4.922 KiB, 0.12% compilation time)
  0.343260 seconds (105 allocations: 4.828 KiB, 0.01% compilation time)

gbaraldi commented 1 month ago

Profiling this shows that we spend all the time in may_contain_union_decision. I wonder whether a query here gets more expensive because nest_val creates so many types. The time is spent here: https://github.com/JuliaLang/julia/blob/53d3ca9855db0308ddf2044a3a0f21f3de492cf3/src/subtype.c#L1530
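
To get a feel for how much type structure nest_val produces, the sketch below counts the Nested nodes in the fully expanded type for small n. The helper count_nested is hypothetical (it is not from this thread or from Base), and the sketch only measures the size of the type tree; it does not show what may_contain_union_decision actually does.

```julia
# Same definitions as the reproducer in this thread.
Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(na, nb, n::Int) = nest_val(na, nb, Val(n))
nest_val(n) = nest_val(1, 1, n)

# Hypothetical helper (an assumption for illustration): count `Nested` nodes
# in the fully expanded type tree by recursing into both type parameters.
# Non-type parameters such as the literal `1` contribute nothing.
function count_nested(T)
    (T isa DataType && T <: Nested) || return 0
    return 1 + sum(count_nested(p) for p in T.parameters)
end

# Keep n small: the walk itself visits every node of the expanded tree.
for n in 1:5
    println("n = $n: ", count_nested(typeof(nest_val(n))), " Nested nodes")
end
```

The count is 2^n - 1, so each step of 4 in the reproducer's loop multiplies the expanded tree by about 16. That is close to the growth in the slow timings above (on 1.11.0-rc3, 0.001327 s to 0.022224 s to 0.343260 s, ratios of roughly 16.7 and 15.4), which would be consistent with a query that walks the expanded tree instead of reusing shared subtrees. This is an inference from the numbers, not a confirmed diagnosis.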

charleskawczynski commented 1 month ago

I don't know if it's helpful, but it may be useful to print both timings separately (constructing the value, then the first call to foo):

Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(na, nb, n::Int) = nest_val(na, nb, Val(n))
nest_val(n) = nest_val(1, 1, n)
foo(t::Nested) = 1
for i in 1:4:25
    let i=i
        local NV
        ts = @elapsed begin
            NV = nest_val(i)
        end
        tc = @elapsed begin
            foo(NV)
        end
        println("make struct, compile foo ($ts, $tc)")
    end
end

Which gives:

Julia 1.8
make struct, compile foo (1.13e-5, 1.1e-6)
make struct, compile foo (0.0145189, 6.02e-5)
make struct, compile foo (0.0176925, 0.0001021)
make struct, compile foo (0.0184645, 6.44e-5)
make struct, compile foo (0.0189012, 7.05e-5)
make struct, compile foo (0.012758, 8.29e-5)
make struct, compile foo (0.0191793, 7.33e-5)
Julia 1.9
make struct, compile foo (1.39e-5, 1.2e-6)
make struct, compile foo (0.0151348, 9.43e-5)
make struct, compile foo (0.0153842, 9.62e-5)
make struct, compile foo (0.0110747, 0.0001918)
make struct, compile foo (0.0246075, 0.0018085)
make struct, compile foo (0.1408822, 0.0237191)
make struct, compile foo (1.9444539, 0.3840136)
Julia 1.10
make struct, compile foo (6.9e-6, 1.3e-6)
make struct, compile foo (0.0257007, 7.14e-5)
make struct, compile foo (0.0356919, 6.56e-5)
make struct, compile foo (0.0343293, 0.0001467)
make struct, compile foo (0.0535389, 0.0014321)
make struct, compile foo (0.3459823, 0.0193436)
make struct, compile foo (5.0870128, 0.3232764)
Julia 1.11.0-rc3
make struct, compile foo (7.1e-6, 2.4e-6)
make struct, compile foo (0.02574, 4.6e-5)
make struct, compile foo (0.0330718, 4.37e-5)
make struct, compile foo (0.0341266, 0.0001204)
make struct, compile foo (0.0559988, 0.0014925)
make struct, compile foo (0.3904373, 0.0213976)
make struct, compile foo (5.7038262, 0.3391496)

charleskawczynski commented 1 month ago

FWIW, the regression still exists but is not nearly as severe when Nested is a singleton (Base.@kwdef struct Nested{A,B} end). (Suggested by @dennisYatunin)
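
For anyone who wants to try the singleton variant, here is a minimal sketch of the same timing loop with a field-less struct. The names NestedS, nest_val_s and foo_s are hypothetical, chosen only so that the sketch can run in the same session as the original reproducer, and a plain struct is used in place of Base.@kwdef since there are no fields to default. No timings are claimed here; they will depend on the Julia version and platform.

```julia
# Hypothetical names (`NestedS`, `nest_val_s`, `foo_s`) so that this can
# coexist with the original `Nested` reproducer in one session.
struct NestedS{A,B} end   # no fields, so every concrete instance is a singleton

nest_val_s(na, nb, ::Val{1}) = NestedS{na, nb}()
nest_val_s(na, nb, ::Val{n}) where {n} = nest_val_s(NestedS{na, nb}, NestedS{na, nb}, Val(n-1))
nest_val_s(na, nb, n::Int) = nest_val_s(na, nb, Val(n))
nest_val_s(n) = nest_val_s(1, 1, n)
foo_s(t::NestedS) = 1

# Same loop shape as the two-timing reproducer above: time construction of
# the value and the first call to foo_s separately.
for i in 1:4:25
    let i=i
        local NV
        ts = @elapsed begin
            NV = nest_val_s(i)
        end
        tc = @elapsed begin
            foo_s(NV)
        end
        println("make struct, compile foo ($ts, $tc)")
    end
end
```

Comparing this output with the output of the original loop on the same machine and Julia version is one way to check how much the field affects the regression.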

charleskawczynski commented 1 month ago

I'm also seeing different impacts on Windows vs. macOS, but the regression appears on both.

charleskawczynski commented 1 month ago

I know the SciML ecosystem has some pretty heavily typed code, so @ChrisRackauckas may also be interested in this issue.

ChrisRackauckas commented 1 month ago

I really think we need to split the ideas of "please specialize" and "allow dispatching".

nsajko commented 1 month ago

@ChrisRackauckas that's #11339

oscardssmith commented 1 month ago

@topolarity or @vtjnash can one of you look into this?