JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

1.10 regression involving a higher-order function calling `^` #53144

Open Drvi opened 7 months ago

Drvi commented 7 months ago

We've just noticed a regression with this MRE:

foo(tf, args...) = sum(x->tf(args...), 1:100000000)

1.9.2:

julia> @btime foo($^, 10, 3)
  203.368 ms (0 allocations: 0 bytes)

1.10.0:

julia> @btime foo($^, 10, 3)
  281.304 ms (0 allocations: 0 bytes)

The regression is also present at a very recent tip of the backports-release-1.10 branch. Note that foo($+, 10, 3) has the same performance on both 1.9 and 1.10.
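For completeness, a self-contained reproduction script (assuming BenchmarkTools.jl is installed):

```julia
using BenchmarkTools

# Higher-order function that captures `args` in a closure passed to `sum`.
foo(tf, args...) = sum(x -> tf(args...), 1:100000000)

# `$` interpolates the function into the benchmark expression so it is
# not treated as a non-constant global by BenchmarkTools.
@btime foo($^, 10, 3)   # regressed between 1.9.2 and 1.10.0
@btime foo($+, 10, 3)   # same performance on both versions
```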

Not sure if relevant, but the inferred effects seem different between 1.9 and 1.10.

1.9

julia> Base.infer_effects(foo, (typeof(^),Int,Int))
(!c,+e,!n,!t,+s,+m,+i)
#               ^

1.10

julia> Base.infer_effects(foo, (typeof(^),Int,Int))
(!c,+e,!n,!t,+s,!m,+i)
#               ^
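One way to narrow down where the taint on the `m` bit comes from is to query the effects of the callee and of an equivalent closure separately; a sketch (exact printouts differ across versions):

```julia
# Effects of the base operation itself, for Int arguments.
Base.infer_effects(^, (Int, Int))

# Effects of a closure like the one `foo` builds, with `args`
# captured as a local so global accesses don't taint the result.
let args = (10, 3)
    Base.infer_effects(x -> (^)(args...), (Int,))
end
```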
Seelengrab commented 7 months ago

This seems to have gotten even worse effect-wise on a two-day-old master, now tainting everything:

julia> foo(tf, args...) = sum(x->tf(args...), 1:100000000)
foo (generic function with 1 method)

julia> Base.infer_effects(foo, (typeof(^),Int,Int))
(!c,!e,!n,!t,!s,!m,!u)′

julia> versioninfo()
Julia Version 1.11.0-DEV.1456
Commit d54a4550cbe (2024-02-02 07:09 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: 24 × AMD Ryzen 9 7900X 12-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 23 default, 1 interactive, 11 GC (on 24 virtual cores)
Environment:
  JULIA_PKG_USE_CLI_GIT = true

Though, surprisingly, the performance is better:

# master

julia> @btime foo($^, 10, 3)
  161.148 ms (0 allocations: 0 bytes)
100000000000

# 1.10.0-beta3

julia> @btime foo($^, 10, 3)
  174.397 ms (0 allocations: 0 bytes)
100000000000
dpinol commented 6 months ago

I have tested the MWE above, and I can only reproduce the regression on an AMD machine, not on an Intel one.

My AMD machine

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 64 × AMD EPYC 9374F 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)

My Intel machine

Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13950HX
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

I also get a regression in this case (but only on AMD):

using BenchmarkTools
const v=rand(10_000);
gt(x)=x>0.0
@btime maximum(Iterators.filter(gt, v))

Julia 1.9.4

10.709 μs (1 allocation: 16 bytes)

julia 1.10.2

37.789 μs (1 allocation: 16 bytes)

In my case it is even worse on Julia 1.11.0-alpha1:

  51.320 μs (1 allocation: 16 bytes)
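Since the slowdown only shows up on AMD, one way to investigate is to dump the native code the reduction compiles to on each machine and diff it; a sketch (assuming an interactive session or `InteractiveUtils` loaded):

```julia
using InteractiveUtils  # provides @code_native

gt(x) = x > 0.0
const v = rand(10_000)

# Dump the native code for the reduction; diffing this output between
# the AMD (znver3) and Intel machines shows whether LLVM emitted
# different instruction sequences for the same Julia code.
@code_native maximum(Iterators.filter(gt, v))
```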
aplavin commented 6 months ago

On a MacBook M2, the difference is even larger: roughly a 2x regression in 1.10, from 187.465 ms to 326.619 ms.

gbaraldi commented 6 months ago

The inference difference:

#1.9
pairs(::NamedTuple{(), Tuple{}})::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()) (+c,+e,+n,+t,+s,+m,+i)

#1.10
pairs(::@NamedTuple{})::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}()) (+c,+e,+n,+t,+s,!m,+i)

This is inside _sum(f, a, ::Colon; kw...).
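The differing effect bit can be queried at that call site directly; a sketch (assuming, as the printouts above suggest, that the `m` flag is the inaccessible-memory-only effect):

```julia
# Mirror the call Base makes inside `_sum(f, a, ::Colon; kw...)`:
# `pairs` on an empty NamedTuple. Per the printouts above, 1.9
# inferred this as `+m` while 1.10 infers `!m`.
Base.infer_effects(pairs, (typeof((;)),))
```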