JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Regression in broadcast assignment to a `SlowSubArray` on nightly #53158

Open jishnub opened 5 months ago

jishnub commented 5 months ago

On v1.10.0

julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  293.806 ms … 296.067 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     294.500 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   294.634 ms ± 639.044 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁    ▁▁  ▁ ▁▁▁▁   ▁  ▁ ▁  █   ▁        ▁                  ▁ ▁  
  █▁▁▁▁██▁▁█▁████▁▁▁█▁▁█▁█▁▁█▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁█ ▁
  294 ms           Histogram: frequency by time          296 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

vs on v"1.11.0-DEV.1442" as well as the current master (d54a4550cb)

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 10 samples with 1 evaluation.
 Range (min … max):  547.709 ms … 551.888 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     548.422 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   548.925 ms ±   1.418 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █      ▁▁▁▁  ▁▁                                   ▁         ▁  
  █▁▁▁▁▁▁████▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█ ▁
  548 ms           Histogram: frequency by time          552 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

versioninfo:

julia> versioninfo()
Julia Version 1.11.0-DEV.1442
Commit c16472b0014 (2024-02-01 14:59 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  LD_LIBRARY_PATH = :/usr/lib/x86_64-linux-gnu/gtk-3.0/modules
  JULIA_EDITOR = subl

Curiously, profiling points to the integer comparison performed while iterating over CartesianIndices (the `!=` check in `__inc`) as the most expensive step:

julia> @bprofile $a[1:end, 1:end] .= $b;

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎4638 @Base/client.jl:535; _start()
    ╎ 4638 @Base/client.jl:561; repl_main
    ╎  4638 @Base/client.jl:424; run_main_repl(interactive::Bool, quiet::Bool, banner::Symbol, history_file::Bool, color_set::Bool)
    ╎   4638 @Base/essentials.jl:1017; invokelatest
    ╎    4638 @Base/essentials.jl:1020; #invokelatest#2
    ╎     4638 @Base/client.jl:440; (::Base.var"#1100#1102"{Bool, Symbol, Bool})(REPL::Module)
    ╎    ╎ 4638 …a-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:447; run_repl(repl::REPL.AbstractREPL, consumer::Any)
    ╎    ╎  4638 …-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:461; run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::…
    ╎    ╎   4638 …-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:302; kwcall(::NamedTuple, ::typeof(REPL.start_repl_backend), backend::REPL.REPLBackend, consu…
    ╎    ╎    4638 …-master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:305; start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
    ╎    ╎     4638 …master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:320; repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
    ╎    ╎    ╎ 4638 …master/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:224; eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
  13╎    ╎    ╎  4638 @Base/boot.jl:428; eval
    ╎    ╎    ╎   4624 @BenchmarkTools/src/execution.jl:126; run(b::BenchmarkTools.Benchmark)
    ╎    ╎    ╎    4624 @BenchmarkTools/src/execution.jl:126; run
    ╎    ╎    ╎     4624 @BenchmarkTools/src/execution.jl:134; run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, n…
    ╎    ╎    ╎    ╎ 4624 @BenchmarkTools/src/execution.jl:40; run_result
    ╎    ╎    ╎    ╎  4624 @BenchmarkTools/src/execution.jl:41; #run_result#45
    ╎    ╎    ╎    ╎   4624 @Base/essentials.jl:1017; invokelatest
  24╎    ╎    ╎    ╎    4624 @Base/essentials.jl:1020; #invokelatest#2
    ╎    ╎    ╎    ╎     2    @Base/compiler/typeinfer.jl:1073; typeinf_ext_toplevel(mi::Core.MethodInstance, world::UInt64)
    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/typeinfer.jl:1077; typeinf_ext_toplevel(interp::Core.Compiler.NativeInterpreter, mi::Core.MethodInstance)
    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/typeinfer.jl:1039; typeinf_ext(interp::Core.Compiler.NativeInterpreter, mi::Core.MethodInstance)
    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/typeinfer.jl:216; typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
    ╎    ╎    ╎    ╎    ╎    2    @Base/compiler/typeinfer.jl:246; _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
    ╎    ╎    ╎    ╎    ╎     2    @Base/compiler/abstractinterpretation.jl:3373; typeinf_nocycle(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.Infere…
    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/abstractinterpretation.jl:3295; typeinf_local(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.Inferen…
    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/abstractinterpretation.jl:3041; abstract_eval_basic_statement(interp::Core.Compiler.NativeInterpreter, stmt::Any, …
    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/abstractinterpretation.jl:2730; abstract_eval_statement(interp::Core.Compiler.NativeInterpreter, e::Any, vtypes::…
    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/compiler/abstractinterpretation.jl:2425; abstract_eval_statement_expr(interp::Core.Compiler.NativeInterpreter, e::Expr, v…
    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/compiler/abstractinterpretation.jl:2409; abstract_eval_call(interp::Core.Compiler.NativeInterpreter, e::Expr, vtypes::Ve…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/abstractinterpretation.jl:2394; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::Core.Compiler.…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/abstractinterpretation.jl:2249; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::Core.Compiler…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/abstractinterpretation.jl:2256; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::Core.Compile…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/compiler/abstractinterpretation.jl:2174; abstract_call_known(interp::Core.Compiler.NativeInterpreter, f::Any, arginf…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/compiler/abstractinterpretation.jl:102; abstract_call_gf_by_type(interp::Core.Compiler.NativeInterpreter, f::Any, a…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 2    @Base/compiler/abstractinterpretation.jl:650; abstract_call_method(interp::Core.Compiler.NativeInterpreter, method::Meth…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2    @Base/compiler/typeinfer.jl:867; typeinf_edge(interp::Core.Compiler.NativeInterpreter, method::Method, atype::Any, spar…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    @Base/compiler/typeinfer.jl:216; typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceState)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/typeinfer.jl:246; _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceStat…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/abstractinterpretation.jl:3373; typeinf_nocycle(interp::Core.Compiler.NativeInterpreter, frame::Core.…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/abstractinterpretation.jl:3295; typeinf_local(interp::Core.Compiler.NativeInterpreter, frame::Core.C…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/abstractinterpretation.jl:3041; abstract_eval_basic_statement(interp::Core.Compiler.NativeInterpret…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/abstractinterpretation.jl:2730; abstract_eval_statement(interp::Core.Compiler.NativeInterpreter, e…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    …ase/compiler/abstractinterpretation.jl:2425; abstract_eval_statement_expr(interp::Core.Compiler.NativeInterpret…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    …ase/compiler/abstractinterpretation.jl:2409; abstract_eval_call(interp::Core.Compiler.NativeInterpreter, e::Ex…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    …se/compiler/abstractinterpretation.jl:2394; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::C…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    …se/compiler/abstractinterpretation.jl:2249; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo::…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    …se/compiler/abstractinterpretation.jl:2256; abstract_call(interp::Core.Compiler.NativeInterpreter, arginfo:…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    …e/compiler/abstractinterpretation.jl:2174; abstract_call_known(interp::Core.Compiler.NativeInterpreter, f:…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    …e/compiler/abstractinterpretation.jl:111; abstract_call_gf_by_type(interp::Core.Compiler.NativeInterprete…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    …e/compiler/abstractinterpretation.jl:813; abstract_call_method_with_const_args(interp::Core.Compiler.Nat…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    …/compiler/abstractinterpretation.jl:837; abstract_call_method_with_const_args(interp::Core.Compiler.Nat…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    …/compiler/abstractinterpretation.jl:1201; semi_concrete_eval_call(interp::Core.Compiler.NativeInterpre…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/ssair/irinterp.jl:440; ir_abstract_constant_propagation(interp::Core.Compiler.NativeInt…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/ssair/irinterp.jl:280; _ir_abstract_constant_propagation(interp::Core.Compiler.NativeI…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/ssair/irinterp.jl:294; _ir_abstract_constant_propagation(interp::Core.Compiler.Native…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/ssair/irinterp.jl:248; scan!(callback::Core.Compiler.var"#559#562"{Nothing, Core.Com…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/ssair/irinterp.jl:326; (::Core.Compiler.var"#559#562"{Nothing, Core.Compiler.Native…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/ssair/irinterp.jl:141; reprocess_instruction!(interp::Core.Compiler.NativeInterpre…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    …mpiler/abstractinterpretation.jl:2428; abstract_eval_statement_expr(interp::Core.Compiler.Nativ…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/tfuncs.jl:99; instanceof_tfunc(t::Any, astag::Bool)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/tfuncs.jl:100; instanceof_tfunc(t::Any, astag::Bool, troot::Core.Const)
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/typeutils.jl:115; valid_as_lattice(x::Any, astag::Bool)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/typeinfer.jl:264; _typeinf(interp::Core.Compiler.NativeInterpreter, frame::Core.Compiler.InferenceStat…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/optimize.jl:950; optimize(interp::Core.Compiler.NativeInterpreter, opt::Core.Compiler.OptimizationSta…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/optimize.jl:976; run_passes_ipo_safe
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/compiler/optimize.jl:961; run_passes_ipo_safe(ci::Core.CodeInfo, sv::Core.Compiler.OptimizationState{Core.Co…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    @Base/compiler/ssair/passes.jl:2037; adce_pass!(ir::Core.Compiler.IRCode, inlining::Core.Compiler.InliningState{C…
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    @Base/compiler/ssair/ir.jl:1725; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     1    @Base/compiler/ssair/ir.jl:1802; iterate_compact(compact::Core.Compiler.IncrementalCompact)
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/compiler/ssair/ir.jl:276; setindex!
   1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  1    @Base/array.jl:972; setindex!
    ╎    ╎    ╎    ╎     4598 @BenchmarkTools/src/execution.jl:102; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎ 510  @BenchmarkTools/src/execution.jl:109; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwarg…
    ╎    ╎    ╎    ╎    ╎  510  @BenchmarkTools/src/execution.jl:556; var"##sample#224"(::Tuple{Matrix{Float64}, Matrix{Float64}}, __params::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎   510  @BenchmarkTools/src/execution.jl:547; var"##core#223"(a#221::Matrix{Float64}, b#222::Matrix{Float64})
    ╎    ╎    ╎    ╎    ╎    510  @Base/broadcast.jl:875; materialize!
    ╎    ╎    ╎    ╎    ╎     510  @Base/broadcast.jl:878; materialize!
    ╎    ╎    ╎    ╎    ╎    ╎ 510  @Base/broadcast.jl:920; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎  510  @Base/broadcast.jl:961; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎   510  @Base/abstractarray.jl:1061; copyto!
  60╎    ╎    ╎    ╎    ╎    ╎    60   @Base/abstractarray.jl:0; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRa…
    ╎    ╎    ╎    ╎    ╎    ╎    64   @Base/abstractarray.jl:1116; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     64   @Base/abstractarray.jl:1411; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 64   @Base/abstractarray.jl:1441; _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  64   @Base/subarray.jl:366; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   64   @Base/array.jl:979; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    64   @Base/abstractarray.jl:1345; _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     64   @Base/abstractarray.jl:2975; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 64   @Base/abstractarray.jl:2991; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  64   @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   64   @Base/abstractarray.jl:3007; _sub2ind_recurse
  64╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    64   @Base/int.jl:88; *
    ╎    ╎    ╎    ╎    ╎    ╎    386  @Base/abstractarray.jl:1120; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     386  @Base/multidimensional.jl:422; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 62   @Base/multidimensional.jl:446; __inc
  62╎    ╎    ╎    ╎    ╎    ╎    ╎  62   @Base/int.jl:87; +
   1╎    ╎    ╎    ╎    ╎    ╎    ╎ 324  @Base/multidimensional.jl:447; __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  323  @Base/operators.jl:276; !=
 323╎    ╎    ╎    ╎    ╎    ╎    ╎   323  @Base/promotion.jl:620; ==
    ╎    ╎    ╎    ╎    ╎ 4088 @BenchmarkTools/src/execution.jl:115; _run(b::BenchmarkTools.Benchmark, p::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwarg…
    ╎    ╎    ╎    ╎    ╎  4088 @BenchmarkTools/src/execution.jl:556; var"##sample#224"(::Tuple{Matrix{Float64}, Matrix{Float64}}, __params::BenchmarkTools.Parameters)
    ╎    ╎    ╎    ╎    ╎   4088 @BenchmarkTools/src/execution.jl:547; var"##core#223"(a#221::Matrix{Float64}, b#222::Matrix{Float64})
    ╎    ╎    ╎    ╎    ╎    4088 @Base/broadcast.jl:875; materialize!
    ╎    ╎    ╎    ╎    ╎     4088 @Base/broadcast.jl:878; materialize!
    ╎    ╎    ╎    ╎    ╎    ╎ 4088 @Base/broadcast.jl:920; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎  4088 @Base/broadcast.jl:961; copyto!
    ╎    ╎    ╎    ╎    ╎    ╎   4088 @Base/abstractarray.jl:1061; copyto!
 488╎    ╎    ╎    ╎    ╎    ╎    488  @Base/abstractarray.jl:0; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{UnitRa…
    ╎    ╎    ╎    ╎    ╎    ╎    449  @Base/abstractarray.jl:1116; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     448  @Base/abstractarray.jl:1411; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 448  @Base/abstractarray.jl:1441; _setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  448  @Base/subarray.jl:366; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   445  @Base/array.jl:979; setindex!
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    445  @Base/abstractarray.jl:1345; _to_linear_index
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     445  @Base/abstractarray.jl:2975; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 445  @Base/abstractarray.jl:2991; _sub2ind
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  445  @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   445  @Base/abstractarray.jl:3007; _sub2ind_recurse
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/abstractarray.jl:3014; offsetin
   2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/int.jl:86; -
 439╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    439  @Base/int.jl:88; *
   4╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    4    @Base/int.jl:87; +
    ╎    ╎    ╎    ╎    ╎    ╎    ╎   3    @Base/subarray.jl:293; reindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎    3    @Base/array.jl:3058; getindex
    ╎    ╎    ╎    ╎    ╎    ╎    ╎     3    @Base/range.jl:932; _getindex
   3╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 3    @Base/int.jl:87; +
   1╎    ╎    ╎    ╎    ╎    ╎     1    @Base/essentials.jl:882; getindex
    ╎    ╎    ╎    ╎    ╎    ╎    3151 @Base/abstractarray.jl:1120; copyto_unaliased!(deststyle::IndexCartesian, dest::SubArray{Float64, 2, Matrix{Float64}, Tuple{Uni…
    ╎    ╎    ╎    ╎    ╎    ╎     3151 @Base/multidimensional.jl:422; iterate
    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 499  @Base/multidimensional.jl:446; __inc
 499╎    ╎    ╎    ╎    ╎    ╎    ╎  499  @Base/int.jl:87; +
   3╎    ╎    ╎    ╎    ╎    ╎    ╎ 2652 @Base/multidimensional.jl:447; __inc
    ╎    ╎    ╎    ╎    ╎    ╎    ╎  2649 @Base/operators.jl:276; !=
2649╎    ╎    ╎    ╎    ╎    ╎    ╎   2649 @Base/promotion.jl:620; ==
Total snapshots: 4645. Utilization: 100% across all threads and tasks. Use the `groupby` kwarg to break down by thread and/or task.
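
For context on why the profile lands in `copyto_unaliased!(deststyle::IndexCartesian, ...)`: the left-hand side of `a[1:end, 1:end] .= b` becomes a view with two `UnitRange` indices, and such a `SubArray` reports `IndexCartesian` rather than `IndexLinear`. The snippet below is only a small sketch of that (the tiny array size and the names `v` and `w` are chosen for illustration and are not part of the report); it checks the index style of the range-indexed view against a colon-indexed one.

```julia
# Small illustration; sizes and variable names are illustrative assumptions.
a = zeros(4, 3)
b = rand(size(a)...)

# Roughly what the left-hand side of `a[1:end, 1:end] .= b` creates.
v = view(a, 1:size(a, 1), 1:size(a, 2))
# Colon indices give a contiguous view that supports linear indexing.
w = view(a, :, :)

println(IndexStyle(v))  # index style of the range-indexed view
println(IndexStyle(w))  # index style of the colon-indexed view

# Assigning through the range-indexed view writes into `a`.
v .= b
println(a == b)
```

The first view is the case that goes through Cartesian iteration in the trace above; the second is shown only for contrast.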

jishnub commented 5 months ago

Bisected to 9aa7980358349ee7017fa614525f571ffa92c55d:

9aa7980358349ee7017fa614525f571ffa92c55d is the first bad commit
commit 9aa7980358349ee7017fa614525f571ffa92c55d
Author: Jameson Nash <vtjnash@gmail.com>
Date:   Fri Nov 17 13:58:01 2023 -0500

    codegen: ensure i1 bool is widened to i8 before storing (#52189)

    Teach value_to_pointer to convert primitive types to their stored
    representation first, to avoid exposing undef bits later (via memcpy).

    Take this opportunity to also generalizes the support for zext Bool to
    anywhere inside any struct for changing any bitwidth to a multiple of 8
    bytes. This would change a vector like <2 x i4> from occupying i8 to i16
    (c.f. LLVM's LangRef), if such an operation were expressible in Julia
    today. And take this opportunity to do a bit of code cleanup, now that
    codegen is better and using helpers from LLVM.

    Fixes #52127

 src/cgutils.cpp    |   3 --
 src/codegen.cpp    |  27 ++++--------
 src/intrinsics.cpp | 119 ++++++++++++++++++++++++++++++++++++-----------------
 test/llvmcall2.jl  |   9 ++++
 4 files changed, 98 insertions(+), 60 deletions(-)

On this commit,

julia> a = zeros(4000,4000); b = rand(size(a)...);

julia> @btime $a[1:end,1:end] .= $b;
  61.351 ms (0 allocations: 0 bytes)

vs on 045b6f9c88:

julia> @btime $a[1:end,1:end] .= $b;
  20.189 ms (0 allocations: 0 bytes)
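
For anyone who wants to check a build without installing BenchmarkTools, a rough timing along these lines might do. This is a sketch under assumptions, not the method used for the numbers above: the helper names `copy_ranges!` and `copy_colons!` are made up here, the array is smaller than in the report (`n = 1000`; set `n = 4000` to match), and `@elapsed` is much noisier than `@btime`. It times the range-indexed assignment next to a colon-indexed one, so the two paths can be compared on the same build; no particular result is implied.

```julia
# Rough timing sketch; helper names and `n` are illustrative assumptions.
n = 1000                      # the report uses 4000; smaller here to keep memory use low
a = zeros(n, n)
b = rand(size(a)...)

# Range-indexed destination (the case reported as slow).
copy_ranges!(a, b) = (a[1:end, 1:end] .= b; a)
# Colon-indexed destination, for comparison only.
copy_colons!(a, b) = (a[:, :] .= b; a)

# Warm up so that compilation is not included in the timings.
copy_ranges!(a, b)
copy_colons!(a, b)

# Take the minimum over a few runs to reduce noise.
t_ranges = minimum([@elapsed(copy_ranges!(a, b)) for _ in 1:5])
t_colons = minimum([@elapsed(copy_colons!(a, b)) for _ in 1:5])

println("ranges: ", t_ranges, " s; colons: ", t_colons, " s")
```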

jishnub commented 2 months ago

This seems to have regressed again on the current nightly (v"1.12.0-DEV.528"). On v"1.11.0-beta1", the timing is close to that on v1.10.0:

julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 16 samples with 1 evaluation.
 Range (min … max):  311.599 ms … 332.538 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     313.798 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   315.770 ms ±   5.354 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▁█▁▁▁██   ▁         ▁ ▁     ▁                              ▁  
  ████████▁▁▁█▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  312 ms           Histogram: frequency by time          333 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

vs on nightly:

julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 12 samples with 1 evaluation.
 Range (min … max):  448.373 ms … 452.969 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     450.255 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   450.305 ms ±   1.671 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ██ ██                █  █      █    █           █   █     █  
  █▁██▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁█ ▁
  448 ms           Histogram: frequency by time          453 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> VERSION
v"1.12.0-DEV.528"