Open jishnub opened 5 months ago
Bisected to 9aa7980358349ee7017fa614525f571ffa92c55d:
9aa7980358349ee7017fa614525f571ffa92c55d is the first bad commit
commit 9aa7980358349ee7017fa614525f571ffa92c55d
Author: Jameson Nash <vtjnash@gmail.com>
Date: Fri Nov 17 13:58:01 2023 -0500
codegen: ensure i1 bool is widened to i8 before storing (#52189)
Teach value_to_pointer to convert primitive types to their stored
representation first, to avoid exposing undef bits later (via memcpy).
Take this opportunity to also generalizes the support for zext Bool to
anywhere inside any struct for changing any bitwidth to a multiple of 8
bytes. This would change a vector like <2 x i4> from occupying i8 to i16
(c.f. LLVM's LangRef), if such an operation were expressible in Julia
today. And take this opportunity to do a bit of code cleanup, now that
codegen is better and using helpers from LLVM.
Fixes #52127
src/cgutils.cpp | 3 --
src/codegen.cpp | 27 ++++--------
src/intrinsics.cpp | 119 ++++++++++++++++++++++++++++++++++++-----------------
test/llvmcall2.jl | 9 ++++
4 files changed, 98 insertions(+), 60 deletions(-)
On this commit,
julia> a = zeros(4000,4000); b = rand(size(a)...);
julia> @btime $a[1:end,1:end] .= $b;
61.351 ms (0 allocations: 0 bytes)
vs on 045b6f9c88:
julia> @btime $a[1:end,1:end] .= $b;
20.189 ms (0 allocations: 0 bytes)
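For anyone who wants to check this without BenchmarkTools, the comparison above can be approximated with Base alone. This is a rough sketch, not the benchmark used in this report: the helper names `assign_view!` and `min_time` are made up for illustration, the array is smaller (1000×1000) to keep it quick, and `@elapsed` timings are noisier than `@btime`. It times the broadcast assignment through `a[1:end, 1:end]` against a plain `copyto!` on the same build, as a baseline that does not use the indexed destination.

```julia
# Hypothetical helper: the broadcast assignment from the report, wrapped in a
# function so that it is compiled and timed without global-variable overhead.
function assign_view!(a, b)
    a[1:end, 1:end] .= b
    return a
end

# Hypothetical helper: minimum wall time over `reps` runs, after one warm-up call.
function min_time(f, args...; reps = 5)
    f(args...)                      # warm-up, triggers compilation
    best = Inf
    for _ in 1:reps
        t = @elapsed f(args...)
        best = min(best, t)
    end
    return best
end

a = zeros(1000, 1000)
b = rand(size(a)...)

t_view = min_time(assign_view!, a, b)   # a[1:end, 1:end] .= b
t_copy = min_time(copyto!, a, b)        # plain copy as a baseline

println("view broadcast: ", round(t_view * 1e3; digits = 3), " ms")
println("copyto!:        ", round(t_copy * 1e3; digits = 3), " ms")
```

Running the same script on the two builds being compared should show whether the ratio between the two timings changes; the absolute numbers will differ from those quoted above.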
This seems to have regressed on the current nightly (v"1.12.0-DEV.528").
On v"1.11.0-beta1":
julia> a = zeros(40000,4000); b = rand(size(a)...);
julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 16 samples with 1 evaluation.
Range (min … max): 311.599 ms … 332.538 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 313.798 ms ┊ GC (median): 0.00%
Time (mean ± σ): 315.770 ms ± 5.354 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▁█▁▁▁██ ▁ ▁ ▁ ▁ ▁
████████▁▁▁█▁▁▁▁▁▁▁▁▁█▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
312 ms Histogram: frequency by time 333 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
vs on nightly:
julia> @benchmark $a[1:end, 1:end] .= $b
BenchmarkTools.Trial: 12 samples with 1 evaluation.
Range (min … max): 448.373 ms … 452.969 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 450.255 ms ┊ GC (median): 0.00%
Time (mean ± σ): 450.305 ms ± 1.671 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ██ ██ █ █ █ █ █ █ █
█▁██▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█▁▁▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█▁▁▁▁▁█ ▁
448 ms Histogram: frequency by time 453 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> VERSION
v"1.12.0-DEV.528"
On v1.10.0
vs on v"1.11.0-DEV.1442" as well as the current master (d54a4550cb)
versioninfo:
Curiously, profiling points to the integer comparison checks performed while iterating over CartesianIndices as the most expensive step: