BenjaminRemez opened 2 years ago
I can confirm this on my PC. BTW, this version is consistently (~3x) faster than func! on 1.8/master:
function func2!(v)
Random.seed!(0)
@. v = rand()
@. v = log(v)
end
@btime func!($v)  # 178.200 μs (6 allocations: 464 bytes) | 1.7: 122.000 μs (6 allocations: 464 bytes)
@btime func2!($v) # 66.600 μs (6 allocations: 464 bytes) | 1.7: 70.600 μs (6 allocations: 464 bytes)
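For reference, a self-contained version of this comparison might look like the sketch below. The definition of func! is an assumed reconstruction (a single fused broadcast), and the vector size is not stated in the thread.

using Random, BenchmarkTools

function func!(v)    # assumed definition: one fused rand+log broadcast
    Random.seed!(0)
    @. v = log(rand())
end

function func2!(v)   # the two-pass version from this comment
    Random.seed!(0)
    @. v = rand()
    @. v = log(v)
end

v = zeros(100_000)   # vector size assumed
@btime func!($v)
@btime func2!($v)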
@oscardssmith @N5N3 Yes, that's the performance difference I already spotted in this Discord post. (I can open a separate issue for that disparity as well.)
@oscardssmith is this fixed by https://github.com/JuliaLang/julia/pull/46359?
I doubt it, since this code never calls exp.
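A quick way to check that is to look at the generated code; a minimal sketch (not from the thread):

using InteractiveUtils   # provides @code_llvm outside the REPL

function f!(v)           # the pattern under discussion
    @. v = log(rand())
end

# the emitted IR goes through the log code path and contains
# no reference to exp:
@code_llvm debuginfo=:none f!(zeros(16))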
down to
julia> @btime func!(v);
57.542 μs (0 allocations: 0 bytes)
on master. regression fixed!
@adienes @vtjnash Are you sure this improvement isn't specific to your test hardware? Running on today's nightly build, I get:
julia> versioninfo()
Julia Version 1.11.0-DEV.1348
Commit a9e2bd4713 (2024-01-21 11:16 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, icelake-client)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
julia> @btime func!(v);
201.100 μs (0 allocations: 0 bytes)
which is comparable to the performance on 1.8.3. My original post was made on this same hardware, and I can reproduce that timing:
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 8 × Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, icelake-client)
Threads: 1 on 8 virtual cores
julia> @btime func!(v);
200.400 μs (0 allocations: 0 bytes)
(In fact, using @benchmark suggests 1.8 has a consistently lower minimum time.)
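For reference, that minimum can be read directly off a @benchmark run; a minimal sketch, reusing the func! and v from the timings above:

using BenchmarkTools

b = @benchmark func!($v)   # full trial, not just the mean time
minimum(b)                 # minimum-time estimate, the statistic quoted here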
I was just scrolling through older regressions to see if any had been fixed. If I was too hasty and you can still reproduce the problem, it would be appropriate, I think, to re-open this issue.
However, in that case I would add the windows tag or change the title to reflect that it may be platform-dependent. I cannot add tags or re-open the issue myself (I can change the title, though). I don't have access to a non-Windows machine at the moment; could you confirm whether the original regression never appeared on another platform at all, or whether it is fixed on master?
my original comment was made after testing on my hardware, which is
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 10 × Apple M1 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 on 8 virtual cores
and only tried on 1.7 vs master before posting my first comment. the result over version history is: interestingly, not only does the original regression not occur, but there seems to be a separate one from 1.10 to master.
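for reference, that kind of per-version sweep can be scripted with juliaup channels. a rough sketch; the channel list and bench.jl are hypothetical, and each channel must already be installed:

# run the same benchmark script under several installed juliaup channels
for chan in ["1.7", "1.8", "1.9", "1.10", "nightly"]
    run(`julia +$chan bench.jl`)
end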
While investigating whether this performance difference changes in 1.8, I found the following regression. For example:
Under 1.7.3:
Under 1.8-rc4:
I should note that when changing the internal function to exp, i.e. @. v = exp(rand()), there is a dramatic performance boost in 1.8-rc4. This appears to be because exp is inlined in 1.8 while it was not in 1.7, though I understand that may be a bug rather than a feature (#46323, #46359).
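A minimal sketch of that exp variant, for anyone reproducing it (the function name and vector size are illustrative, not from the issue):

using Random, BenchmarkTools

function func_exp!(v)    # hypothetical name for the exp variant
    Random.seed!(0)
    @. v = exp(rand())
end

v = zeros(100_000)       # size assumed
@btime func_exp!($v)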