Open gbaraldi opened 11 months ago
Did you mean to link to some piece of code here?
```julia
julia> f(n::UInt32) = reinterpret(NTuple{4,UInt8}, n)
f (generic function with 1 method)

julia> f(0x12345678)
(0x78, 0x56, 0x34, 0x12)

julia> Base.infer_effects(f, Tuple{UInt32})
(!c,!e,!n,!t,!s,!m,!u)′

julia> versioninfo()
Julia Version 1.11.0-DEV.856
Commit e7345b89fd4 (2023-11-07 02:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
Threads: 11 on 8 virtual cores
Environment:
  JULIA_NUM_PRECOMPILE_TASKS = 3
  JULIA_PKG_PRECOMPILE_AUTO = 0
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8
```
This could just be implemented with a couple of cheap shifts, so I guess Julia should use that approach, too?
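For reference, a minimal sketch of the shift-based approach (the name `f_shifts` is invented for illustration; it always produces the low-order byte first, which matches what `reinterpret` returns on a little-endian host):

```julia
# Hypothetical shift-based equivalent of reinterpret(NTuple{4,UInt8}, n).
# Unlike reinterpret, this is byte-order independent: the low byte is
# always placed first, regardless of host endianness.
f_shifts(n::UInt32) = (
    n % UInt8,           # lowest byte
    (n >> 8)  % UInt8,
    (n >> 16) % UInt8,
    (n >> 24) % UInt8,   # highest byte
)

f_shifts(0x12345678)  # (0x78, 0x56, 0x34, 0x12)
```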
On second thought, fixing this might not be worth it; it might even be preferable to leave it as is. This is because reinterpreting tuples as primitive integers (or vice versa) is subtly error-prone: the result depends on the host endianness. So I think fixing this would actually encourage bugs. Furthermore, IMO, using `reinterpret` on non-primitive types should perhaps be forbidden in Julia v2.
> the result depends on the host endianness

This doesn't matter much in practice, because at this point essentially everything is little-endian.
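For concreteness, the byte-order dependence can be checked against `Base.ENDIAN_BOM` (a sketch; `host_is_le` and `bytes` are names invented for illustration):

```julia
# Base.ENDIAN_BOM encodes the host byte order: 0x04030201 on little-endian
# hosts, 0x01020304 on big-endian ones.
host_is_le = (ENDIAN_BOM == 0x04030201)

# reinterpret yields the bytes in host memory order, so the low-order byte
# comes first only on little-endian machines. On a little-endian host:
bytes = reinterpret(NTuple{4,UInt8}, 0x12345678)
# (0x78, 0x56, 0x34, 0x12) here; a big-endian host would give
# (0x12, 0x34, 0x56, 0x78) instead.
```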
Here is a benchmark comparing `reinterpret` against a manual LLVM version (modeled after actual code in `Random/src/XoshiroSimd.jl`) in the v1.10.0 release:
```julia
julia> using BenchmarkTools

julia> @eval tupcast(::Type{NTuple{8,UInt32}}, x::NTuple{4,UInt64}) = map(z->z.value, Core.Intrinsics.llvmcall("%res = bitcast <4 x i64> %0 to <8 x i32>\nret <8 x i32> %res", NTuple{8,Core.VecElement{UInt32}}, Tuple{NTuple{4,Core.VecElement{UInt64}}}, map(Core.VecElement,x)));

julia> x = [ntuple(_->rand(UInt64), Val(4)) for _ in 1:2^10]; y = similar(x, NTuple{8,UInt32});

julia> @btime $y .= reinterpret.(NTuple{8,UInt32}, $x);
  2.400 μs (0 allocations: 0 bytes)

julia> @btime $y .= tupcast.(NTuple{8,UInt32}, $x);
  1.180 μs (0 allocations: 0 bytes)

julia> z = similar(x);

julia> @btime $z .= reinterpret.(NTuple{4,UInt64}, $x); # reinterpret is a no-op
  2.311 μs (0 allocations: 0 bytes)

julia> @btime $z .= $x; # compare to copy
  534.896 ns (0 allocations: 0 bytes)
```
Note that `tupcast` achieves 2x the throughput of `reinterpret` on my x86 machine. Also notice that the no-op case `reinterpret(::Type{T}, x::T) where T` causes a 4x slowdown relative to a direct copy. From the `@code_native`, it appears that `reinterpret` is indeed not being inlined, as suggested in the original issue.
I've been interested in this functionality for SIMD generation of random numbers that require further computation. In that case, all the bits of the input tuple are already random, so I'm happy to reinterpret them without regard to endianness etc.
A simple fix for this `struct_subpadding` case is replacing `memcpy` with `unsafe_load`, just like #43955. The effects inference would still be bad, but at least the compiler would be willing to inline it for this simple case.
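A hypothetical user-level sketch of that `unsafe_load`-based approach (the name `reinterpret_load` is invented for illustration; the actual fix would live inside Base, and this sketch handles neither padding nor non-bits types):

```julia
# Hypothetical sketch of the unsafe_load-based approach: stash the input in a
# Ref, then load it back through a pointer of the target type. Assumes both
# types are bits types of equal size; no padding checks are performed here.
function reinterpret_load(::Type{Out}, x::In) where {Out,In}
    @assert sizeof(Out) == sizeof(In)
    r = Ref(x)
    GC.@preserve r begin
        p = Base.unsafe_convert(Ptr{In}, r)
        unsafe_load(convert(Ptr{Out}, p))
    end
end
```

On a little-endian host this reproduces `reinterpret`'s result while compiling down to a plain load, which the compiler has a much easier time inlining than a `memcpy` call.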
It calls into `_reinterpret`, which does `unsafe_convert`, which Julia does not like. This causes this code, which would compile down to very simple instructions, to get neither inlined nor const-propped.