JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Reinterpret with `NTuple`s and types has bad effects, maybe unnecessarily #52047

Open gbaraldi opened 11 months ago

gbaraldi commented 11 months ago

It calls into `_reinterpret`, which does an `unsafe_convert` that Julia does not handle well. As a result, code that would otherwise compile down to very simple instructions gets neither inlined nor const-propagated.

Seelengrab commented 11 months ago

Did you mean to link to some piece of code here?

nsajko commented 11 months ago

> some piece of code

julia> f(n::UInt32) = reinterpret(NTuple{4,UInt8}, n)
f (generic function with 1 method)

julia> f(0x12345678)
(0x78, 0x56, 0x34, 0x12)

julia> Base.infer_effects(f, Tuple{UInt32})
(!c,!e,!n,!t,!s,!m,!u)′

julia> versioninfo()
Julia Version 1.11.0-DEV.856
Commit e7345b89fd4 (2023-11-07 02:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × AMD Ryzen 3 5300U with Radeon Graphics
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 11 on 8 virtual cores
Environment:
  JULIA_NUM_PRECOMPILE_TASKS = 3
  JULIA_PKG_PRECOMPILE_AUTO = 0
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

This could just be implemented with a couple of cheap shifts, so I guess Julia should use that approach, too?
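For this `UInt32` → `NTuple{4,UInt8}` case, a shift-based version might look like the following sketch (the name `bytes_of` is hypothetical, not a proposed Base API; on a little-endian host its result matches `reinterpret`):

```julia
# Hypothetical shift-based alternative to reinterpret for this case:
# extract the four bytes of a UInt32, least-significant byte first,
# using only cheap shifts and a truncating conversion.
bytes_of(n::UInt32) = ntuple(i -> (n >> (8 * (i - 1))) % UInt8, Val(4))

bytes_of(0x12345678)  # (0x78, 0x56, 0x34, 0x12)
```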

nsajko commented 11 months ago

On second thought, fixing this might not be worth it, or actually it might even be preferable to leave it as is. This is because reinterpreting tuples as primitive integers (or vice versa) is subtly error-prone:

  1. In almost all cases, the result depends on the host endianness.
  2. In the case of heterogeneous tuples, the result may depend on the padding, possibly involving undefined behavior, ref #29605, #41071.
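Point 2 can be seen directly from the layout of a heterogeneous tuple:

```julia
# Tuple{UInt8,UInt16} has one padding byte after the UInt8 so that the
# UInt16 field is 2-byte aligned; its size is therefore 4 bytes, not 3,
# and the contents of the padding byte are unspecified.
sizeof(Tuple{UInt8,UInt16})  # 4
```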

So I think fixing this would actually encourage bugs.

Furthermore, IMO, using reinterpret on non-primitive types should perhaps be forbidden in Julia v2.

oscardssmith commented 11 months ago

> the result depends on the host endianness

This doesn't matter that much because at this point everything is little-endian.
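That said, if a fixed byte order is wanted regardless of host, one sketch is to normalize with `htol` before extracting bytes (the name `le_bytes` is hypothetical):

```julia
# Sketch of a byte decomposition with a fixed little-endian layout,
# independent of the host byte order: htol converts from host order to
# little-endian before the shifts, so the result is the same on any host.
function le_bytes(n::UInt32)
    m = htol(n)
    ntuple(i -> (m >> (8 * (i - 1))) % UInt8, Val(4))
end

le_bytes(0x12345678)  # (0x78, 0x56, 0x34, 0x12) on every host
```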

mikmoore commented 9 months ago

Here is a benchmark comparing reinterpret against a manual LLVM version (modeled after actual code in Random/src/XoshiroSimd.jl) in the v1.10.0 release:

julia> using BenchmarkTools

julia> @eval tupcast(::Type{NTuple{8,UInt32}}, x::NTuple{4,UInt64}) = map(z->z.value, Core.Intrinsics.llvmcall("%res = bitcast <4 x i64> %0 to <8 x i32>\nret <8 x i32> %res", NTuple{8,Core.VecElement{UInt32}}, Tuple{NTuple{4,Core.VecElement{UInt64}}}, map(Core.VecElement,x)));

julia> x = [ntuple(_->rand(UInt64), Val(4)) for _ in 1:2^10]; y = similar(x, NTuple{8,UInt32});

julia> @btime $y .= reinterpret.(NTuple{8,UInt32}, $x);
  2.400 μs (0 allocations: 0 bytes)

julia> @btime $y .= tupcast.(NTuple{8,UInt32}, $x);
  1.180 μs (0 allocations: 0 bytes)

julia> z = similar(x);

julia> @btime $z .= reinterpret.(NTuple{4,UInt64}, $x); # reinterpret is a no-op
  2.311 μs (0 allocations: 0 bytes)

julia> @btime $z .= $x; # compare to copy
  534.896 ns (0 allocations: 0 bytes)

Note that tupcast achieves 2x the throughput on my x86 machine, relative to reinterpret. Also notice that the no-op case reinterpret(::Type{T}, x::T) where T is causing a 4x slowdown relative to a direct copy. From the @code_native, it appears that reinterpret is indeed not being inlined, as suggested in the original issue.


I've been interested in this functionality for SIMD generation of random numbers requiring further computations. In that case, all the bits of the input tuple are already random so I'm happy to reinterpret them without regard to endianness etc.

N5N3 commented 9 months ago

A simple fix for this `struct_subpadding` case is replacing the memcpy with `unsafe_load`, just like #43955. The inferred effects would still be bad, but at least the compiler would be willing to inline it in this simple case.
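In user code, a pointer-based load in that spirit can be sketched as follows (`bitcast_load` is a hypothetical name, not the proposed Base implementation):

```julia
# Hypothetical sketch of the unsafe_load approach: store the value in a
# Ref, view its address as a pointer to the target type, and load from it.
# GC.@preserve keeps the Ref rooted while the raw pointer is in use.
function bitcast_load(::Type{T}, x::U) where {T,U}
    sizeof(T) == sizeof(U) || throw(ArgumentError("size mismatch"))
    r = Ref(x)
    GC.@preserve r unsafe_load(Ptr{T}(Base.unsafe_convert(Ptr{U}, r)))
end

bitcast_load(NTuple{4,UInt8}, 0x12345678)  # (0x78, 0x56, 0x34, 0x12) on little-endian
```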