JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

tuples of symbols don't get vectorized #55690

Open matthias314 opened 1 week ago

matthias314 commented 1 week ago

I have noticed that tuples of bit integers (and Char) get vectorized, but tuples with Symbol elements do not; see the example below. I'm not an LLVM expert, but I would think that vectorization should be possible in this case, too. Is this intended behavior?

julia> f(t1, t2) = reduce(&, map(==, t1, t2));

julia> t = (1, 2, 3, 4); @code_llvm debuginfo=:none f(t, t)
; Function Signature: f(NTuple{4, Int64}, NTuple{4, Int64})
define i8 @julia_f_1742(ptr nocapture noundef nonnull readonly align 8 dereferenceable(32) %"t1::Tuple", ptr nocapture noundef nonnull readonly align 8 dereferenceable(32) %"t2::Tuple") #0 {
top:
  %0 = load <4 x i64>, ptr %"t1::Tuple", align 8
  %1 = load <4 x i64>, ptr %"t2::Tuple", align 8
  %2 = icmp ne <4 x i64> %0, %1
  %3 = bitcast <4 x i1> %2 to i4
  %4 = icmp eq i4 %3, 0
  %5 = zext i1 %4 to i8
  ret i8 %5
}

julia> t = (:a, :b, :c, :d); @code_llvm debuginfo=:none f(t, t)
; Function Signature: f(NTuple{4, Symbol}, NTuple{4, Symbol})
define i8 @julia_f_1748(ptr nocapture noundef nonnull readonly align 8 dereferenceable(32) %"t1::Tuple", ptr nocapture noundef nonnull readonly align 8 dereferenceable(32) %"t2::Tuple") #0 {
top:
  %"t1::Tuple[1]" = load atomic ptr, ptr %"t1::Tuple" unordered, align 8
  %"t2::Tuple[1]" = load atomic ptr, ptr %"t2::Tuple" unordered, align 8
  %0 = icmp eq ptr %"t1::Tuple[1]", %"t2::Tuple[1]"
  %"t1::Tuple[2]_ptr" = getelementptr inbounds [4 x ptr], ptr %"t1::Tuple", i64 0, i64 1
  %"t1::Tuple[2]" = load atomic ptr, ptr %"t1::Tuple[2]_ptr" unordered, align 8
  %"t1::Tuple[3]_ptr" = getelementptr inbounds [4 x ptr], ptr %"t1::Tuple", i64 0, i64 2
  %"t1::Tuple[3]" = load atomic ptr, ptr %"t1::Tuple[3]_ptr" unordered, align 8
  %"t1::Tuple[4]_ptr" = getelementptr inbounds [4 x ptr], ptr %"t1::Tuple", i64 0, i64 3
  %"t1::Tuple[4]" = load atomic ptr, ptr %"t1::Tuple[4]_ptr" unordered, align 8
  %"t2::Tuple[2]_ptr" = getelementptr inbounds [4 x ptr], ptr %"t2::Tuple", i64 0, i64 1
  %"t2::Tuple[2]" = load atomic ptr, ptr %"t2::Tuple[2]_ptr" unordered, align 8
  %"t2::Tuple[3]_ptr" = getelementptr inbounds [4 x ptr], ptr %"t2::Tuple", i64 0, i64 2
  %"t2::Tuple[3]" = load atomic ptr, ptr %"t2::Tuple[3]_ptr" unordered, align 8
  %"t2::Tuple[4]_ptr" = getelementptr inbounds [4 x ptr], ptr %"t2::Tuple", i64 0, i64 3
  %"t2::Tuple[4]" = load atomic ptr, ptr %"t2::Tuple[4]_ptr" unordered, align 8
  %1 = icmp eq ptr %"t1::Tuple[2]", %"t2::Tuple[2]"
  %2 = icmp eq ptr %"t1::Tuple[3]", %"t2::Tuple[3]"
  %3 = icmp eq ptr %"t1::Tuple[4]", %"t2::Tuple[4]"
  %4 = and i1 %0, %1
  %5 = and i1 %4, %2
  %6 = and i1 %5, %3
  %7 = zext i1 %6 to i8
  ret i8 %7
}

I've tried both Julia 1.10.4 and master.
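
For anyone who wants to repeat the comparison outside the REPL, here is a minimal sketch. It captures the optimized IR as a string and searches it for an LLVM vector type such as `<4 x i64>`. The helper names `llvm_ir` and `has_vector_ops` are made up for illustration, and the regex check is only a heuristic. The result will depend on the Julia version and the CPU's vector width, so no expected output is shown.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

f(t1, t2) = reduce(&, map(==, t1, t2))

# Hypothetical helper: return the optimized LLVM IR of f(args...) as a String.
function llvm_ir(f, args...)
    io = IOBuffer()
    code_llvm(io, f, Tuple{map(typeof, args)...}; debuginfo=:none)
    return String(take!(io))
end

# Heuristic (an assumption, not an official check): treat any LLVM vector
# type such as `<4 x i64>` in the IR as a sign of vectorization.
has_vector_ops(ir) = occursin(r"<\d+ x \w+>", ir)

for t in ((1, 2, 3, 4), ('a', 'b', 'c', 'd'), (:a, :b, :c, :d))
    println(eltype(t), " => ", has_vector_ops(llvm_ir(f, t, t)))
end
```

The array form `[4 x ptr]` that appears in the Symbol case is deliberately not matched, since it is an array type used for address computation rather than a vector type.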

vtjnash commented 1 week ago

It is probably a missing feature in LLVM: it can memcpy them correctly with a vectorized instruction (IRBuilder::CreateElementUnorderedAtomicMemCpy), but it has no matching instruction able to load them.