Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[OpenCL] Invalid optimization of subgroup functions with non-uniform control flow #45169

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR46199
Status CONFIRMED
Importance P enhancement
Reported by Piotr Fusik (piotr.fusik@intel.com)
Reported on 2020-06-04 08:11:20 -0700
Last modified on 2021-11-05 04:17:49 -0700
Version trunk
Hardware PC Windows NT
CC anastasia.stulova@arm.com, kevin.petit@arm.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

The following OpenCL code:

kernel void test(global int *data) { uint id = (uint) get_global_id(0); if (id < 4) data[id] = sub_group_elect(); else data[id] = sub_group_elect(); }

when compiled with the current master (f2c97656644e783622a6e60fe452b41ffe0f1d18) as follows: clang -cl-std=CL2.0 -include opencl-c.h -S -emit-llvm sub_group_elect_opt.cl

has an invalid optimization of combining the branches:

; Function Attrs: convergent norecurse nounwind uwtable define dso_local spir_kernel void @test(i32 nocapture %data) local_unnamed_addr #0 !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 { entry: %call = tail call i64 @"?get_global_id@@$$J0YAKI@Z"(i32 0) #3 %call2 = tail call i32 @"?sub_group_elect@@$$J0YAHXZ"() #4 %idxprom = and i64 %call, 4294967295 %arrayidx = getelementptr inbounds i32, i32 %data, i64 %idxprom store i32 %call2, i32* %arrayidx, align 4, !tbaa !8 ret void }

sub_group_elect is one of the many functions added in https://reviews.llvm.org/D79781 that perform implicit communication within a subgroup, i.e. threads implemented as SIMD on GPU. All the added functions are affected.

https://reviews.llvm.org/D68994 is a proposal of addressing this problem with the convergent attribute.

Quuxplusone commented 4 years ago
Marking subgroups operations that are to be called in non-uniform control flow
with convergent attribute doesn't help because convergent attribute was added
for functions called within uniform CF (by all work items):
https://clang.llvm.org/docs/AttributeReference.html#convergent