Closed jgreener64 closed 3 months ago
With debug info:
Cannot create a null constant of that type!
UNREACHABLE executed at /home/wmoses/git/Enzyme.jl/julia10/deps/srccache/llvm-julia-15.0.7-10/llvm/lib/IR/Constants.cpp:374!
[3398190] signal (6.-6): Aborted
in expression starting at /home/wmoses/git/Enzyme.jl/cubc.jl:6
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
llvm_unreachable_internal at /home/wmoses/git/Enzyme.jl/julia10/deps/srccache/llvm-julia-15.0.7-10/llvm/lib/Support/ErrorHandling.cpp:212
getNullValue at /home/wmoses/git/Enzyme.jl/julia10/deps/srccache/llvm-julia-15.0.7-10/llvm/lib/IR/Constants.cpp:374
handleAdjointForIntrinsic at /home/wmoses/.julia/scratchspaces/7cc45869-7501-5eee-bdea-0790c847d4ef/src/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4023
visitIntrinsicInst at /home/wmoses/.julia/scratchspaces/7cc45869-7501-5eee-bdea-0790c847d4ef/src/Enzyme/enzyme/Enzyme/AdjointGenerator.h:3696
@jgreener64 the forward mode assertion should no longer fail that way. For some reason a size mismatch error now comes up instead, though.
If you have cycles, some minimization through the broadcast impl would definitely be helpful here.
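One possible way to start that minimization is to call the broadcast stages one at a time, so that each stage can be differentiated or inspected on its own. The sketch below uses plain CPU arrays and a hypothetical `*` broadcast followed by `sum`; the arrays, the operation and the name `staged_f` are assumptions for illustration, not the function from the report.

```julia
# Hypothetical stand-ins for the inputs (plain CPU arrays, not CuArrays).
x = collect(1.0:5.0)
y = fill(2.0, 5)

# Stage 1: build the lazy broadcast object; no shape check has run yet.
bc = Base.Broadcast.broadcasted(*, x, y)

# Stage 2: compute the common axes of the arguments.
# The DimensionMismatch in a broadcast originates below this call.
ax = Base.Broadcast.combine_axes(x, y)

# Stage 3: attach the axes to the lazy object.
bci = Base.Broadcast.instantiate(bc)

# Stage 4: evaluate the lazy object into a new array.
out = Base.Broadcast.materialize(bci)

# The same stages folded into one function that could be passed to autodiff.
staged_f(x, y) = sum(Base.Broadcast.materialize(Base.Broadcast.broadcasted(*, x, y)))
```

If a function that stops after stage 2 or 3 already fails under differentiation, the reduction and the kernel launch can be ruled out as the cause.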
The forward mode error is:
ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 5 and 5
Stacktrace:
[1] _bcs1
@ ./broadcast.jl:555 [inlined]
[2] _bcs
@ ./broadcast.jl:549 [inlined]
[3] broadcast_shape
@ ./broadcast.jl:543 [inlined]
[4] combine_axes
@ ./broadcast.jl:524 [inlined]
[5] instantiate
@ ./broadcast.jl:306 [inlined]
[6] materialize
@ ./broadcast.jl:903 [inlined]
[7] f
@ ./REPL[3]:1 [inlined]
[8] fwddiffejulia_f_4514wrap
@ ./REPL[3]:0
[9] macro expansion
@ ~/.julia/dev/Enzyme/src/compiler.jl:5916 [inlined]
[10] enzyme_call
@ ~/.julia/dev/Enzyme/src/compiler.jl:5566 [inlined]
[11] ForwardModeThunk
@ ~/.julia/dev/Enzyme/src/compiler.jl:5446 [inlined]
[12] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:399 [inlined]
[13] autodiff(::ForwardMode{FFIABI}, ::typeof(f), ::Type{Duplicated}, ::Duplicated{CuArray{…}}, ::Const{CuArray{…}})
@ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:303
[14] top-level scope
@ REPL[9]:1
[15] top-level scope
@ ~/.julia/dev/CUDA/src/initialization.jl:209
Some type information was truncated. Use `show(err)` to see complete types.
which certainly seems a strange one. Adding
println("a ", typeof(a), " ", a, " b ", typeof(b), " ", b, " ", a == b)
to the start of Base.Broadcast._bcs1(a, b) (https://github.com/JuliaLang/julia/blob/0b4590a5507d3f3046e5bafc007cacbbfc9b310b/base/broadcast.jl#L555), where a and b are the axes of the broadcasted arrays along one dimension, gives the following for the primal function:
f(x, y)
a Base.OneTo{Int64} Base.OneTo(5) b Base.OneTo{Int64} Base.OneTo(5) true
a Base.OneTo{Int64} Base.OneTo(1) b Base.OneTo{Int64} Base.OneTo(1) true
a Base.OneTo{Int64} Base.OneTo(1) b Base.OneTo{Int64} Base.OneTo(1) true
5.617565028176828
and this for the gradient:
autodiff(Forward, f, Duplicated, Duplicated(x, dx), Const(y))
a Base.OneTo{Int64} Base.OneTo(5) b Base.OneTo{Int64} Base.OneTo(5) true
[error as above]
The error is effectively thrown when a != b, but the failure case doesn't seem to print. I wonder if Enzyme does some conversion of the types of the dimension sizes such that a == b no longer holds.
I also tried Infiltrator.jl and Debugger.jl but didn't have much luck.
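For reference, the check being probed can be paraphrased as a small standalone function. This is a simplified stand-in written for illustration (the name `toy_bcs1` is hypothetical), assumed to mirror the logic of `_bcs1` for range-valued axes; it is not the Base implementation.

```julia
# Simplified imitation of the per-dimension broadcast shape check.
# Two axes are compatible if they are equal or if one of them has length 1.
function toy_bcs1(a::AbstractUnitRange, b::AbstractUnitRange)
    if a == b || length(b) == 1
        return a            # b matches a, or b is stretched to fit a
    elseif length(a) == 1
        return b            # a is stretched to fit b
    else
        throw(DimensionMismatch(
            "arrays could not be broadcast to a common size; " *
            "got a dimension with lengths $(length(a)) and $(length(b))"))
    end
end

same      = toy_bcs1(Base.OneTo(5), Base.OneTo(5))   # equal axes
stretched = toy_bcs1(Base.OneTo(5), Base.OneTo(1))   # length-1 axis is extended
```

Under this reading, a message reporting "lengths 5 and 5" can only appear if `a == b` evaluated to false for two ranges of equal length, which is consistent with the suspicion that the comparison itself is being miscompiled or fed altered values.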
@jgreener64 is that still the case? I thought that should now be fixed on main.
Still the case for me on 21b0762d with CUDA 5.3.4 and Julia 1.10.3.
Is broadcasting with CUDA+Enzyme still a general issue or not?
I've got some multidimensional code which works on the CPU but not with CUDA.
It doesn't work at the minute from what I can tell.
With Enzyme 0.12.25, CUDA 5.4.2 and Julia 1.10.3 the forward mode error above has changed to:
warning found shared memory
ERROR: a Enzyme compilation failed.
Current scope:
; Function Attrs: mustprogress willreturn
define void @preprocess_julia_partial_mapreduce_grid_48233_inner7(double "enzyme_type"="{[-1]:Float@double}" "enzymejl_parmtype"="140238631732192" "enzymejl_parmtype_ref"="0" %0, [1 x [1 x [1 x i64]]] "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="140238577761536" "enzymejl_parmtype_ref"="0" %1, [1 x [1 x [1 x i64]]] "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="140238577761536" "enzymejl_parmtype_ref"="0" %2, { i8 addrspace(1)*, i64, [2 x i64], i64 } "enzyme_type"="{[0]:Pointer, [0,-1]:Float@double, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer}" "enzymejl_parmtype"="140233465998160" "enzymejl_parmtype_ref"="0" %3, { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } "enzyme_type"="{[0]:Pointer, [0,-1]:Float@double, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer}" "enzymejl_parmtype"="140238766152592" "enzymejl_parmtype_ref"="0" %4) local_unnamed_addr #12 !dbg !484 {
entry:
%5 = alloca [2 x i64], align 8
%.fca.0.0.0.extract13 = extractvalue [1 x [1 x [1 x i64]]] %1, 0, 0, 0, !dbg !485, !enzyme_inactive !12
%.fca.0.0.0.extract9 = extractvalue [1 x [1 x [1 x i64]]] %2, 0, 0, 0, !dbg !485, !enzyme_inactive !12
%.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [2 x i64], i64 } %3, 0, !dbg !485
%.fca.2.0.extract = extractvalue { i8 addrspace(1)*, i64, [2 x i64], i64 } %3, 2, 0, !dbg !485, !enzyme_inactive !12
%.fca.0.0.0.extract = extractvalue { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4, 0, 0, 0, !dbg !485
%.fca.0.0.2.0.extract = extractvalue { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4, 0, 0, 2, 0, !dbg !485, !enzyme_inactive !12
%6 = bitcast [2 x i64]* %5 to i8*
call void @llvm.lifetime.start.p0i8(i64 noundef 16, i8* noundef nonnull align 8 dereferenceable(16) %6) #13
%7 = call {}*** @julia.get_pgcstack() #13
%8 = icmp sgt i64 %.fca.0.0.0.extract9, 0, !dbg !486
call void @llvm.assume(i1 noundef %8) #13, !dbg !490
%9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #13, !dbg !491, !range !117
%10 = add nuw nsw i32 %9, 1, !dbg !497
%11 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() #13, !dbg !498, !range !128
%12 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #13, !dbg !504, !range !137
%13 = add nuw nsw i32 %12, 1, !dbg !510
call fastcc void @julia_fldmod1_48281([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %5, i32 signext %13, i64 signext %.fca.0.0.0.extract9) #13, !dbg !511
%14 = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() #13, !dbg !512, !range !148
%15 = zext i32 %14 to i64, !dbg !518
%16 = getelementptr inbounds [2 x i64], [2 x i64]* %5, i64 0, i64 1, !dbg !525
%17 = udiv i64 %15, %.fca.0.0.0.extract9, !dbg !526
%unbox3.i = load i64, i64* %16, align 8, !dbg !527, !tbaa !170, !alias.scope !174, !noalias !177
%.not = icmp sgt i64 %unbox3.i, %.fca.0.0.0.extract9, !dbg !527
br i1 %.not, label %julia_partial_mapreduce_grid_48233_inner.exit, label %L49.i, !dbg !529
L49.i: ; preds = %entry
%18 = getelementptr inbounds [2 x i64], [2 x i64]* %5, i64 0, i64 0, !dbg !530
%19 = fadd double %0, %0, !dbg !532
%unbox11.i = load i64, i64* %18, align 8, !dbg !535, !tbaa !170, !alias.scope !174, !noalias !177
%20 = add i64 %unbox11.i, -1, !dbg !535
%21 = zext i32 %11 to i64, !dbg !537
%22 = mul i64 %20, %21, !dbg !539
%23 = zext i32 %10 to i64, !dbg !541
%24 = add i64 %22, %23, !dbg !543
%.not3647 = icmp sgt i64 %24, %.fca.0.0.0.extract13, !dbg !545
br i1 %.not3647, label %L203.i, label %L97.i.lr.ph, !dbg !547
L97.i.lr.ph: ; preds = %L49.i
%25 = call i64 @llvm.smax.i64(i64 %.fca.0.0.2.0.extract, i64 noundef 0) #13, !dbg !485
%.not38 = icmp eq i64 %25, 1
%26 = bitcast i8 addrspace(1)* %.fca.0.0.0.extract to double addrspace(1)*
%27 = mul nuw nsw i64 %17, %21
%28 = mul i64 %17, %21, !dbg !548
br label %L97.i, !dbg !547
L97.i: ; preds = %L97.i, %L97.i.lr.ph
%iv = phi i64 [ %iv.next, %L97.i ], [ 0, %L97.i.lr.ph ]
%value_phi12.i49 = phi double [ %19, %L97.i.lr.ph ], [ %35, %L97.i ]
%29 = mul i64 %28, %iv, !dbg !548
%iv.next = add nuw nsw i64 %iv, 1, !dbg !548
%30 = add i64 %24, %29, !dbg !548
%31 = call i64 @llvm.smax.i64(i64 %30, i64 %unbox3.i) #13, !dbg !548
%.op = add i64 %31, -1, !dbg !553
%32 = select i1 %.not38, i64 0, i64 %.op, !dbg !553
%33 = getelementptr inbounds double, double addrspace(1)* %26, i64 %32, !dbg !568
%34 = load double, double addrspace(1)* %33, align 8, !dbg !568, !tbaa !248
%35 = fadd double %value_phi12.i49, %34, !dbg !569
%36 = add i64 %30, %27, !dbg !571
%.not36 = icmp sgt i64 %36, %.fca.0.0.0.extract13, !dbg !545
br i1 %.not36, label %L203.i.loopexit, label %L97.i, !dbg !547
L203.i.loopexit: ; preds = %L97.i
br label %L203.i, !dbg !573
L203.i: ; preds = %L203.i.loopexit, %L49.i
%value_phi12.i.lcssa = phi double [ %19, %L49.i ], [ %35, %L203.i.loopexit ]
%37 = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #13, !dbg !573
%38 = icmp eq i32 %37, 32, !dbg !576
call void @llvm.assume(i1 noundef %38) #13, !dbg !578
%39 = call fastcc [2 x i32] @julia_fldmod1_48268(i32 signext %10) #14, !dbg !579
%.fca.0.extract14 = extractvalue [2 x i32] %39, 0, !dbg !579
%bitcast_coercion.i = bitcast double %value_phi12.i.lcssa to i64, !dbg !580
%40 = lshr i64 %bitcast_coercion.i, 32, !dbg !586
%41 = trunc i64 %40 to i32, !dbg !589
%42 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %41, i32 noundef 1, i32 noundef 31) #13, !dbg !590
%43 = zext i32 %42 to i64, !dbg !592
%44 = shl nuw i64 %43, 32, !dbg !594
%45 = trunc i64 %bitcast_coercion.i to i32, !dbg !589
%46 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %45, i32 noundef 1, i32 noundef 31) #13, !dbg !590
%47 = zext i32 %46 to i64, !dbg !596
%48 = or i64 %44, %47, !dbg !601
%bitcast_coercion35.i = bitcast i64 %48 to double, !dbg !580
%49 = fadd double %value_phi12.i.lcssa, %bitcast_coercion35.i, !dbg !603
%bitcast_coercion.i.1 = bitcast double %49 to i64, !dbg !580
%50 = lshr i64 %bitcast_coercion.i.1, 32, !dbg !586
%51 = trunc i64 %50 to i32, !dbg !589
%52 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %51, i32 noundef 2, i32 noundef 31) #13, !dbg !590
%53 = zext i32 %52 to i64, !dbg !592
%54 = shl nuw i64 %53, 32, !dbg !594
%55 = trunc i64 %bitcast_coercion.i.1 to i32, !dbg !589
%56 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %55, i32 noundef 2, i32 noundef 31) #13, !dbg !590
%57 = zext i32 %56 to i64, !dbg !596
%58 = or i64 %54, %57, !dbg !601
%bitcast_coercion35.i.1 = bitcast i64 %58 to double, !dbg !580
%59 = fadd double %49, %bitcast_coercion35.i.1, !dbg !603
%bitcast_coercion.i.2 = bitcast double %59 to i64, !dbg !580
%60 = lshr i64 %bitcast_coercion.i.2, 32, !dbg !586
%61 = trunc i64 %60 to i32, !dbg !589
%62 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %61, i32 noundef 4, i32 noundef 31) #13, !dbg !590
%63 = zext i32 %62 to i64, !dbg !592
%64 = shl nuw i64 %63, 32, !dbg !594
%65 = trunc i64 %bitcast_coercion.i.2 to i32, !dbg !589
%66 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %65, i32 noundef 4, i32 noundef 31) #13, !dbg !590
%67 = zext i32 %66 to i64, !dbg !596
%68 = or i64 %64, %67, !dbg !601
%bitcast_coercion35.i.2 = bitcast i64 %68 to double, !dbg !580
%69 = fadd double %59, %bitcast_coercion35.i.2, !dbg !603
%bitcast_coercion.i.3 = bitcast double %69 to i64, !dbg !580
%70 = lshr i64 %bitcast_coercion.i.3, 32, !dbg !586
%71 = trunc i64 %70 to i32, !dbg !589
%72 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %71, i32 noundef 8, i32 noundef 31) #13, !dbg !590
%73 = zext i32 %72 to i64, !dbg !592
%74 = shl nuw i64 %73, 32, !dbg !594
%75 = trunc i64 %bitcast_coercion.i.3 to i32, !dbg !589
%76 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %75, i32 noundef 8, i32 noundef 31) #13, !dbg !590
%77 = zext i32 %76 to i64, !dbg !596
%78 = or i64 %74, %77, !dbg !601
%bitcast_coercion35.i.3 = bitcast i64 %78 to double, !dbg !580
%79 = fadd double %69, %bitcast_coercion35.i.3, !dbg !603
%bitcast_coercion.i.4 = bitcast double %79 to i64, !dbg !580
%80 = lshr i64 %bitcast_coercion.i.4, 32, !dbg !586
%81 = trunc i64 %80 to i32, !dbg !589
%82 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %81, i32 noundef 16, i32 noundef 31) #13, !dbg !590
%83 = trunc i64 %bitcast_coercion.i.4 to i32, !dbg !589
%84 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %83, i32 noundef 16, i32 noundef 31) #13, !dbg !590
%.fca.1.extract15 = extractvalue [2 x i32] %39, 1, !dbg !579
%.not39 = icmp eq i32 %.fca.1.extract15, 1, !dbg !605
br i1 %.not39, label %L254.i, label %L273.i, !dbg !608
L254.i: ; preds = %L203.i
%85 = zext i32 %82 to i64, !dbg !592
%86 = shl nuw i64 %85, 32, !dbg !594
%87 = zext i32 %84 to i64, !dbg !596
%88 = or i64 %86, %87, !dbg !601
%bitcast_coercion35.i.4 = bitcast i64 %88 to double, !dbg !580
%89 = fadd double %79, %bitcast_coercion35.i.4, !dbg !603
%90 = add i32 %.fca.0.extract14, -1, !dbg !609
%91 = sext i32 %90 to i64, !dbg !618
%92 = getelementptr inbounds double, double addrspace(3)* bitcast ([256 x i8] addrspace(3)* @shmem to double addrspace(3)*), i64 %91, !dbg !618
store double %89, double addrspace(3)* %92, align 8, !dbg !618, !tbaa !320, !noalias !619
br label %L273.i, !dbg !622
L273.i: ; preds = %L254.i, %L203.i
call void @llvm.nvvm.barrier0() #13, !dbg !623
%93 = lshr i32 %11, 5, !dbg !625
%94 = and i32 %11, 992, !dbg !628
%95 = icmp ne i32 %94, %11, !dbg !630
%96 = zext i1 %95 to i32, !dbg !632
%97 = add nuw nsw i32 %93, %96, !dbg !637
%.not40.not = icmp ult i32 %9, %97, !dbg !639
br i1 %.not40.not, label %L302.i, label %L322.i, !dbg !640
L302.i: ; preds = %L273.i
%98 = add i32 %.fca.1.extract15, -1, !dbg !641
%99 = sext i32 %98 to i64, !dbg !650
%100 = getelementptr inbounds double, double addrspace(3)* bitcast ([256 x i8] addrspace(3)* @shmem to double addrspace(3)*), i64 %99, !dbg !650
%101 = load double, double addrspace(3)* %100, align 8, !dbg !650, !tbaa !320
br label %L322.i, !dbg !650
L322.i: ; preds = %L302.i, %L273.i
%value_phi46.i = phi double [ %101, %L302.i ], [ %0, %L273.i ]
%.not41 = icmp eq i32 %.fca.0.extract14, 1, !dbg !651
br i1 %.not41, label %L331.i.preheader, label %L357.i, !dbg !654
L331.i.preheader: ; preds = %L322.i
%bitcast_coercion50.i = bitcast double %value_phi46.i to i64, !dbg !655
%102 = lshr i64 %bitcast_coercion50.i, 32, !dbg !661
%103 = trunc i64 %102 to i32, !dbg !664
%104 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %103, i32 noundef 1, i32 noundef 31) #13, !dbg !665
%105 = zext i32 %104 to i64, !dbg !667
%106 = shl nuw i64 %105, 32, !dbg !669
%107 = trunc i64 %bitcast_coercion50.i to i32, !dbg !664
%108 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %107, i32 noundef 1, i32 noundef 31) #13, !dbg !665
%109 = zext i32 %108 to i64, !dbg !671
%110 = or i64 %106, %109, !dbg !676
%bitcast_coercion51.i = bitcast i64 %110 to double, !dbg !655
%111 = fadd double %value_phi46.i, %bitcast_coercion51.i, !dbg !678
%bitcast_coercion50.i.1 = bitcast double %111 to i64, !dbg !655
%112 = lshr i64 %bitcast_coercion50.i.1, 32, !dbg !661
%113 = trunc i64 %112 to i32, !dbg !664
%114 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %113, i32 noundef 2, i32 noundef 31) #13, !dbg !665
%115 = zext i32 %114 to i64, !dbg !667
%116 = shl nuw i64 %115, 32, !dbg !669
%117 = trunc i64 %bitcast_coercion50.i.1 to i32, !dbg !664
%118 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %117, i32 noundef 2, i32 noundef 31) #13, !dbg !665
%119 = zext i32 %118 to i64, !dbg !671
%120 = or i64 %116, %119, !dbg !676
%bitcast_coercion51.i.1 = bitcast i64 %120 to double, !dbg !655
%121 = fadd double %111, %bitcast_coercion51.i.1, !dbg !678
%bitcast_coercion50.i.2 = bitcast double %121 to i64, !dbg !655
%122 = lshr i64 %bitcast_coercion50.i.2, 32, !dbg !661
%123 = trunc i64 %122 to i32, !dbg !664
%124 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %123, i32 noundef 4, i32 noundef 31) #13, !dbg !665
%125 = zext i32 %124 to i64, !dbg !667
%126 = shl nuw i64 %125, 32, !dbg !669
%127 = trunc i64 %bitcast_coercion50.i.2 to i32, !dbg !664
%128 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %127, i32 noundef 4, i32 noundef 31) #13, !dbg !665
%129 = zext i32 %128 to i64, !dbg !671
%130 = or i64 %126, %129, !dbg !676
%bitcast_coercion51.i.2 = bitcast i64 %130 to double, !dbg !655
%131 = fadd double %121, %bitcast_coercion51.i.2, !dbg !678
%bitcast_coercion50.i.3 = bitcast double %131 to i64, !dbg !655
%132 = lshr i64 %bitcast_coercion50.i.3, 32, !dbg !661
%133 = trunc i64 %132 to i32, !dbg !664
%134 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %133, i32 noundef 8, i32 noundef 31) #13, !dbg !665
%135 = zext i32 %134 to i64, !dbg !667
%136 = shl nuw i64 %135, 32, !dbg !669
%137 = trunc i64 %bitcast_coercion50.i.3 to i32, !dbg !664
%138 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %137, i32 noundef 8, i32 noundef 31) #13, !dbg !665
%139 = zext i32 %138 to i64, !dbg !671
%140 = or i64 %136, %139, !dbg !676
%bitcast_coercion51.i.3 = bitcast i64 %140 to double, !dbg !655
%141 = fadd double %131, %bitcast_coercion51.i.3, !dbg !678
%bitcast_coercion50.i.4 = bitcast double %141 to i64, !dbg !655
%142 = lshr i64 %bitcast_coercion50.i.4, 32, !dbg !661
%143 = trunc i64 %142 to i32, !dbg !664
%144 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %143, i32 noundef 16, i32 noundef 31) #13, !dbg !665
%145 = zext i32 %144 to i64, !dbg !667
%146 = shl nuw i64 %145, 32, !dbg !669
%147 = trunc i64 %bitcast_coercion50.i.4 to i32, !dbg !664
%148 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %147, i32 noundef 16, i32 noundef 31) #13, !dbg !665
%149 = zext i32 %148 to i64, !dbg !671
%150 = or i64 %146, %149, !dbg !676
%bitcast_coercion51.i.4 = bitcast i64 %150 to double, !dbg !655
%151 = fadd double %141, %bitcast_coercion51.i.4, !dbg !678
br label %L357.i, !dbg !680
L357.i: ; preds = %L331.i.preheader, %L322.i
%value_phi52.i = phi double [ %value_phi46.i, %L322.i ], [ %151, %L331.i.preheader ]
%.not42 = icmp eq i32 %9, 0, !dbg !680
br i1 %.not42, label %L362.i, label %julia_partial_mapreduce_grid_48233_inner.exit, !dbg !683
L362.i: ; preds = %L357.i
%152 = call i64 @llvm.smax.i64(i64 %.fca.2.0.extract, i64 noundef 0) #13, !dbg !684
%153 = mul i64 %20, %152, !dbg !695
%154 = add i64 %unbox3.i, -1, !dbg !699
%155 = add i64 %154, %153, !dbg !700
%156 = bitcast i8 addrspace(1)* %.fca.0.extract to double addrspace(1)*, !dbg !708
%157 = getelementptr inbounds double, double addrspace(1)* %156, i64 %155, !dbg !708
store double %value_phi52.i, double addrspace(1)* %157, align 8, !dbg !708, !tbaa !248, !noalias !619
br label %julia_partial_mapreduce_grid_48233_inner.exit, !dbg !709
julia_partial_mapreduce_grid_48233_inner.exit: ; preds = %L362.i, %L357.i, %entry
call void @llvm.lifetime.end.p0i8(i64 noundef 16, i8* noundef nonnull %6) #13, !dbg !710
ret void, !dbg !485
}
; Function Attrs: mustprogress willreturn
define void @preprocess_julia_partial_mapreduce_grid_48233_inner7(double "enzyme_type"="{[-1]:Float@double}" "enzymejl_parmtype"="140238631732192" "enzymejl_parmtype_ref"="0" %0, [1 x [1 x [1 x i64]]] "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="140238577761536" "enzymejl_parmtype_ref"="0" %1, [1 x [1 x [1 x i64]]] "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="140238577761536" "enzymejl_parmtype_ref"="0" %2, { i8 addrspace(1)*, i64, [2 x i64], i64 } "enzyme_type"="{[0]:Pointer, [0,-1]:Float@double, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer}" "enzymejl_parmtype"="140233465998160" "enzymejl_parmtype_ref"="0" %3, { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } "enzyme_type"="{[0]:Pointer, [0,-1]:Float@double, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer}" "enzymejl_parmtype"="140238766152592" "enzymejl_parmtype_ref"="0" %4) local_unnamed_addr #12 !dbg !484 {
entry:
%5 = alloca [2 x i64], align 8
%.fca.0.0.0.extract13 = extractvalue [1 x [1 x [1 x i64]]] %1, 0, 0, 0, !dbg !485, !enzyme_inactive !12
%.fca.0.0.0.extract9 = extractvalue [1 x [1 x [1 x i64]]] %2, 0, 0, 0, !dbg !485, !enzyme_inactive !12
%.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [2 x i64], i64 } %3, 0, !dbg !485
%.fca.2.0.extract = extractvalue { i8 addrspace(1)*, i64, [2 x i64], i64 } %3, 2, 0, !dbg !485, !enzyme_inactive !12
%.fca.0.0.0.extract = extractvalue { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4, 0, 0, 0, !dbg !485
%.fca.0.0.2.0.extract = extractvalue { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4, 0, 0, 2, 0, !dbg !485, !enzyme_inactive !12
%6 = bitcast [2 x i64]* %5 to i8*
call void @llvm.lifetime.start.p0i8(i64 noundef 16, i8* noundef nonnull align 8 dereferenceable(16) %6) #13
%7 = call {}*** @julia.get_pgcstack() #13
%8 = icmp sgt i64 %.fca.0.0.0.extract9, 0, !dbg !486
call void @llvm.assume(i1 noundef %8) #13, !dbg !490
%9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #13, !dbg !491, !range !117
%10 = add nuw nsw i32 %9, 1, !dbg !497
%11 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() #13, !dbg !498, !range !128
%12 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #13, !dbg !504, !range !137
%13 = add nuw nsw i32 %12, 1, !dbg !510
call fastcc void @julia_fldmod1_48281([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %5, i32 signext %13, i64 signext %.fca.0.0.0.extract9) #13, !dbg !511
%14 = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() #13, !dbg !512, !range !148
%15 = zext i32 %14 to i64, !dbg !518
%16 = getelementptr inbounds [2 x i64], [2 x i64]* %5, i64 0, i64 1, !dbg !525
%17 = udiv i64 %15, %.fca.0.0.0.extract9, !dbg !526
%unbox3.i = load i64, i64* %16, align 8, !dbg !527, !tbaa !170, !alias.scope !174, !noalias !177
%.not = icmp sgt i64 %unbox3.i, %.fca.0.0.0.extract9, !dbg !527
br i1 %.not, label %julia_partial_mapreduce_grid_48233_inner.exit, label %L49.i, !dbg !529
L49.i: ; preds = %entry
%18 = getelementptr inbounds [2 x i64], [2 x i64]* %5, i64 0, i64 0, !dbg !530
%19 = fadd double %0, %0, !dbg !532
%unbox11.i = load i64, i64* %18, align 8, !dbg !535, !tbaa !170, !alias.scope !174, !noalias !177
%20 = add i64 %unbox11.i, -1, !dbg !535
%21 = zext i32 %11 to i64, !dbg !537
%22 = mul i64 %20, %21, !dbg !539
%23 = zext i32 %10 to i64, !dbg !541
%24 = add i64 %22, %23, !dbg !543
%.not3647 = icmp sgt i64 %24, %.fca.0.0.0.extract13, !dbg !545
br i1 %.not3647, label %L203.i, label %L97.i.lr.ph, !dbg !547
L97.i.lr.ph: ; preds = %L49.i
%25 = call i64 @llvm.smax.i64(i64 %.fca.0.0.2.0.extract, i64 noundef 0) #13, !dbg !485
%.not38 = icmp eq i64 %25, 1
%26 = bitcast i8 addrspace(1)* %.fca.0.0.0.extract to double addrspace(1)*
%27 = mul nuw nsw i64 %17, %21
%28 = mul i64 %17, %21, !dbg !548
br label %L97.i, !dbg !547
L97.i: ; preds = %L97.i, %L97.i.lr.ph
%iv = phi i64 [ %iv.next, %L97.i ], [ 0, %L97.i.lr.ph ]
%value_phi12.i49 = phi double [ %19, %L97.i.lr.ph ], [ %35, %L97.i ]
%29 = mul i64 %28, %iv, !dbg !548
%iv.next = add nuw nsw i64 %iv, 1, !dbg !548
%30 = add i64 %24, %29, !dbg !548
%31 = call i64 @llvm.smax.i64(i64 %30, i64 %unbox3.i) #13, !dbg !548
%.op = add i64 %31, -1, !dbg !553
%32 = select i1 %.not38, i64 0, i64 %.op, !dbg !553
%33 = getelementptr inbounds double, double addrspace(1)* %26, i64 %32, !dbg !568
%34 = load double, double addrspace(1)* %33, align 8, !dbg !568, !tbaa !248
%35 = fadd double %value_phi12.i49, %34, !dbg !569
%36 = add i64 %30, %27, !dbg !571
%.not36 = icmp sgt i64 %36, %.fca.0.0.0.extract13, !dbg !545
br i1 %.not36, label %L203.i.loopexit, label %L97.i, !dbg !547
L203.i.loopexit: ; preds = %L97.i
br label %L203.i, !dbg !573
L203.i: ; preds = %L203.i.loopexit, %L49.i
%value_phi12.i.lcssa = phi double [ %19, %L49.i ], [ %35, %L203.i.loopexit ]
%37 = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #13, !dbg !573
%38 = icmp eq i32 %37, 32, !dbg !576
call void @llvm.assume(i1 noundef %38) #13, !dbg !578
%39 = call fastcc [2 x i32] @julia_fldmod1_48268(i32 signext %10) #14, !dbg !579
%.fca.0.extract14 = extractvalue [2 x i32] %39, 0, !dbg !579
%bitcast_coercion.i = bitcast double %value_phi12.i.lcssa to i64, !dbg !580
%40 = lshr i64 %bitcast_coercion.i, 32, !dbg !586
%41 = trunc i64 %40 to i32, !dbg !589
%42 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %41, i32 noundef 1, i32 noundef 31) #13, !dbg !590
%43 = zext i32 %42 to i64, !dbg !592
%44 = shl nuw i64 %43, 32, !dbg !594
%45 = trunc i64 %bitcast_coercion.i to i32, !dbg !589
%46 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %45, i32 noundef 1, i32 noundef 31) #13, !dbg !590
%47 = zext i32 %46 to i64, !dbg !596
%48 = or i64 %44, %47, !dbg !601
%bitcast_coercion35.i = bitcast i64 %48 to double, !dbg !580
%49 = fadd double %value_phi12.i.lcssa, %bitcast_coercion35.i, !dbg !603
%bitcast_coercion.i.1 = bitcast double %49 to i64, !dbg !580
%50 = lshr i64 %bitcast_coercion.i.1, 32, !dbg !586
%51 = trunc i64 %50 to i32, !dbg !589
%52 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %51, i32 noundef 2, i32 noundef 31) #13, !dbg !590
%53 = zext i32 %52 to i64, !dbg !592
%54 = shl nuw i64 %53, 32, !dbg !594
%55 = trunc i64 %bitcast_coercion.i.1 to i32, !dbg !589
%56 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %55, i32 noundef 2, i32 noundef 31) #13, !dbg !590
%57 = zext i32 %56 to i64, !dbg !596
%58 = or i64 %54, %57, !dbg !601
%bitcast_coercion35.i.1 = bitcast i64 %58 to double, !dbg !580
%59 = fadd double %49, %bitcast_coercion35.i.1, !dbg !603
%bitcast_coercion.i.2 = bitcast double %59 to i64, !dbg !580
%60 = lshr i64 %bitcast_coercion.i.2, 32, !dbg !586
%61 = trunc i64 %60 to i32, !dbg !589
%62 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %61, i32 noundef 4, i32 noundef 31) #13, !dbg !590
%63 = zext i32 %62 to i64, !dbg !592
%64 = shl nuw i64 %63, 32, !dbg !594
%65 = trunc i64 %bitcast_coercion.i.2 to i32, !dbg !589
%66 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %65, i32 noundef 4, i32 noundef 31) #13, !dbg !590
%67 = zext i32 %66 to i64, !dbg !596
%68 = or i64 %64, %67, !dbg !601
%bitcast_coercion35.i.2 = bitcast i64 %68 to double, !dbg !580
%69 = fadd double %59, %bitcast_coercion35.i.2, !dbg !603
%bitcast_coercion.i.3 = bitcast double %69 to i64, !dbg !580
%70 = lshr i64 %bitcast_coercion.i.3, 32, !dbg !586
%71 = trunc i64 %70 to i32, !dbg !589
%72 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %71, i32 noundef 8, i32 noundef 31) #13, !dbg !590
%73 = zext i32 %72 to i64, !dbg !592
%74 = shl nuw i64 %73, 32, !dbg !594
%75 = trunc i64 %bitcast_coercion.i.3 to i32, !dbg !589
%76 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %75, i32 noundef 8, i32 noundef 31) #13, !dbg !590
%77 = zext i32 %76 to i64, !dbg !596
%78 = or i64 %74, %77, !dbg !601
%bitcast_coercion35.i.3 = bitcast i64 %78 to double, !dbg !580
%79 = fadd double %69, %bitcast_coercion35.i.3, !dbg !603
%bitcast_coercion.i.4 = bitcast double %79 to i64, !dbg !580
%80 = lshr i64 %bitcast_coercion.i.4, 32, !dbg !586
%81 = trunc i64 %80 to i32, !dbg !589
%82 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %81, i32 noundef 16, i32 noundef 31) #13, !dbg !590
%83 = trunc i64 %bitcast_coercion.i.4 to i32, !dbg !589
%84 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %83, i32 noundef 16, i32 noundef 31) #13, !dbg !590
%.fca.1.extract15 = extractvalue [2 x i32] %39, 1, !dbg !579
%.not39 = icmp eq i32 %.fca.1.extract15, 1, !dbg !605
br i1 %.not39, label %L254.i, label %L273.i, !dbg !608
L254.i: ; preds = %L203.i
%85 = zext i32 %82 to i64, !dbg !592
%86 = shl nuw i64 %85, 32, !dbg !594
%87 = zext i32 %84 to i64, !dbg !596
%88 = or i64 %86, %87, !dbg !601
%bitcast_coercion35.i.4 = bitcast i64 %88 to double, !dbg !580
%89 = fadd double %79, %bitcast_coercion35.i.4, !dbg !603
%90 = add i32 %.fca.0.extract14, -1, !dbg !609
%91 = sext i32 %90 to i64, !dbg !618
%92 = getelementptr inbounds double, double addrspace(3)* bitcast ([256 x i8] addrspace(3)* @shmem to double addrspace(3)*), i64 %91, !dbg !618
store double %89, double addrspace(3)* %92, align 8, !dbg !618, !tbaa !320, !noalias !619
br label %L273.i, !dbg !622
L273.i: ; preds = %L254.i, %L203.i
call void @llvm.nvvm.barrier0() #13, !dbg !623
%93 = lshr i32 %11, 5, !dbg !625
%94 = and i32 %11, 992, !dbg !628
%95 = icmp ne i32 %94, %11, !dbg !630
%96 = zext i1 %95 to i32, !dbg !632
%97 = add nuw nsw i32 %93, %96, !dbg !637
%.not40.not = icmp ult i32 %9, %97, !dbg !639
br i1 %.not40.not, label %L302.i, label %L322.i, !dbg !640
L302.i: ; preds = %L273.i
%98 = add i32 %.fca.1.extract15, -1, !dbg !641
%99 = sext i32 %98 to i64, !dbg !650
%100 = getelementptr inbounds double, double addrspace(3)* bitcast ([256 x i8] addrspace(3)* @shmem to double addrspace(3)*), i64 %99, !dbg !650
%101 = load double, double addrspace(3)* %100, align 8, !dbg !650, !tbaa !320
br label %L322.i, !dbg !650
L322.i: ; preds = %L302.i, %L273.i
%value_phi46.i = phi double [ %101, %L302.i ], [ %0, %L273.i ]
%.not41 = icmp eq i32 %.fca.0.extract14, 1, !dbg !651
br i1 %.not41, label %L331.i.preheader, label %L357.i, !dbg !654
L331.i.preheader: ; preds = %L322.i
%bitcast_coercion50.i = bitcast double %value_phi46.i to i64, !dbg !655
%102 = lshr i64 %bitcast_coercion50.i, 32, !dbg !661
%103 = trunc i64 %102 to i32, !dbg !664
%104 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %103, i32 noundef 1, i32 noundef 31) #13, !dbg !665
%105 = zext i32 %104 to i64, !dbg !667
%106 = shl nuw i64 %105, 32, !dbg !669
%107 = trunc i64 %bitcast_coercion50.i to i32, !dbg !664
%108 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %107, i32 noundef 1, i32 noundef 31) #13, !dbg !665
%109 = zext i32 %108 to i64, !dbg !671
%110 = or i64 %106, %109, !dbg !676
%bitcast_coercion51.i = bitcast i64 %110 to double, !dbg !655
%111 = fadd double %value_phi46.i, %bitcast_coercion51.i, !dbg !678
%bitcast_coercion50.i.1 = bitcast double %111 to i64, !dbg !655
%112 = lshr i64 %bitcast_coercion50.i.1, 32, !dbg !661
%113 = trunc i64 %112 to i32, !dbg !664
%114 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %113, i32 noundef 2, i32 noundef 31) #13, !dbg !665
%115 = zext i32 %114 to i64, !dbg !667
%116 = shl nuw i64 %115, 32, !dbg !669
%117 = trunc i64 %bitcast_coercion50.i.1 to i32, !dbg !664
%118 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %117, i32 noundef 2, i32 noundef 31) #13, !dbg !665
%119 = zext i32 %118 to i64, !dbg !671
%120 = or i64 %116, %119, !dbg !676
%bitcast_coercion51.i.1 = bitcast i64 %120 to double, !dbg !655
%121 = fadd double %111, %bitcast_coercion51.i.1, !dbg !678
%bitcast_coercion50.i.2 = bitcast double %121 to i64, !dbg !655
%122 = lshr i64 %bitcast_coercion50.i.2, 32, !dbg !661
%123 = trunc i64 %122 to i32, !dbg !664
%124 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %123, i32 noundef 4, i32 noundef 31) #13, !dbg !665
%125 = zext i32 %124 to i64, !dbg !667
%126 = shl nuw i64 %125, 32, !dbg !669
%127 = trunc i64 %bitcast_coercion50.i.2 to i32, !dbg !664
%128 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %127, i32 noundef 4, i32 noundef 31) #13, !dbg !665
%129 = zext i32 %128 to i64, !dbg !671
%130 = or i64 %126, %129, !dbg !676
%bitcast_coercion51.i.2 = bitcast i64 %130 to double, !dbg !655
%131 = fadd double %121, %bitcast_coercion51.i.2, !dbg !678
%bitcast_coercion50.i.3 = bitcast double %131 to i64, !dbg !655
%132 = lshr i64 %bitcast_coercion50.i.3, 32, !dbg !661
%133 = trunc i64 %132 to i32, !dbg !664
%134 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %133, i32 noundef 8, i32 noundef 31) #13, !dbg !665
%135 = zext i32 %134 to i64, !dbg !667
%136 = shl nuw i64 %135, 32, !dbg !669
%137 = trunc i64 %bitcast_coercion50.i.3 to i32, !dbg !664
%138 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %137, i32 noundef 8, i32 noundef 31) #13, !dbg !665
%139 = zext i32 %138 to i64, !dbg !671
%140 = or i64 %136, %139, !dbg !676
%bitcast_coercion51.i.3 = bitcast i64 %140 to double, !dbg !655
%141 = fadd double %131, %bitcast_coercion51.i.3, !dbg !678
%bitcast_coercion50.i.4 = bitcast double %141 to i64, !dbg !655
%142 = lshr i64 %bitcast_coercion50.i.4, 32, !dbg !661
%143 = trunc i64 %142 to i32, !dbg !664
%144 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %143, i32 noundef 16, i32 noundef 31) #13, !dbg !665
%145 = zext i32 %144 to i64, !dbg !667
%146 = shl nuw i64 %145, 32, !dbg !669
%147 = trunc i64 %bitcast_coercion50.i.4 to i32, !dbg !664
%148 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %147, i32 noundef 16, i32 noundef 31) #13, !dbg !665
%149 = zext i32 %148 to i64, !dbg !671
%150 = or i64 %146, %149, !dbg !676
%bitcast_coercion51.i.4 = bitcast i64 %150 to double, !dbg !655
%151 = fadd double %141, %bitcast_coercion51.i.4, !dbg !678
br label %L357.i, !dbg !680
L357.i: ; preds = %L331.i.preheader, %L322.i
%value_phi52.i = phi double [ %value_phi46.i, %L322.i ], [ %151, %L331.i.preheader ]
%.not42 = icmp eq i32 %9, 0, !dbg !680
br i1 %.not42, label %L362.i, label %julia_partial_mapreduce_grid_48233_inner.exit, !dbg !683
L362.i: ; preds = %L357.i
%152 = call i64 @llvm.smax.i64(i64 %.fca.2.0.extract, i64 noundef 0) #13, !dbg !684
%153 = mul i64 %20, %152, !dbg !695
%154 = add i64 %unbox3.i, -1, !dbg !699
%155 = add i64 %154, %153, !dbg !700
%156 = bitcast i8 addrspace(1)* %.fca.0.extract to double addrspace(1)*, !dbg !708
%157 = getelementptr inbounds double, double addrspace(1)* %156, i64 %155, !dbg !708
store double %value_phi52.i, double addrspace(1)* %157, align 8, !dbg !708, !tbaa !248, !noalias !619
br label %julia_partial_mapreduce_grid_48233_inner.exit, !dbg !709
julia_partial_mapreduce_grid_48233_inner.exit: ; preds = %L362.i, %L357.i, %entry
call void @llvm.lifetime.end.p0i8(i64 noundef 16, i8* noundef nonnull %6) #13, !dbg !710
ret void, !dbg !485
}
constantarg[double %0] = 0 type: {[-1]:Float@double} - vals: {}
constantarg[[1 x [1 x [1 x i64]]] %1] = 1 type: {[-1]:Integer} - vals: {}
constantarg[[1 x [1 x [1 x i64]]] %2] = 1 type: {[-1]:Integer} - vals: {}
constantarg[{ i8 addrspace(1)*, i64, [2 x i64], i64 } %3] = 0 type: {[0]:Pointer, [0,-1]:Float@double, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer} - vals: {}
constantarg[{ [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4] = 0 type: {[0]:Pointer, [0,-1]:Float@double, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Integer, [17]:Integer, [18]:Integer, [19]:Integer, [20]:Integer, [21]:Integer, [22]:Integer, [23]:Integer, [24]:Integer, [25]:Integer, [26]:Integer, [27]:Integer, [28]:Integer, [29]:Integer, [30]:Integer, [31]:Integer, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer} - vals: {}
constantinst[ %5 = alloca [2 x i64], align 8] = 1 val:1 type: {[-1]:Pointer, [-1,-1]:Integer}
constantinst[ %.fca.0.0.0.extract13 = extractvalue [1 x [1 x [1 x i64]]] %1, 0, 0, 0, !dbg !13, !enzyme_inactive !12] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.fca.0.0.0.extract9 = extractvalue [1 x [1 x [1 x i64]]] %2, 0, 0, 0, !dbg !13, !enzyme_inactive !12] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.fca.0.extract = extractvalue { i8 addrspace(1)*, i64, [2 x i64], i64 } %3, 0, !dbg !13] = 1 val:0 type: {[-1]:Pointer, [-1,-1]:Float@double}
constantinst[ %.fca.2.0.extract = extractvalue { i8 addrspace(1)*, i64, [2 x i64], i64 } %3, 2, 0, !dbg !13, !enzyme_inactive !12] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.fca.0.0.0.extract = extractvalue { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4, 0, 0, 0, !dbg !13] = 1 val:0 type: {[-1]:Pointer, [-1,-1]:Float@double}
constantinst[ %.fca.0.0.2.0.extract = extractvalue { [1 x { i8 addrspace(1)*, i64, [1 x i64], i64 }], [1 x [1 x i64]] } %4, 0, 0, 2, 0, !dbg !13, !enzyme_inactive !12] = 1 val:1 type: {[-1]:Integer}
constantinst[ %6 = bitcast [2 x i64]* %5 to i8*] = 1 val:1 type: {[-1]:Pointer, [-1,-1]:Integer}
constantinst[ call void @llvm.lifetime.start.p0i8(i64 noundef 16, i8* noundef nonnull align 8 dereferenceable(16) %6) #13] = 1 val:1 type: {}
constantinst[ %7 = call {}*** @julia.get_pgcstack() #13] = 1 val:1 type: {}
constantinst[ %8 = icmp sgt i64 %.fca.0.0.0.extract9, 0, !dbg !14] = 1 val:1 type: {[-1]:Integer}
constantinst[ call void @llvm.assume(i1 noundef %8) #13, !dbg !22] = 1 val:1 type: {}
constantinst[ %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #13, !dbg !25, !range !38] = 1 val:1 type: {[-1]:Integer}
constantinst[ %10 = add nuw nsw i32 %9, 1, !dbg !39] = 1 val:1 type: {[-1]:Integer}
constantinst[ %11 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x() #13, !dbg !41, !range !49] = 1 val:1 type: {[-1]:Integer}
constantinst[ %12 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #13, !dbg !50, !range !58] = 1 val:1 type: {[-1]:Integer}
constantinst[ %13 = add nuw nsw i32 %12, 1, !dbg !59] = 1 val:1 type: {[-1]:Integer}
constantinst[ call fastcc void @julia_fldmod1_48281([2 x i64]* noalias nocapture nofree noundef nonnull writeonly sret([2 x i64]) align 8 dereferenceable(16) %5, i32 signext %13, i64 signext %.fca.0.0.0.extract9) #13, !dbg !60] = 1 val:1 type: {}
constantinst[ %14 = call i32 @llvm.nvvm.read.ptx.sreg.nctaid.x() #13, !dbg !61, !range !69] = 1 val:1 type: {[-1]:Integer}
constantinst[ %15 = zext i32 %14 to i64, !dbg !70] = 1 val:1 type: {[-1]:Integer}
constantinst[ %16 = getelementptr inbounds [2 x i64], [2 x i64]* %5, i64 0, i64 1, !dbg !87] = 1 val:1 type: {[-1]:Pointer, [-1,-1]:Integer}
constantinst[ %17 = udiv i64 %15, %.fca.0.0.0.extract9, !dbg !90] = 1 val:1 type: {[-1]:Integer}
constantinst[ %unbox3.i = load i64, i64* %16, align 8, !dbg !92, !tbaa !95, !alias.scope !99, !noalias !102] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.not = icmp sgt i64 %unbox3.i, %.fca.0.0.0.extract9, !dbg !92] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not, label %julia_partial_mapreduce_grid_48233_inner.exit, label %L49.i, !dbg !107] = 1 val:1 type: {}
constantinst[ %18 = getelementptr inbounds [2 x i64], [2 x i64]* %5, i64 0, i64 0, !dbg !108] = 1 val:1 type: {[-1]:Pointer, [-1,-1]:Integer}
constantinst[ %19 = fadd double %0, %0, !dbg !110] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %unbox11.i = load i64, i64* %18, align 8, !dbg !117, !tbaa !95, !alias.scope !99, !noalias !102] = 1 val:1 type: {[-1]:Integer}
constantinst[ %20 = add i64 %unbox11.i, -1, !dbg !117] = 1 val:1 type: {[-1]:Integer}
constantinst[ %21 = zext i32 %11 to i64, !dbg !120] = 1 val:1 type: {[-1]:Integer}
constantinst[ %22 = mul i64 %20, %21, !dbg !124] = 1 val:1 type: {[-1]:Integer}
constantinst[ %23 = zext i32 %10 to i64, !dbg !126] = 1 val:1 type: {[-1]:Integer}
constantinst[ %24 = add i64 %22, %23, !dbg !128] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.not3647 = icmp sgt i64 %24, %.fca.0.0.0.extract13, !dbg !130] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not3647, label %L203.i, label %L97.i.lr.ph, !dbg !132] = 1 val:1 type: {}
constantinst[ %25 = call i64 @llvm.smax.i64(i64 %.fca.0.0.2.0.extract, i64 noundef 0) #13, !dbg !13] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.not38 = icmp eq i64 %25, 1] = 1 val:1 type: {[-1]:Integer}
constantinst[ %26 = bitcast i8 addrspace(1)* %.fca.0.0.0.extract to double addrspace(1)*] = 1 val:0 type: {[-1]:Pointer, [-1,-1]:Float@double}
constantinst[ %27 = mul nuw nsw i64 %17, %21] = 1 val:1 type: {[-1]:Integer}
constantinst[ %28 = mul i64 %17, %21, !dbg !133] = 1 val:1 type: {[-1]:Integer}
constantinst[ br label %L97.i, !dbg !132] = 1 val:1 type: {}
constantinst[ %iv = phi i64 [ %iv.next, %L97.i ], [ 0, %L97.i.lr.ph ]] = 1 val:1 type: {[-1]:Integer}
constantinst[ %value_phi12.i49 = phi double [ %19, %L97.i.lr.ph ], [ %35, %L97.i ]] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %29 = mul i64 %28, %iv, !dbg !133] = 1 val:1 type: {[-1]:Integer}
constantinst[ %iv.next = add nuw nsw i64 %iv, 1, !dbg !133] = 1 val:1 type: {[-1]:Integer}
constantinst[ %30 = add i64 %24, %29, !dbg !133] = 1 val:1 type: {[-1]:Integer}
constantinst[ %31 = call i64 @llvm.smax.i64(i64 %30, i64 %unbox3.i) #13, !dbg !133] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.op = add i64 %31, -1, !dbg !144] = 1 val:1 type: {[-1]:Integer}
constantinst[ %32 = select i1 %.not38, i64 0, i64 %.op, !dbg !144] = 1 val:1 type: {[-1]:Integer}
constantinst[ %33 = getelementptr inbounds double, double addrspace(1)* %26, i64 %32, !dbg !173] = 1 val:0 type: {[-1]:Pointer, [-1,-1]:Float@double}
constantinst[ %34 = load double, double addrspace(1)* %33, align 8, !dbg !173, !tbaa !174] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %35 = fadd double %value_phi12.i49, %34, !dbg !177] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %36 = add i64 %30, %27, !dbg !179] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.not36 = icmp sgt i64 %36, %.fca.0.0.0.extract13, !dbg !130] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not36, label %L203.i.loopexit, label %L97.i, !dbg !132] = 1 val:1 type: {}
constantinst[ br label %L203.i, !dbg !181] = 1 val:1 type: {}
constantinst[ %value_phi12.i.lcssa = phi double [ %19, %L49.i ], [ %35, %L203.i.loopexit ]] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %37 = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #13, !dbg !181] = 1 val:1 type: {[-1]:Integer}
constantinst[ %38 = icmp eq i32 %37, 32, !dbg !186] = 1 val:1 type: {[-1]:Integer}
constantinst[ call void @llvm.assume(i1 noundef %38) #13, !dbg !189] = 1 val:1 type: {}
constantinst[ %39 = call fastcc [2 x i32] @julia_fldmod1_48268(i32 signext %10) #14, !dbg !190] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.fca.0.extract14 = extractvalue [2 x i32] %39, 0, !dbg !190] = 1 val:1 type: {[-1]:Integer}
constantinst[ %bitcast_coercion.i = bitcast double %value_phi12.i.lcssa to i64, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %40 = lshr i64 %bitcast_coercion.i, 32, !dbg !202] = 0 val:0 type: {}
constantinst[ %41 = trunc i64 %40 to i32, !dbg !206] = 0 val:0 type: {}
constantinst[ %42 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %41, i32 noundef 1, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %43 = zext i32 %42 to i64, !dbg !210] = 0 val:0 type: {}
constantinst[ %44 = shl nuw i64 %43, 32, !dbg !214] = 0 val:0 type: {}
constantinst[ %45 = trunc i64 %bitcast_coercion.i to i32, !dbg !206] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %46 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %45, i32 noundef 1, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %47 = zext i32 %46 to i64, !dbg !217] = 0 val:0 type: {}
constantinst[ %48 = or i64 %44, %47, !dbg !223] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion35.i = bitcast i64 %48 to double, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %49 = fadd double %value_phi12.i.lcssa, %bitcast_coercion35.i, !dbg !225] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion.i.1 = bitcast double %49 to i64, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %50 = lshr i64 %bitcast_coercion.i.1, 32, !dbg !202] = 0 val:0 type: {}
constantinst[ %51 = trunc i64 %50 to i32, !dbg !206] = 0 val:0 type: {}
constantinst[ %52 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %51, i32 noundef 2, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %53 = zext i32 %52 to i64, !dbg !210] = 0 val:0 type: {}
constantinst[ %54 = shl nuw i64 %53, 32, !dbg !214] = 0 val:0 type: {}
constantinst[ %55 = trunc i64 %bitcast_coercion.i.1 to i32, !dbg !206] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %56 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %55, i32 noundef 2, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %57 = zext i32 %56 to i64, !dbg !217] = 0 val:0 type: {}
constantinst[ %58 = or i64 %54, %57, !dbg !223] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion35.i.1 = bitcast i64 %58 to double, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %59 = fadd double %49, %bitcast_coercion35.i.1, !dbg !225] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion.i.2 = bitcast double %59 to i64, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %60 = lshr i64 %bitcast_coercion.i.2, 32, !dbg !202] = 0 val:0 type: {}
constantinst[ %61 = trunc i64 %60 to i32, !dbg !206] = 0 val:0 type: {}
constantinst[ %62 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %61, i32 noundef 4, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %63 = zext i32 %62 to i64, !dbg !210] = 0 val:0 type: {}
constantinst[ %64 = shl nuw i64 %63, 32, !dbg !214] = 0 val:0 type: {}
constantinst[ %65 = trunc i64 %bitcast_coercion.i.2 to i32, !dbg !206] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %66 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %65, i32 noundef 4, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %67 = zext i32 %66 to i64, !dbg !217] = 0 val:0 type: {}
constantinst[ %68 = or i64 %64, %67, !dbg !223] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion35.i.2 = bitcast i64 %68 to double, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %69 = fadd double %59, %bitcast_coercion35.i.2, !dbg !225] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion.i.3 = bitcast double %69 to i64, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %70 = lshr i64 %bitcast_coercion.i.3, 32, !dbg !202] = 0 val:0 type: {}
constantinst[ %71 = trunc i64 %70 to i32, !dbg !206] = 0 val:0 type: {}
constantinst[ %72 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %71, i32 noundef 8, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %73 = zext i32 %72 to i64, !dbg !210] = 0 val:0 type: {}
constantinst[ %74 = shl nuw i64 %73, 32, !dbg !214] = 0 val:0 type: {}
constantinst[ %75 = trunc i64 %bitcast_coercion.i.3 to i32, !dbg !206] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %76 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %75, i32 noundef 8, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %77 = zext i32 %76 to i64, !dbg !217] = 0 val:0 type: {}
constantinst[ %78 = or i64 %74, %77, !dbg !223] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion35.i.3 = bitcast i64 %78 to double, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %79 = fadd double %69, %bitcast_coercion35.i.3, !dbg !225] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion.i.4 = bitcast double %79 to i64, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %80 = lshr i64 %bitcast_coercion.i.4, 32, !dbg !202] = 0 val:0 type: {}
constantinst[ %81 = trunc i64 %80 to i32, !dbg !206] = 0 val:0 type: {}
constantinst[ %82 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %81, i32 noundef 16, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %83 = trunc i64 %bitcast_coercion.i.4 to i32, !dbg !206] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %84 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %83, i32 noundef 16, i32 noundef 31) #13, !dbg !207] = 0 val:0 type: {}
constantinst[ %.fca.1.extract15 = extractvalue [2 x i32] %39, 1, !dbg !190] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.not39 = icmp eq i32 %.fca.1.extract15, 1, !dbg !227] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not39, label %L254.i, label %L273.i, !dbg !230] = 1 val:1 type: {}
constantinst[ %85 = zext i32 %82 to i64, !dbg !210] = 0 val:0 type: {}
constantinst[ %86 = shl nuw i64 %85, 32, !dbg !214] = 0 val:0 type: {}
constantinst[ %87 = zext i32 %84 to i64, !dbg !217] = 0 val:0 type: {}
constantinst[ %88 = or i64 %86, %87, !dbg !223] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion35.i.4 = bitcast i64 %88 to double, !dbg !191] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %89 = fadd double %79, %bitcast_coercion35.i.4, !dbg !225] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %90 = add i32 %.fca.0.extract14, -1, !dbg !231] = 1 val:1 type: {[-1]:Integer}
constantinst[ %91 = sext i32 %90 to i64, !dbg !245] = 1 val:1 type: {[-1]:Integer}
constantinst[ %92 = getelementptr inbounds double, double addrspace(3)* bitcast ([256 x i8] addrspace(3)* @shmem to double addrspace(3)*), i64 %91, !dbg !245] = 1 val:0 type: {[-1]:Pointer, [-1,0]:Float@double}
constantinst[ store double %89, double addrspace(3)* %92, align 8, !dbg !245, !tbaa !246, !noalias !248] = 0 val:1 type: {}
constantinst[ br label %L273.i, !dbg !251] = 1 val:1 type: {}
constantinst[ call void @llvm.nvvm.barrier0() #13, !dbg !252] = 1 val:1 type: {}
constantinst[ %93 = lshr i32 %11, 5, !dbg !256] = 1 val:1 type: {[-1]:Integer}
constantinst[ %94 = and i32 %11, 992, !dbg !260] = 1 val:1 type: {[-1]:Integer}
constantinst[ %95 = icmp ne i32 %94, %11, !dbg !262] = 1 val:1 type: {[-1]:Integer}
constantinst[ %96 = zext i1 %95 to i32, !dbg !265] = 1 val:1 type: {[-1]:Integer}
constantinst[ %97 = add nuw nsw i32 %93, %96, !dbg !272] = 1 val:1 type: {[-1]:Integer}
constantinst[ %.not40.not = icmp ult i32 %9, %97, !dbg !274] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not40.not, label %L302.i, label %L322.i, !dbg !275] = 1 val:1 type: {}
constantinst[ %98 = add i32 %.fca.1.extract15, -1, !dbg !276] = 1 val:1 type: {[-1]:Integer}
constantinst[ %99 = sext i32 %98 to i64, !dbg !285] = 1 val:1 type: {[-1]:Integer}
constantinst[ %100 = getelementptr inbounds double, double addrspace(3)* bitcast ([256 x i8] addrspace(3)* @shmem to double addrspace(3)*), i64 %99, !dbg !285] = 1 val:0 type: {[-1]:Pointer, [-1,0]:Float@double}
constantinst[ %101 = load double, double addrspace(3)* %100, align 8, !dbg !285, !tbaa !246] = 0 val:0 type: {[-1]:Float@double}
constantinst[ br label %L322.i, !dbg !285] = 1 val:1 type: {}
constantinst[ %value_phi46.i = phi double [ %101, %L302.i ], [ %0, %L273.i ]] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %.not41 = icmp eq i32 %.fca.0.extract14, 1, !dbg !286] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not41, label %L331.i.preheader, label %L357.i, !dbg !289] = 1 val:1 type: {}
constantinst[ %bitcast_coercion50.i = bitcast double %value_phi46.i to i64, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %102 = lshr i64 %bitcast_coercion50.i, 32, !dbg !296] = 0 val:0 type: {}
constantinst[ %103 = trunc i64 %102 to i32, !dbg !299] = 0 val:0 type: {}
constantinst[ %104 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %103, i32 noundef 1, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %105 = zext i32 %104 to i64, !dbg !302] = 0 val:0 type: {}
constantinst[ %106 = shl nuw i64 %105, 32, !dbg !304] = 0 val:0 type: {}
constantinst[ %107 = trunc i64 %bitcast_coercion50.i to i32, !dbg !299] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %108 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %107, i32 noundef 1, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %109 = zext i32 %108 to i64, !dbg !306] = 0 val:0 type: {}
constantinst[ %110 = or i64 %106, %109, !dbg !311] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion51.i = bitcast i64 %110 to double, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %111 = fadd double %value_phi46.i, %bitcast_coercion51.i, !dbg !313] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion50.i.1 = bitcast double %111 to i64, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %112 = lshr i64 %bitcast_coercion50.i.1, 32, !dbg !296] = 0 val:0 type: {}
constantinst[ %113 = trunc i64 %112 to i32, !dbg !299] = 0 val:0 type: {}
constantinst[ %114 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %113, i32 noundef 2, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %115 = zext i32 %114 to i64, !dbg !302] = 0 val:0 type: {}
constantinst[ %116 = shl nuw i64 %115, 32, !dbg !304] = 0 val:0 type: {}
constantinst[ %117 = trunc i64 %bitcast_coercion50.i.1 to i32, !dbg !299] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %118 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %117, i32 noundef 2, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %119 = zext i32 %118 to i64, !dbg !306] = 0 val:0 type: {}
constantinst[ %120 = or i64 %116, %119, !dbg !311] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion51.i.1 = bitcast i64 %120 to double, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %121 = fadd double %111, %bitcast_coercion51.i.1, !dbg !313] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion50.i.2 = bitcast double %121 to i64, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %122 = lshr i64 %bitcast_coercion50.i.2, 32, !dbg !296] = 0 val:0 type: {}
constantinst[ %123 = trunc i64 %122 to i32, !dbg !299] = 0 val:0 type: {}
constantinst[ %124 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %123, i32 noundef 4, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %125 = zext i32 %124 to i64, !dbg !302] = 0 val:0 type: {}
constantinst[ %126 = shl nuw i64 %125, 32, !dbg !304] = 0 val:0 type: {}
constantinst[ %127 = trunc i64 %bitcast_coercion50.i.2 to i32, !dbg !299] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %128 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %127, i32 noundef 4, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %129 = zext i32 %128 to i64, !dbg !306] = 0 val:0 type: {}
constantinst[ %130 = or i64 %126, %129, !dbg !311] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion51.i.2 = bitcast i64 %130 to double, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %131 = fadd double %121, %bitcast_coercion51.i.2, !dbg !313] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion50.i.3 = bitcast double %131 to i64, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %132 = lshr i64 %bitcast_coercion50.i.3, 32, !dbg !296] = 0 val:0 type: {}
constantinst[ %133 = trunc i64 %132 to i32, !dbg !299] = 0 val:0 type: {}
constantinst[ %134 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %133, i32 noundef 8, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %135 = zext i32 %134 to i64, !dbg !302] = 0 val:0 type: {}
constantinst[ %136 = shl nuw i64 %135, 32, !dbg !304] = 0 val:0 type: {}
constantinst[ %137 = trunc i64 %bitcast_coercion50.i.3 to i32, !dbg !299] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %138 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %137, i32 noundef 8, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %139 = zext i32 %138 to i64, !dbg !306] = 0 val:0 type: {}
constantinst[ %140 = or i64 %136, %139, !dbg !311] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion51.i.3 = bitcast i64 %140 to double, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %141 = fadd double %131, %bitcast_coercion51.i.3, !dbg !313] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion50.i.4 = bitcast double %141 to i64, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %142 = lshr i64 %bitcast_coercion50.i.4, 32, !dbg !296] = 0 val:0 type: {}
constantinst[ %143 = trunc i64 %142 to i32, !dbg !299] = 0 val:0 type: {}
constantinst[ %144 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %143, i32 noundef 16, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %145 = zext i32 %144 to i64, !dbg !302] = 0 val:0 type: {}
constantinst[ %146 = shl nuw i64 %145, 32, !dbg !304] = 0 val:0 type: {}
constantinst[ %147 = trunc i64 %bitcast_coercion50.i.4 to i32, !dbg !299] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %148 = call i32 @llvm.nvvm.shfl.sync.down.i32(i32 noundef -1, i32 %147, i32 noundef 16, i32 noundef 31) #13, !dbg !300] = 0 val:0 type: {}
constantinst[ %149 = zext i32 %148 to i64, !dbg !306] = 0 val:0 type: {}
constantinst[ %150 = or i64 %146, %149, !dbg !311] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %bitcast_coercion51.i.4 = bitcast i64 %150 to double, !dbg !290] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %151 = fadd double %141, %bitcast_coercion51.i.4, !dbg !313] = 0 val:0 type: {[-1]:Float@double}
constantinst[ br label %L357.i, !dbg !315] = 1 val:1 type: {}
constantinst[ %value_phi52.i = phi double [ %value_phi46.i, %L322.i ], [ %151, %L331.i.preheader ]] = 0 val:0 type: {[-1]:Float@double}
constantinst[ %.not42 = icmp eq i32 %9, 0, !dbg !315] = 1 val:1 type: {[-1]:Integer}
constantinst[ br i1 %.not42, label %L362.i, label %julia_partial_mapreduce_grid_48233_inner.exit, !dbg !318] = 1 val:1 type: {}
constantinst[ %152 = call i64 @llvm.smax.i64(i64 %.fca.2.0.extract, i64 noundef 0) #13, !dbg !319] = 1 val:1 type: {[-1]:Integer}
constantinst[ %153 = mul i64 %20, %152, !dbg !337] = 1 val:1 type: {[-1]:Integer}
constantinst[ %154 = add i64 %unbox3.i, -1, !dbg !342] = 1 val:1 type: {[-1]:Integer}
constantinst[ %155 = add i64 %154, %153, !dbg !343] = 1 val:1 type: {[-1]:Integer}
constantinst[ %156 = bitcast i8 addrspace(1)* %.fca.0.extract to double addrspace(1)*, !dbg !351] = 1 val:0 type: {[-1]:Pointer, [-1,-1]:Float@double}
constantinst[ %157 = getelementptr inbounds double, double addrspace(1)* %156, i64 %155, !dbg !351] = 1 val:0 type: {[-1]:Pointer, [-1,-1]:Float@double}
constantinst[ store double %value_phi52.i, double addrspace(1)* %157, align 8, !dbg !351, !tbaa !174, !noalias !248] = 0 val:1 type: {}
constantinst[ br label %julia_partial_mapreduce_grid_48233_inner.exit, !dbg !352] = 1 val:1 type: {}
constantinst[ call void @llvm.lifetime.end.p0i8(i64 noundef 16, i8* noundef nonnull %6) #13, !dbg !353] = 1 val:1 type: {}
constantinst[ ret void, !dbg !13] = 1 val:1 type: {}
cannot handle unknown binary operator: %40 = lshr i64 %bitcast_coercion.i, 32, !dbg !202
Stacktrace:
[1] >>>
@ ./int.jl:530
[2] >>>
@ ./int.jl:538
[3] shfl_recurse
@ ~/.julia/dev/CUDA/src/device/intrinsics/warp.jl:86
[4] shfl_recurse
@ ~/.julia/dev/CUDA/src/device/intrinsics/warp.jl:97
[5] shfl_down_sync (repeats 2 times)
@ ~/.julia/dev/CUDA/src/device/intrinsics/warp.jl:78
[6] reduce_warp
@ ~/.julia/dev/CUDA/src/mapreduce.jl:12
[7] reduce_block
@ ~/.julia/dev/CUDA/src/mapreduce.jl:28
[8] partial_mapreduce_grid
@ ~/.julia/dev/CUDA/src/mapreduce.jl:126
[9] partial_mapreduce_grid
@ ~/.julia/dev/CUDA/src/mapreduce.jl:0
was thrown during kernel execution on thread (1, 1, 1) in block (1, 1, 1).
Stacktrace not available, run Julia on debug level 2 for more details (by passing -g2 to the executable).
ERROR: Enzyme execution failed.
Enzyme: Not yet implemented forward for jl_eqtable_get
Stacktrace:
[1] get
@ ./iddict.jl:102
[2] macro expansion
@ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:208
[3] #_mapreduce#42
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/mapreduce.jl:71
Stacktrace:
[1] throwerr(cstr::Cstring)
@ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1696
[2] get
@ ./iddict.jl:102 [inlined]
[3] macro expansion
@ ~/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:208 [inlined]
[4] #_mapreduce#42
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/mapreduce.jl:71
[5] _mapreduce
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/mapreduce.jl:33 [inlined]
[6] mapreduce
@ ~/.julia/packages/GPUArrays/bbZD0/src/host/mapreduce.jl:28 [inlined]
[7] _sum
@ ./reducedim.jl:1015 [inlined]
[8] _sum
@ ./reducedim.jl:1014 [inlined]
[9] sum
@ ./reducedim.jl:1010 [inlined]
[10] f
@ ./REPL[2]:1 [inlined]
[11] fwddiffejulia_f_2022wrap
@ ./REPL[2]:0
[12] macro expansion
@ ~/.julia/dev/Enzyme/src/compiler.jl:6673 [inlined]
[13] enzyme_call
@ ~/.julia/dev/Enzyme/src/compiler.jl:6273 [inlined]
[14] ForwardModeThunk
@ ~/.julia/dev/Enzyme/src/compiler.jl:6153 [inlined]
[15] autodiff
@ ~/.julia/dev/Enzyme/src/Enzyme.jl:427 [inlined]
[16] autodiff(::ForwardMode{FFIABI}, ::typeof(f), ::Type{Duplicated}, ::Duplicated{CuArray{…}}, ::Const{CuArray{…}})
@ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:326
[17] top-level scope
@ REPL[7]:1
Some type information was truncated. Use `show(err)` to see complete types.
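For readers following the device-side failure above: the `lshr` that Enzyme reports as an unknown binary operator sits in a sequence where a `double` is bitcast to `i64`, split into two 32-bit halves, each half is passed through `llvm.nvvm.shfl.sync.down.i32`, and the halves are joined again before an `fadd`. The IR repeats this at offsets 1, 2, 4, 8 and 16. Below is a minimal CPU-only sketch of that pattern, not CUDA.jl's actual code. The names `split_halves`, `join_halves`, `shuffle_down` and `warp_sum` are hypothetical, and the stand-in shuffle assumes that a lane whose source is out of range keeps its own value.

```julia
# Split a Float64 into its high and low 32-bit halves, mirroring the
# bitcast / lshr / trunc sequence in the IR.
function split_halves(x::Float64)
    bits = reinterpret(UInt64, x)
    hi = (bits >>> 32) % UInt32   # upper 32 bits
    lo = bits % UInt32            # lower 32 bits (truncation)
    return hi, lo
end

# Rebuild the Float64, mirroring the zext / shl / or / bitcast sequence.
join_halves(hi::UInt32, lo::UInt32) =
    reinterpret(Float64, (UInt64(hi) << 32) | UInt64(lo))

# Hypothetical CPU stand-in for a warp shuffle-down: lane i reads from
# lane i + offset. Assumption: out-of-range lanes keep their own value.
function shuffle_down(lanes::Vector{Float64}, offset::Int)
    n = length(lanes)
    out = similar(lanes)
    for i in 1:n
        src = i + offset <= n ? i + offset : i
        hi, lo = split_halves(lanes[src])  # on the GPU each half is shuffled separately
        out[i] = join_halves(hi, lo)
    end
    return out
end

# Tree reduction over the same offsets that appear in the IR.
function warp_sum(lanes::Vector{Float64})
    vals = copy(lanes)
    for offset in (1, 2, 4, 8, 16)
        vals = vals .+ shuffle_down(vals, offset)
    end
    return vals[1]  # the first lane holds the full sum
end
```

With 32 lanes, `warp_sum(collect(1.0:32.0))` gives the sum over all lanes in the first lane. The integer shifts only move bits of a floating-point value around, which is why they show up as active (`= 0`) in the activity analysis even though they are integer instructions.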
Reverse mode will be added in https://github.com/JuliaGPU/CUDA.jl/pull/2422 as I understand it.
Ah cool to see some progress!
From my local tests, that PR should work to enable broadcast CUDA kernel calls etc., but it needs some brief CUDA.jl CI love.
Note that while the PR adds support for general heterogeneous code (previously, device code and host code were each supported only separately), it will still need optimization after landing to reach full performance.
@jgreener64 FYI your latest issue isn't a problem with broadcasting, but with the reduction in the sum.
Broadcasting reverse support has now landed in CUDA.jl. Note that, as mentioned, this does not include reductions (like sum). Please open an issue for that on CUDA.jl and cc me.
Issue opened at https://github.com/JuliaGPU/CUDA.jl/issues/2455.
Opening this to track progress in taking gradients through `CuArray` broadcasting. With Enzyme main (a68bf83) and CUDA v5.3.4:
For forward mode:
For reverse mode: