EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

`CuArray` broadcasting #1454

Open jgreener64 opened 1 month ago

Opening this to track progress on taking gradients through CuArray broadcasting. Tested with Enzyme main (a68bf83) and CUDA v5.3.4:

using Enzyme, CUDA
f(x, y) = sum(x .+ y)
x = CuArray(rand(5))
y = CuArray(rand(5))
dx = CuArray([1.0, 0.0, 0.0, 0.0, 0.0])  # shadow of x: tangent seed in forward mode, gradient accumulator in reverse mode
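For reference, the values a working implementation should produce can be worked out by hand. Every element of x enters the sum with coefficient 1, so the forward-mode tangent along dx is sum(dx) = 1.0, and the reverse-mode gradient with respect to x is a vector of ones. The following is a minimal CPU-only sketch (plain Arrays, no Enzyme or CUDA) that checks this; the finite-difference step h is an arbitrary choice, not something taken from the issue:

```julia
# Reference derivatives for f(x, y) = sum(x .+ y), using plain CPU Arrays only.
f(x, y) = sum(x .+ y)

x = rand(5)
y = rand(5)
dx = [1.0, 0.0, 0.0, 0.0, 0.0]  # tangent seed: perturb only x[1]

# Forward mode: ∂f/∂x_i = 1 for every i, so the directional derivative along dx is sum(dx).
expected_tangent = sum(dx)

# Forward finite difference of the same directional derivative (h chosen arbitrarily).
h = 1e-6
fd_tangent = (f(x .+ h .* dx, y) - f(x, y)) / h

# Reverse mode: the gradient of f with respect to x is a vector of ones.
expected_grad = ones(length(x))

println("tangent: ", expected_tangent, " (finite difference ≈ ", fd_tangent, ")")
println("gradient w.r.t. x: ", expected_grad)
```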

For forward mode:

autodiff(Forward, f, Duplicated, Duplicated(x, dx), Const(y))
[3915854] signal (11.2): Segmentation fault
in expression starting at REPL[12]:1
unknown function (ip: 0x7f9c7fcaf1b0)
visitIntrinsicInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:3696
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4970
GetOrCreateShadowFunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:4622
invertPointerM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:5533
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4914
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6492
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4970
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4950
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6492
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4970
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4950
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6492
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4970
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4950
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6492
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4970
EnzymeCreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:591
EnzymeCreateForwardDiff at /home/jgreener/.julia/dev/Enzyme/src/api.jl:168
unknown function (ip: 0x7f9cd004b93a)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
enzyme! at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:3261
unknown function (ip: 0x7f9cd00473e8)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
#codegen#518 at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5142
codegen at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:4549 [inlined]
_thunk at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5839
_thunk at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5839 [inlined]
cached_compilation at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5877 [inlined]
#563 at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5943
#JuliaContext#147 at /home/jgreener/.julia/dev/GPUCompiler/src/driver.jl:52
unknown function (ip: 0x7f9cd01e2216)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
JuliaContext at /home/jgreener/.julia/dev/GPUCompiler/src/driver.jl:42
#s2042#562 at /home/jgreener/.julia/dev/Enzyme/src/compiler.jl:5895 [inlined]
#s2042#562 at ./none:0
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
GeneratedFunctionStub at ./boot.jl:602
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_call_staged at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/method.c:540
ijl_code_for_staged at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/method.c:593
get_staged at ./compiler/utilities.jl:123
retrieve_code_info at ./compiler/utilities.jl:135 [inlined]
InferenceState at ./compiler/inferencestate.jl:430
typeinf_edge at ./compiler/typeinfer.jl:920
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2087
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_call at ./compiler/abstractinterpretation.jl:2162
abstract_call at ./compiler/abstractinterpretation.jl:2354
abstract_eval_call at ./compiler/abstractinterpretation.jl:2370
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2380
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2624
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2889
typeinf_local at ./compiler/abstractinterpretation.jl:3098
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3186
_typeinf at ./compiler/typeinfer.jl:247
typeinf at ./compiler/typeinfer.jl:216
typeinf_edge at ./compiler/typeinfer.jl:930
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2087
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_apply at ./compiler/abstractinterpretation.jl:1612
abstract_call_known at ./compiler/abstractinterpretation.jl:2004
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_call at ./compiler/abstractinterpretation.jl:2162
abstract_call at ./compiler/abstractinterpretation.jl:2354
abstract_eval_call at ./compiler/abstractinterpretation.jl:2370
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2380
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2624
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2913
typeinf_local at ./compiler/abstractinterpretation.jl:3098
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3186
_typeinf at ./compiler/typeinfer.jl:247
typeinf at ./compiler/typeinfer.jl:216
typeinf_ext at ./compiler/typeinfer.jl:1051
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1082
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1078
jfptr_typeinf_ext_toplevel_45276.1 at /home/jgreener/soft/julia/julia-1.10.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_type_infer at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:394
jl_generate_fptr_impl at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jitlayers.cpp:502
jl_compile_method_internal at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2480 [inlined]
jl_compile_method_internal at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2368
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2886 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:877
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:579
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:544
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
ijl_toplevel_eval_in at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
#run_repl#59 at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91745.1 at /home/jgreener/soft/julia/julia-1.10.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
#1013 at ./client.jl:432
jfptr_YY.1013_82712.1 at /home/jgreener/soft/julia/julia-1.10.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82738.1 at /home/jgreener/soft/julia/julia-1.10.2/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 235658985 (Pool: 235384224; Big: 274761); GC: 112
Segmentation fault (core dumped)

For reverse mode:

autodiff(Reverse, f, Active, Duplicated(x, dx), Const(y))
┌ Warning: active variables passed by value to jl_new_task are not yet supported
└ @ Enzyme.Compiler ~/.julia/dev/GPUCompiler/src/utils.jl:59
[the warning above is printed 12 times in total]
ERROR: Enzyme compilation failed.
Current scope:
; Function Attrs: mustprogress willreturn
define internal fastcc nonnull dereferenceable(16) "enzyme_type"="{[-1]:Pointer, [-1,-1]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Integer, [-1,8,16]:Pointer}" {} addrspace(10)* @preprocess_julia___910_21700([1 x i32] addrspace(11)* nocapture noundef nonnull readonly align 4 dereferenceable(4) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer}" "enzymejl_parmtype"="140596959426144" "enzymejl_parmtype_ref"="1" %0) unnamed_addr #428 !dbg !27759 {
top:
  %1 = call {}*** @julia.get_pgcstack()
  %2 = call {}*** @julia.get_pgcstack()
  %3 = bitcast {}*** %2 to {}**
  %4 = getelementptr inbounds {}*, {}** %3, i64 -14
  %5 = getelementptr inbounds {}*, {}** %4, i64 16
  %6 = bitcast {}** %5 to i8**
  %7 = load i8*, i8** %6, align 8
  %8 = call noalias nonnull dereferenceable(16) dereferenceable_or_null(16) {} addrspace(10)* @julia.gc_alloc_obj({}** %4, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140595685148560 to {}*) to {} addrspace(10)*)), !enzyme_fromstack !409
  call void @zeroType.264({} addrspace(10)* %8, i8 0, i64 16), !enzyme_zerostack !374
  %9 = bitcast {} addrspace(10)* %8 to [2 x {} addrspace(10)*] addrspace(10)*, !enzyme_caststack !374
  %10 = bitcast {}*** %1 to {}**
  %11 = getelementptr inbounds {}*, {}** %10, i64 -14
  %12 = getelementptr inbounds {}*, {}** %11, i64 16
  %13 = bitcast {}** %12 to i8**
  %14 = load i8*, i8** %13, align 8
  %15 = call noalias nonnull dereferenceable(8) dereferenceable_or_null(8) {} addrspace(10)* @julia.gc_alloc_obj({}** %11, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140595682882576 to {}*) to {} addrspace(10)*)), !enzyme_fromstack !409
  call void @zeroType.265({} addrspace(10)* %15, i8 0, i64 8), !enzyme_zerostack !374
  %16 = bitcast {} addrspace(10)* %15 to [1 x {} addrspace(10)*] addrspace(10)*, !enzyme_caststack !374
  %17 = call {}*** @julia.get_pgcstack() #451
  %current_task119 = getelementptr inbounds {}**, {}*** %17, i64 -14
  %current_task1 = bitcast {}*** %current_task119 to {}**
  %ptls_field20 = getelementptr inbounds {}**, {}*** %17, i64 2
  %18 = bitcast {}*** %ptls_field20 to i64***
  %ptls_load2122 = load i64**, i64*** %18, align 8, !tbaa !375
  %19 = getelementptr inbounds i64*, i64** %ptls_load2122, i64 2
  %safepoint = load i64*, i64** %19, align 8, !tbaa !379
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #451, !dbg !27760
  fence syncscope("singlethread") seq_cst
  %20 = getelementptr inbounds [1 x i32], [1 x i32] addrspace(11)* %0, i64 0, i64 0, !dbg !27761
  %unbox = load i32, i32 addrspace(11)* %20, align 4, !dbg !27765, !tbaa !379, !alias.scope !585, !noalias !586
  %21 = call fastcc nonnull {} addrspace(10)* @julia__ntuple_21712() #451, !dbg !27767
  %box = call noalias nonnull dereferenceable(4) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 4, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972381280 to {}*) to {} addrspace(10)*)) #452, !dbg !27764
  %22 = bitcast {} addrspace(10)* %box to i32 addrspace(10)*, !dbg !27764
  store i32 1, i32 addrspace(10)* %22, align 8, !dbg !27764, !tbaa !445, !alias.scope !395, !noalias !27768
  %box4 = call noalias nonnull dereferenceable(4) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 4, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972381008 to {}*) to {} addrspace(10)*)) #452, !dbg !27764
  %23 = bitcast {} addrspace(10)* %box4 to i32 addrspace(10)*, !dbg !27764
  store i32 0, i32 addrspace(10)* %23, align 8, !dbg !27764, !tbaa !445, !alias.scope !395, !noalias !27768
  %box6 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972383920 to {}*) to {} addrspace(10)*)) #452, !dbg !27764
  %24 = bitcast {} addrspace(10)* %box6 to i8 addrspace(10)*, !dbg !27764
  %newstruct.sroa.0.0..sroa_cast = bitcast {} addrspace(10)* %box6 to i32 addrspace(10)*, !dbg !27764
  store i32 1, i32 addrspace(10)* %newstruct.sroa.0.0..sroa_cast, align 8, !dbg !27764, !tbaa !489, !alias.scope !490, !noalias !27771
  %newstruct.sroa.2.0..sroa_idx = getelementptr inbounds i8, i8 addrspace(10)* %24, i64 4, !dbg !27764
  %newstruct.sroa.2.0..sroa_cast = bitcast i8 addrspace(10)* %newstruct.sroa.2.0..sroa_idx to i32 addrspace(10)*, !dbg !27764
  store i32 %unbox, i32 addrspace(10)* %newstruct.sroa.2.0..sroa_cast, align 4, !dbg !27764, !tbaa !489, !alias.scope !490, !noalias !27771
  %25 = call noalias nonnull "enzyme_inactive" {} addrspace(10)* @ijl_box_int64(i64 noundef signext 0) #453, !dbg !27764
  %26 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* noundef nonnull @ijl_apply_generic, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972383392 to {}*) to {} addrspace(10)*), {} addrspace(10)* nofree nonnull %box, {} addrspace(10)* nofree nonnull %box4, {} addrspace(10)* nofree nonnull %box6, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604561424448 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %25, {} addrspace(10)* nonnull %21) #454, !dbg !27764
  %newstruct8 = call noalias nonnull dereferenceable(88) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 88, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972389216 to {}*) to {} addrspace(10)*)) #452, !dbg !27772
  %27 = addrspacecast {} addrspace(10)* %newstruct8 to i8 addrspace(11)*, !dbg !27772
  %28 = addrspacecast {} addrspace(10)* %26 to i8 addrspace(11)*, !dbg !27772
  call void @llvm.memcpy.p11i8.p11i8.i64(i8 addrspace(11)* noundef align 8 dereferenceable(88) %27, i8 addrspace(11)* noundef align 1 dereferenceable(88) %28, i64 noundef 88, i1 noundef false) #451, !dbg !27772, !tbaa !392, !alias.scope !395, !noalias !27768
  %newstruct10 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972388992 to {}*) to {} addrspace(10)*)) #452, !dbg !27775
  %.fca.0.gep18 = getelementptr [2 x {} addrspace(10)*], [2 x {} addrspace(10)*] addrspace(10)* %9, i64 0, i64 0, !dbg !27778
  store {} addrspace(10)* %newstruct10, {} addrspace(10)* addrspace(10)* %.fca.0.gep18, align 8, !dbg !27778, !noalias !27781
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %8, {} addrspace(10)* %newstruct10), !dbg !27778
  %.fca.1.gep = getelementptr [2 x {} addrspace(10)*], [2 x {} addrspace(10)*] addrspace(10)* %9, i64 0, i64 1, !dbg !27778
  store {} addrspace(10)* %newstruct8, {} addrspace(10)* addrspace(10)* %.fca.1.gep, align 8, !dbg !27778, !noalias !27781
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %8, {} addrspace(10)* %newstruct8), !dbg !27778
  %29 = addrspacecast [2 x {} addrspace(10)*] addrspace(10)* %9 to [2 x {} addrspace(10)*] addrspace(11)*, !dbg !27778
  %30 = call fastcc i32 @julia__395_21708([2 x {} addrspace(10)*] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(16) %29) #451, !dbg !27778
  %31 = icmp eq i32 %30, 0, !dbg !27782
  br i1 %31, label %L32, label %L28, !dbg !27785

L28:                                              ; preds = %top
  call fastcc void @julia_throw_api_error_20396(i32 zeroext %30) #455, !dbg !27786
  unreachable, !dbg !27786

L32:                                              ; preds = %top
  %newstruct12 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596972386592 to {}*) to {} addrspace(10)*)) #452, !dbg !27787
  %.fca.0.gep = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %16, i64 0, i64 0, !dbg !27791
  store {} addrspace(10)* %newstruct12, {} addrspace(10)* addrspace(10)* %.fca.0.gep, align 8, !dbg !27791, !noalias !27781
  call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* %15, {} addrspace(10)* %newstruct12), !dbg !27791
  %32 = addrspacecast [1 x {} addrspace(10)*] addrspace(10)* %16 to [1 x {} addrspace(10)*] addrspace(11)*, !dbg !27791
  call fastcc void @julia_check_20440([1 x {} addrspace(10)*] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %32) #451, !dbg !27791
  %33 = addrspacecast {} addrspace(10)* %newstruct12 to i64 addrspace(11)*, !dbg !27793
  %34 = load i64, i64 addrspace(11)* %33, align 8, !dbg !27793, !tbaa !420, !alias.scope !395, !noalias !398
  %.not = icmp eq i64 %34, 0, !dbg !27796
  br i1 %.not, label %L39, label %L41, !dbg !27795

L39:                                              ; preds = %L32
  call void @ijl_throw({} addrspace(12)* noundef addrspacecast ({}* inttoptr (i64 140604626762208 to {}*) to {} addrspace(12)*)) #455, !dbg !27795
  unreachable, !dbg !27795

L41:                                              ; preds = %L32
  %35 = call fastcc nonnull {} addrspace(10)* @julia_UniqueCuContext_20482(i64 zeroext %34) #451, !dbg !27798
  %36 = addrspacecast {} addrspace(10)* %newstruct10 to i64 addrspace(11)*, !dbg !27799
  %37 = load i64, i64 addrspace(11)* %36, align 8, !dbg !27799, !tbaa !420, !alias.scope !395, !noalias !398
  %newstruct15 = call noalias nonnull dereferenceable(16) {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140596959507904 to {}*) to {} addrspace(10)*)) #452, !dbg !27801
  %38 = addrspacecast {} addrspace(10)* %newstruct15 to {} addrspace(10)* addrspace(11)*, !dbg !27801
  %39 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %38, i64 1, !dbg !27801
  store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %39, align 8, !dbg !27801, !tbaa !420, !alias.scope !395, !noalias !27768
  %40 = addrspacecast {} addrspace(10)* %newstruct15 to i64 addrspace(11)*, !dbg !27801
  store i64 %37, i64 addrspace(11)* %40, align 8, !dbg !27801, !tbaa !420, !alias.scope !395, !noalias !27768
  %41 = addrspacecast {} addrspace(10)* %newstruct15 to i8 addrspace(11)*, !dbg !27801
  %42 = getelementptr inbounds i8, i8 addrspace(11)* %41, i64 8, !dbg !27801
  %43 = bitcast i8 addrspace(11)* %42 to {} addrspace(10)* addrspace(11)*, !dbg !27801
  store atomic {} addrspace(10)* %35, {} addrspace(10)* addrspace(11)* %43 release, align 8, !dbg !27801, !tbaa !420, !alias.scope !395, !noalias !27768
  ret {} addrspace(10)* %newstruct15, !dbg !27801
}

Illegal replace ficticious phi for:   %_replacementA14 = phi {} addrspace(10)* , !dbg !390 of   %21 = call fastcc nonnull {} addrspace(10)* @julia__ntuple_21712() #451, !dbg !406
; Function Attrs: mustprogress willreturn
define internal fastcc nonnull dereferenceable(16) "enzyme_type"="{[-1]:Pointer, [-1,-1]:Pointer, [-1,8,0]:Pointer, [-1,8,8]:Integer, [-1,8,16]:Pointer}" void @diffejulia___910_21700([1 x i32] addrspace(11)* nocapture readonly align 4 dereferenceable(4) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer}" "enzymejl_parmtype"="140596959426144" "enzymejl_parmtype_ref"="1" %0, { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 }, {} addrspace(10)*, i64 } %tapeArg) unnamed_addr #428 !dbg !32267 {
top:
  %1 = call {}*** @julia.get_pgcstack()
  %2 = call {}*** @julia.get_pgcstack()
  %_replacementA31 = phi {}**
  %_replacementA30 = phi {}**
  %_replacementA29 = phi {}**
  %_replacementA28 = phi i8**
  %_replacementA27 = phi i8*
  %_replacementA26 = phi {} addrspace(10)*
  %_replacementA25 = phi [2 x {} addrspace(10)*] addrspace(10)*
  %_replacementA24 = phi {}**
  %_replacementA23 = phi {}**
  %_replacementA22 = phi {}**
  %_replacementA21 = phi i8**
  %_replacementA20 = phi i8*
  %_replacementA19 = phi {} addrspace(10)*
  %_replacementA18 = phi [1 x {} addrspace(10)*] addrspace(10)*
  %3 = call {}*** @julia.get_pgcstack() #451
  %current_task119 = getelementptr inbounds {}**, {}*** %3, i64 -14
  %current_task1 = bitcast {}*** %current_task119 to {}**
  %ptls_field20_replacementA = phi {}***
  %_replacementA17 = phi i64***
  %ptls_load2122_replacementA = phi i64**
  %_replacementA16 = phi i64**
  %safepoint_replacementA = phi i64*
  %_replacementA15 = phi i32 addrspace(11)* , !dbg !32268
  %unbox_replacementA = phi i32 , !dbg !32272
  %_replacementA14 = phi {} addrspace(10)* , !dbg !32274
  %box_replacementA = phi {} addrspace(10)* , !dbg !32271
  %_replacementA13 = phi i32 addrspace(10)* , !dbg !32271
  %box4_replacementA = phi {} addrspace(10)* , !dbg !32271
  %_replacementA12 = phi i32 addrspace(10)* , !dbg !32271
  %box6_replacementA = phi {} addrspace(10)* , !dbg !32271
  %_replacementA11 = phi i8 addrspace(10)* , !dbg !32271
  %newstruct.sroa.0.0..sroa_cast_replacementA = phi i32 addrspace(10)* , !dbg !32271
  %newstruct.sroa.2.0..sroa_idx_replacementA = phi i8 addrspace(10)* , !dbg !32271
  %newstruct.sroa.2.0..sroa_cast_replacementA = phi i32 addrspace(10)* , !dbg !32271
  %_replacementA10 = phi {} addrspace(10)* , !dbg !32271
  %4 = extractvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 }, {} addrspace(10)*, i64 } %tapeArg, 2, !dbg !32271
  %_replacementA9 = phi {} addrspace(10)* , !dbg !32271
  %_replacementA8 = phi i8 addrspace(11)* , !dbg !32275
  %_replacementA7 = phi i8 addrspace(11)* , !dbg !32275
  %newstruct10_replacementA = phi {} addrspace(10)* , !dbg !32278
  %.fca.0.gep18_replacementA = phi {} addrspace(10)* addrspace(10)* , !dbg !32281
  %.fca.1.gep_replacementA = phi {} addrspace(10)* addrspace(10)* , !dbg !32281
  %_replacementA6 = phi [2 x {} addrspace(10)*] addrspace(11)* , !dbg !32281
  %_replacementA5 = phi i32 , !dbg !32281
  %_replacementA = phi i1 , !dbg !32284
  br i1 true, label %L32, label %L28, !dbg !32287

L28:                                              ; preds = %top
  unreachable

L32:                                              ; preds = %top
  %newstruct12_replacementA = phi {} addrspace(10)* , !dbg !32288
  %.fca.0.gep_replacementA = phi {} addrspace(10)* addrspace(10)* , !dbg !32292
  %_replacementA33 = phi [1 x {} addrspace(10)*] addrspace(11)* , !dbg !32292
  %_replacementA32 = phi i64 addrspace(11)* , !dbg !32294
  %5 = extractvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 }, {} addrspace(10)*, i64 } %tapeArg, 7, !dbg !32297
  %.not_replacementA = phi i1 , !dbg !32297
  br i1 false, label %L39, label %L41, !dbg !32296

L39:                                              ; preds = %L32
  unreachable

L41:                                              ; preds = %L32
  %tapeArg42 = extractvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 }, {} addrspace(10)*, i64 } %tapeArg, 5, !dbg !32299
  %_replacementA43 = phi {} addrspace(10)* , !dbg !32299
  %"'ip_phi3" = extractvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 }, {} addrspace(10)*, i64 } %tapeArg, 6, !dbg !32299
  %_replacementA41 = phi i64 addrspace(11)* , !dbg !32300
  %"newstruct15'mi" = extractvalue { {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, {} addrspace(10)*, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 }, {} addrspace(10)*, i64 } %tapeArg, 4, !dbg !32302
  %newstruct15_replacementA = phi {} addrspace(10)* , !dbg !32302
  %_replacementA39 = phi {} addrspace(10)* addrspace(11)* , !dbg !32302
  %_replacementA38 = phi {} addrspace(10)* addrspace(11)* , !dbg !32302
  %_replacementA37 = phi i64 addrspace(11)* , !dbg !32302
  %_replacementA36 = phi i8 addrspace(11)* , !dbg !32302
  %_replacementA35 = phi i8 addrspace(11)* , !dbg !32302
  %_replacementA34 = phi {} addrspace(10)* addrspace(11)* , !dbg !32302
  br label %invertL41, !dbg !32302

allocsForInversion:                               ; No predecessors!

inverttop:                                        ; preds = %invertL32
  %6 = call {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)*, {} addrspace(10)*, ...) @julia.call({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32)* @ijl_apply_generic, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140596842791248 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140596298768528 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604603211760 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140595658237672 to {}*) to {} addrspace(10)*), {} addrspace(10)* %4, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140596972383392 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*), {} addrspace(10)* %box_replacementA, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*), {} addrspace(10)* %box4_replacementA, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*), {} addrspace(10)* %box6_replacementA, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604561424448 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*), {} addrspace(10)* %_replacementA10, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*), {} addrspace(10)* %_replacementA14, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140604782354440 to {}*) to {} addrspace(10)*)), !dbg !32271
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void

invertL28:                                        ; No predecessors!

invertL32:                                        ; preds = %invertL41
  br label %inverttop

invertL39:                                        ; No predecessors!

invertL41:                                        ; preds = %L41
  call fastcc void @diffejulia_UniqueCuContext_20482(i64 zeroext %5, { {} addrspace(10)*, {} addrspace(10)*, i1, {} addrspace(10)* addrspace(10)*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, {} addrspace(10)* addrspace(10)*, i1*, i1, i1, i1*, {} addrspace(10)*, i1 } %tapeArg42), !dbg !32299
  br label %invertL32
}

LLVM.CallInst(%21 = call fastcc nonnull {} addrspace(10)* @julia__ntuple_21712() #451, !dbg !406)
LLVM.PHIInst(%_replacementA14 = phi {} addrspace(10)* , !dbg !390)

Stacktrace:
 [1] ntuple
   @ ./ntuple.jl:19
 [2] _
   @ ~/.julia/dev/CUDA/lib/cudadrv/pool.jl:18

Stacktrace:
  [1] julia_error(cstr::Cstring, val::Ptr{…}, errtype::Enzyme.API.ErrorType, data::Ptr{…}, data2::Ptr{…}, B::Ptr{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:1754
  [2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{…}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, width::Int64, additionalArg::Ptr{…}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{…}, augmented::Ptr{…}, atomicAdd::Bool)
    @ Enzyme.API ~/.julia/dev/Enzyme/src/api.jl:154
  [3] enzyme!(job::GPUCompiler.CompilerJob{…}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::Tuple{…}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{…}, boxedArgs::Set{…})
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:3249
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{…}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5142
  [5] codegen
    @ ~/.julia/dev/Enzyme/src/compiler.jl:4549 [inlined]
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5839
  [7] _thunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5839 [inlined]
  [8] cached_compilation
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5877 [inlined]
  [9] (::Enzyme.Compiler.var"#563#564"{…})(ctx::LLVM.Context)
    @ Enzyme.Compiler ~/.julia/dev/Enzyme/src/compiler.jl:5943
 [10] JuliaContext(f::Enzyme.Compiler.var"#563#564"{…}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:52
 [11] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:42
 [12] #s2042#562
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5895 [inlined]
 [13]
    @ Enzyme.Compiler ./none:0
 [14] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [15] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:286 [inlined]
 [16] autodiff(::ReverseMode{false, FFIABI, false}, ::typeof(f), ::Type{Active}, ::Duplicated{CuArray{…}}, ::Const{CuArray{…}})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:303
 [17] top-level scope
    @ REPL[11]:1
 [18] top-level scope
    @ ~/.julia/dev/CUDA/src/initialization.jl:209
Some type information was truncated. Use `show(err)` to see complete types.
wsmoses commented 1 month ago

With debug info:

Cannot create a null constant of that type!
UNREACHABLE executed at /home/wmoses/git/Enzyme.jl/julia10/deps/srccache/llvm-julia-15.0.7-10/llvm/lib/IR/Constants.cpp:374!

[3398190] signal (6.-6): Aborted
in expression starting at /home/wmoses/git/Enzyme.jl/cubc.jl:6
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
llvm_unreachable_internal at /home/wmoses/git/Enzyme.jl/julia10/deps/srccache/llvm-julia-15.0.7-10/llvm/lib/Support/ErrorHandling.cpp:212
getNullValue at /home/wmoses/git/Enzyme.jl/julia10/deps/srccache/llvm-julia-15.0.7-10/llvm/lib/IR/Constants.cpp:374
handleAdjointForIntrinsic at /home/wmoses/.julia/scratchspaces/7cc45869-7501-5eee-bdea-0790c847d4ef/src/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4023
visitIntrinsicInst at /home/wmoses/.julia/scratchspaces/7cc45869-7501-5eee-bdea-0790c847d4ef/src/Enzyme/enzyme/Enzyme/AdjointGenerator.h:3696
wsmoses commented 1 month ago

@jgreener64 the forward mode assertion should no longer fail that way, though for some reason a size mismatch error now comes up instead.

If you have spare cycles, minimizing the reproducer through the broadcast implementation would definitely be helpful here.
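One possible starting point for that minimization, sketched under the assumption that the failure lies somewhere in Base's broadcast lowering: `x .+ y` lowers to `materialize(broadcasted(+, x, y))`, and `materialize(bc)` is `copy(instantiate(bc))`, so `f` can be rewritten in stages that each remove one step. The names `f_broadcasted`, `f_instantiated` and `f_map` are hypothetical, and the block only checks the primal values on CPU arrays; how Enzyme handles each stage with `CuArray`s has not been tried.

```julia
using Base.Broadcast: broadcasted, instantiate, materialize

# Hypothetical staged rewrites of f(x, y) = sum(x .+ y), for bisecting the failure.
# `x .+ y` lowers to materialize(broadcasted(+, x, y)), and materialize(bc)
# is copy(instantiate(bc)); each stage below peels off one of those steps.
f_broadcasted(x, y) = sum(materialize(broadcasted(+, x, y)))       # equivalent to f
f_instantiated(x, y) = sum(copy(instantiate(broadcasted(+, x, y)))) # bypasses materialize
f_map(x, y) = sum(map(+, x, y))                                     # avoids Broadcast entirely

# CPU sanity check that every stage computes the same primal value as f.
x = rand(5)
y = rand(5)
for g in (f_broadcasted, f_instantiated, f_map)
    println(g, " ", g(x, y) ≈ sum(x .+ y))
end
```

With `CuArray` inputs, each stage could then be substituted for `f` in the forward-mode call from the original report, e.g. `autodiff(Forward, f_map, Duplicated, Duplicated(x, dx), Const(y))`, to see which step first triggers the error.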

jgreener64 commented 1 month ago

The forward mode error is

ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 5 and 5
Stacktrace:
  [1] _bcs1
    @ ./broadcast.jl:555 [inlined]
  [2] _bcs
    @ ./broadcast.jl:549 [inlined]
  [3] broadcast_shape
    @ ./broadcast.jl:543 [inlined]
  [4] combine_axes
    @ ./broadcast.jl:524 [inlined]
  [5] instantiate
    @ ./broadcast.jl:306 [inlined]
  [6] materialize
    @ ./broadcast.jl:903 [inlined]
  [7] f
    @ ./REPL[3]:1 [inlined]
  [8] fwddiffejulia_f_4514wrap
    @ ./REPL[3]:0
  [9] macro expansion
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5916 [inlined]
 [10] enzyme_call
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5566 [inlined]
 [11] ForwardModeThunk
    @ ~/.julia/dev/Enzyme/src/compiler.jl:5446 [inlined]
 [12] autodiff
    @ ~/.julia/dev/Enzyme/src/Enzyme.jl:399 [inlined]
 [13] autodiff(::ForwardMode{FFIABI}, ::typeof(f), ::Type{Duplicated}, ::Duplicated{CuArray{…}}, ::Const{CuArray{…}})
    @ Enzyme ~/.julia/dev/Enzyme/src/Enzyme.jl:303
 [14] top-level scope
    @ REPL[9]:1
 [15] top-level scope
    @ ~/.julia/dev/CUDA/src/initialization.jl:209
Some type information was truncated. Use `show(err)` to see complete types.

which certainly seems a strange one. Adding

println("a ", typeof(a), " ", a, " b ", typeof(b), " ", b, " ", a == b)

to the start of Base.Broadcast._bcs1(a, b) (https://github.com/JuliaLang/julia/blob/0b4590a5507d3f3046e5bafc007cacbbfc9b310b/base/broadcast.jl#L555), where a and b are the axes (here Base.OneTo ranges) of the broadcasted arrays in a given dimension, gives the following for the primal function:

f(x, y)
a Base.OneTo{Int64} Base.OneTo(5) b Base.OneTo{Int64} Base.OneTo(5) true
a Base.OneTo{Int64} Base.OneTo(1) b Base.OneTo{Int64} Base.OneTo(1) true
a Base.OneTo{Int64} Base.OneTo(1) b Base.OneTo{Int64} Base.OneTo(1) true
5.617565028176828

and this for the gradient:

autodiff(Forward, f, Duplicated, Duplicated(x, dx), Const(y))
a Base.OneTo{Int64} Base.OneTo(5) b Base.OneTo{Int64} Base.OneTo(5) true
[error as above]

The error should only be thrown when a != b, but the call that actually fails doesn't seem to print anything. I wonder if Enzyme converts the types of the axes in some way such that a == b no longer holds.

I also tried Infiltrator.jl and Debugger.jl but didn't have much luck.
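To probe that hypothesis without editing Base, one could compare the axes directly. The sketch below uses a hypothetical helper, `check_axes` (not part of Base or Enzyme), that mirrors the condition described above (same-position axes must match unless one has length 1) but reports both axes and their types on failure, since the `DimensionMismatch` message only shows lengths. It also calls `Base.Broadcast.combine_axes`, the internal function that appears in the stack trace. It has only been exercised on CPU arrays here, not under Enzyme.

```julia
# Hypothetical helper (not part of Base or Enzyme). It mirrors the condition
# described above: two axes in the same position must match unless one has
# length 1. On failure it reports the axes themselves and their types,
# rather than only their lengths.
function check_axes(a::AbstractUnitRange, b::AbstractUnitRange)
    length(a) == 1 && return b   # a length-1 axis broadcasts against anything
    length(b) == 1 && return a
    a == b && return a
    error("axes differ: a = $(repr(a))::$(typeof(a)), b = $(repr(b))::$(typeof(b))")
end

x = rand(5)
y = rand(5)

# Check every dimension pair, as broadcasting does for x .+ y.
for (ax, ay) in zip(axes(x), axes(y))
    println(check_axes(ax, ay), " ", ax == ay)
end

# Base's own (internal) entry point from the stack trace, for comparison.
println(Base.Broadcast.combine_axes(x, y))
```

Calling something like this from inside `f` during the forward-mode `autodiff` call would show whether the axes reaching the comparison differ in type or value from the primal run.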

wsmoses commented 1 month ago

@jgreener64 is that still the case? I thought that should now be fixed on main.

jgreener64 commented 1 month ago

Still the case for me on 21b0762d with CUDA 5.3.4 and Julia 1.10.3.