Closed ArnoStrouwen closed 7 months ago
Can you post your Julia and Enzyme versions? If you needed to pass the bitcode flag, that means you were on an earlier version, before this was marked non-experimental, so it may have been fixed since.
Can you also isolate this to just the Enzyme autodiff call, without the wrappers?
I think this is approximately what is going on inside SciMLSensitivity:
using Lux, ComponentArrays, OrdinaryDiffEq, SciMLSensitivity, Statistics, Random
using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)
rng = Random.default_rng()
tspan = (0.0f0, 8.0f0)
ann = Chain(Dense(1, 32, tanh), Dense(32, 32, tanh), Dense(32, 1))
ps, st = Lux.setup(rng, ann)
p = ComponentArray(ps)
θ, ax = getdata(p), getaxes(p)
function dxdt_(dx, x, p, t)
    ps = ComponentArray(p, ax)
    x1, x2 = x
    dx[1] = x[2]
    dx[2] = first(ann([t], ps, st))[1]^3
end
x0 = [-4.0f0, 0.0f0]
ts = Float32.(collect(0.0:0.01:tspan[2]))
dx = zero(x0)
function adfunc(out, u, _p, t)
    dxdt_(out, u, _p, t)
    nothing
end
Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(dx, copy(x0)),
    Enzyme.Duplicated(copy(x0), zero(x0)), Enzyme.Duplicated(copy(θ), zero(θ)), Enzyme.Const(ts[1]))
(Enzyme) pkg> st
Status `~/SciML/SciMLSensitivity.jl/Enzyme/Project.toml`
[b0b7db55] ComponentArrays v0.15.8
[7da242da] Enzyme v0.11.14
[b2108857] Lux v0.5.14
[1dea7af3] OrdinaryDiffEq v6.70.1
[1ed8b502] SciMLSensitivity v7.55.0
[9a3f8284] Random
[10745b16] Statistics v1.10.0
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 on 24 virtual cores
Environment:
JULIA_PKG_DEVDIR = /home/arno/SciML/
And for the sake of understanding, what is the expected result here that is not being computed correctly?
For the complete example in the original post of this issue, the problem is that the gradient differs depending on the bitcode flag.
For the pared-down example, I don't know what the issue could be. Nothing seems immediately wrong to me in the Duplicated vectors, but I don't have much Enzyme experience.
I reduced the example so that it still gives the output:
** On entry to SGEMV parameter number 6 had an illegal value **
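For context, in the reference BLAS interface `SGEMV(TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)`, parameter number 6 is `LDA`, the leading dimension of `A`, and the routine rejects it when `LDA < max(1, M)`. The sketch below mirrors that argument check in plain Julia. The function name `gemv_illegal_param` is made up for illustration and is not part of LinearAlgebra or Enzyme.

```julia
# Hypothetical helper (not a LinearAlgebra or Enzyme API): mirrors the argument
# checks that reference BLAS xGEMV performs before doing any work.
# Returns the 1-based index of the first illegal parameter, or 0 if all pass.
function gemv_illegal_param(trans::Char, m::Integer, n::Integer,
                            lda::Integer, incx::Integer, incy::Integer)
    if !(uppercase(trans) in ('N', 'T', 'C'))
        return 1            # TRANS
    elseif m < 0
        return 2            # M
    elseif n < 0
        return 3            # N
    elseif lda < max(1, m)
        return 6            # LDA: must be at least max(1, M)
    elseif incx == 0
        return 8            # INCX
    elseif incy == 0
        return 11           # INCY
    end
    return 0
end

# A 30×1 column-major matrix: M = 30, N = 1.
println(gemv_illegal_param('N', 30, 1, 30, 1, 1))  # LDA == M passes the check
println(gemv_illegal_param('N', 30, 1, 1, 1, 1))   # LDA < M is reported as parameter 6
```

Read this way, the message says that some `gemv` call received a leading dimension smaller than the row count. Which call that is (the primal one or one generated for the derivative) is not established here.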
Perhaps I pared it down too much. Besides this autodiff call, there is a more complicated one in SciMLSensitivity, where dxdt_ gets a different wrapper:
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/adjoint_common.jl#L430-L455
This wrapper then gets Duplicated with make_zero:
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/adjoint_common.jl#L201C47-L201C100
https://github.com/SciML/SciMLSensitivity.jl/blob/master/src/derivative_wrappers.jl#L696C63-L696C77
In order to debug this properly we'll need an example:
Reduced to:
using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)
using LinearAlgebra
ps = zeros(Float32, 30, 1)
function adfunc(ps)
    out = Vector{Float32}(undef, 30)
    @inline LinearAlgebra.BLAS.gemv!('N', true, ps, [0.0f0], false, out)
    return out[1]
end
Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
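As a reference point for the expected result, this function is linear in `ps`: `out[1]` equals the sum over `j` of `ps[1, j] * x[j]`, so the gradient with respect to `ps` is `x` in row 1 and zero elsewhere. With `x = [0.0f0]` that is all zeros. A minimal Enzyme-free sketch that checks this by finite differences follows. `adfunc_plain` and `fd_gradient` are illustrative names, `x` is made a parameter, and `out` is zero-initialized rather than `undef`; all of these are assumptions that differ from the snippet above.

```julia
using LinearAlgebra

# Same computation as the reduced adfunc, but with x as an argument and a
# zero-initialized output buffer (both changes are for illustration only).
function adfunc_plain(ps::Matrix{Float32}, x::Vector{Float32})
    out = zeros(Float32, size(ps, 1))
    LinearAlgebra.BLAS.gemv!('N', 1.0f0, ps, x, 0.0f0, out)  # out = ps * x
    return out[1]
end

# Forward finite differences, one entry of ps at a time. Because the function
# is linear in ps, the difference quotient is exact up to rounding.
function fd_gradient(ps::Matrix{Float32}, x::Vector{Float32}; h::Float32 = 0.5f0)
    grad = zeros(Float32, size(ps))
    base = adfunc_plain(ps, x)
    for i in eachindex(ps)
        bumped = copy(ps)
        bumped[i] += h
        grad[i] = (adfunc_plain(bumped, x) - base) / h
    end
    return grad
end

ps = zeros(Float32, 30, 1)
g0 = fd_gradient(ps, [0.0f0])   # the x used in the reduced example
g2 = fd_gradient(ps, [2.0f0])   # a nonzero x makes the structure visible
println(all(iszero, g0))
println(g2[1, 1], " ", all(iszero, g2[2:end, 1]))
```

In the reduced call the shadow passed in `Duplicated` starts at zero, so a correct reverse pass would be expected to leave it at zero.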
julia> Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
after simplification :
; Function Attrs: mustprogress willreturn
define float @preprocess_julia_adfunc_10000({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0) local_unnamed_addr #12 !dbg !117 {
top:
%1 = alloca i8, align 1
%2 = alloca i64, align 16
%3 = bitcast i64* %2 to i8*
%4 = alloca i64, align 16
%5 = bitcast i64* %4 to i8*
%6 = alloca i32, align 8
%7 = bitcast i32* %6 to i8*
%8 = alloca i64, align 16
%9 = bitcast i64* %8 to i8*
%10 = alloca i64, align 16
%11 = bitcast i64* %10 to i8*
%12 = alloca i32, align 8
%13 = bitcast i32* %12 to i8*
%14 = alloca i64, align 16
%15 = bitcast i64* %14 to i8*
%16 = call {}*** @julia.get_pgcstack() #13
%ptls_field124 = getelementptr inbounds {}**, {}*** %16, i64 2
%17 = bitcast {}*** %ptls_field124 to i64***
%ptls_load125126 = load i64**, i64*** %17, align 8, !tbaa !9
%18 = getelementptr inbounds i64*, i64** %ptls_load125126, i64 2
%safepoint = load i64*, i64** %18, align 8, !tbaa !13, !invariant.load !8
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint) #13, !dbg !118
fence syncscope("singlethread") seq_cst
%19 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #14, !dbg !119
%20 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #14, !dbg !121
%21 = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !124
%22 = bitcast {} addrspace(10)* %20 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !124
%23 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %22 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !124
%24 = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !124
%25 = addrspacecast float addrspace(13)* addrspace(10)* %24 to float addrspace(13)* addrspace(11)*, !dbg !124
%arrayptr127 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %25, align 8, !dbg !124, !tbaa !28, !alias.scope !126, !noalias !36, !nonnull !8
store float 0.000000e+00, float addrspace(13)* %arrayptr127, align 4, !dbg !124, !tbaa !41, !alias.scope !44, !noalias !129
%26 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !130
%27 = bitcast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(10)*, !dbg !130
%28 = addrspacecast {} addrspace(10)* addrspace(10)* %27 to {} addrspace(10)* addrspace(11)*, !dbg !130
%arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %28, i64 3, !dbg !130
%29 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !130
%arraysize = load i64, i64 addrspace(11)* %29, align 8, !dbg !130, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !52, !noalias !53
%arraysize_ptr2 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %28, i64 4, !dbg !130
%30 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr2 to i64 addrspace(11)*, !dbg !130
%arraysize3 = load i64, i64 addrspace(11)* %30, align 16, !dbg !130, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !52, !noalias !53
%arraylen_ptr = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %23, i64 0, i32 1, !dbg !132
%arraylen = load i64, i64 addrspace(11)* %arraylen_ptr, align 8, !dbg !132, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
%31 = icmp eq i64 %arraylen, %arraysize3, !dbg !134
br i1 %31, label %L17, label %top.L19_crit_edge, !dbg !133
top.L19_crit_edge: ; preds = %top
%.phi.trans.insert = bitcast {} addrspace(10)* %19 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*
%.phi.trans.insert143 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %.phi.trans.insert to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*
%arraylen_ptr10.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %.phi.trans.insert143, i64 0, i32 1
%arraylen11.pre = load i64, i64 addrspace(11)* %arraylen_ptr10.phi.trans.insert, align 8, !dbg !136, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
br label %L19, !dbg !133
L17: ; preds = %top
%32 = bitcast {} addrspace(10)* %19 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !132
%33 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %32 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !132
%arraylen_ptr104 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %33, i64 0, i32 1, !dbg !132
%arraylen105 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !132, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
%34 = icmp eq i64 %arraylen105, %arraysize, !dbg !134
br i1 %34, label %L29, label %L19, !dbg !133
L19: ; preds = %L17, %top.L19_crit_edge
%arraylen11 = phi i64 [ %arraylen11.pre, %top.L19_crit_edge ], [ %arraylen105, %L17 ], !dbg !136
%current_task1123 = getelementptr inbounds {}**, {}*** %16, i64 -14
%current_task1 = bitcast {}*** %current_task1123 to {}**
%newstruct13 = call noalias nonnull dereferenceable(16) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803713277456 to {}*) to {} addrspace(10)*)) #15, !dbg !138
%35 = bitcast {} addrspace(10)* %newstruct13 to {} addrspace(10)* addrspace(10)*, !dbg !138
%36 = addrspacecast {} addrspace(10)* addrspace(10)* %35 to {} addrspace(10)* addrspace(11)*, !dbg !138
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %36, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
%37 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %36, i64 1, !dbg !138
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %37, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
%box = call noalias nonnull dereferenceable(56) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 56, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803908591440 to {}*) to {} addrspace(10)*)) #15, !dbg !138
%38 = bitcast {} addrspace(10)* %box to { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !138
%.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 0, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032336 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
%.repack128.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 1, i64 0, !dbg !138
store i64 %arraysize, i64 addrspace(10)* %.repack128.repack, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
%.repack128.repack138 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 1, i64 1, !dbg !138
store i64 %arraysize3, i64 addrspace(10)* %.repack128.repack138, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
%.repack130 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 2, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032304 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack130, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
%.repack132 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 3, !dbg !138
store i64 %arraylen, i64 addrspace(10)* %.repack132, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
%.repack134 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 4, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032256 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack134, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
%.repack136 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %38, i64 0, i32 5, !dbg !138
store i64 %arraylen11, i64 addrspace(10)* %.repack136, align 8, !dbg !138, !tbaa !75, !alias.scope !44, !noalias !129
store atomic {} addrspace(10)* %box, {} addrspace(10)* addrspace(11)* %36 release, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* nofree noundef nonnull %newstruct13, {} addrspace(10)* nofree nonnull %box) #16, !dbg !138
%39 = bitcast {} addrspace(10)* %newstruct13 to i8 addrspace(10)*, !dbg !138
%40 = addrspacecast i8 addrspace(10)* %39 to i8 addrspace(11)*, !dbg !138
%41 = getelementptr inbounds i8, i8 addrspace(11)* %40, i64 8, !dbg !138
%42 = bitcast i8 addrspace(11)* %41 to {} addrspace(10)* addrspace(11)*, !dbg !138
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803939037192 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %42 release, align 8, !dbg !138, !tbaa !72, !alias.scope !44, !noalias !129
%box16 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803724611632 to {}*) to {} addrspace(10)*)) #15, !dbg !137
%43 = bitcast {} addrspace(10)* %box16 to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !137
%44 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %43, i64 0, i64 0, !dbg !137
store {} addrspace(10)* %newstruct13, {} addrspace(10)* addrspace(10)* %44, align 8, !dbg !137, !tbaa !75, !alias.scope !44, !noalias !129
%45 = addrspacecast {} addrspace(10)* %box16 to {} addrspace(12)*, !dbg !137
call void @ijl_throw({} addrspace(12)* %45) #17, !dbg !137
unreachable, !dbg !137
L29: ; preds = %L17
%46 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %21) #18, !dbg !139
%47 = bitcast {}* %46 to i8**, !dbg !139
%arrayptr20 = load i8*, i8** %47, align 8, !dbg !139, !tbaa !28, !alias.scope !60, !noalias !36, !nonnull !8
%48 = ptrtoint i8* %arrayptr20 to i64, !dbg !139
%49 = addrspacecast {} addrspace(10)* %19 to {} addrspace(11)*, !dbg !143
%50 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %49) #18, !dbg !143
%51 = bitcast {}* %50 to i8**, !dbg !143
%arrayptr22 = load i8*, i8** %51, align 8, !dbg !143, !tbaa !28, !alias.scope !60, !noalias !36, !nonnull !8
%52 = ptrtoint i8* %arrayptr22 to i64, !dbg !143
%53 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %26) #18, !dbg !147
%54 = bitcast {}* %53 to i8**, !dbg !147
%arrayptr24 = load i8*, i8** %54, align 8, !dbg !147, !tbaa !13, !invariant.load !8, !alias.scope !52, !noalias !53, !nonnull !8
%55 = ptrtoint i8* %arrayptr24 to i64, !dbg !147
%.not = icmp eq i64 %arraysize, 0, !dbg !150
%56 = select i1 %.not, i64 1, i64 %arraysize, !dbg !154
%57 = call i64 @llvm.umax.i64(i64 %arraysize, i64 %56) #13, !dbg !154
%58 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %20, {} addrspace(10)* nonnull %19) #13, !dbg !155
store i8 78, i8* %1, align 1, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
store i64 %arraysize, i64* %2, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
store i64 %arraysize3, i64* %4, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
%memcpy_refined_dst48 = bitcast i32* %6 to float*, !dbg !156
store float 1.000000e+00, float* %memcpy_refined_dst48, align 8, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
store i64 %57, i64* %8, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
store i64 1, i64* %10, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
%memcpy_refined_dst55 = bitcast i32* %12 to float*, !dbg !156
store float 0.000000e+00, float* %memcpy_refined_dst55, align 8, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
store i64 1, i64* %14, align 16, !dbg !156, !tbaa !72, !alias.scope !44, !noalias !129
call void @sgemv_64_(i8* noundef nonnull %1, i8* noundef nonnull %3, i8* noundef nonnull %5, i8* noundef nonnull %7, i64 %55, i8* noundef nonnull %9, i64 %48, i8* noundef nonnull %11, i8* noundef nonnull %13, i64 %52, i8* noundef nonnull %15, i64 noundef 1) #13 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !155
call void @llvm.julia.gc_preserve_end(token %58) #13, !dbg !155
%arraylen98 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !159, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
%inbounds.not = icmp eq i64 %arraylen98, 0, !dbg !159
br i1 %inbounds.not, label %oob, label %idxend, !dbg !159
oob: ; preds = %L29
%errorbox = alloca i64, align 8, !dbg !159
store i64 1, i64* %errorbox, align 8, !dbg !159, !noalias !161
%59 = addrspacecast {} addrspace(10)* %19 to {} addrspace(12)*, !dbg !159
call void @llvm.lifetime.end.p0i8(i64 noundef 1, i8* noundef nonnull %1) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %3) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %5) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 4, i8* noundef nonnull %7) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %9) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %11) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 4, i8* noundef nonnull %13) #13
call void @llvm.lifetime.end.p0i8(i64 noundef 8, i8* noundef nonnull %15) #13
call void @ijl_bounds_error_ints({} addrspace(12)* %59, i64* noundef nonnull align 8 %errorbox, i64 noundef 1) #17, !dbg !159
unreachable, !dbg !159
idxend: ; preds = %L29
%60 = bitcast {} addrspace(10)* %19 to float addrspace(13)* addrspace(10)*, !dbg !159
%61 = addrspacecast float addrspace(13)* addrspace(10)* %60 to float addrspace(13)* addrspace(11)*, !dbg !159
%arrayptr100141 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %61, align 8, !dbg !159, !tbaa !28, !alias.scope !126, !noalias !36, !nonnull !8
%arrayref = load float, float addrspace(13)* %arrayptr100141, align 4, !dbg !159, !tbaa !41, !alias.scope !44, !noalias !116
ret float %arrayref, !dbg !160
}
; Function Attrs: mustprogress willreturn
define internal void @diffejulia_adfunc_10000({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* align 16 %"'", float %differeturn) local_unnamed_addr #12 !dbg !162 {
top:
%"arrayref'de" = alloca float, align 4
%1 = getelementptr float, float* %"arrayref'de", i64 0
store float 0.000000e+00, float* %1, align 4
%byref. = alloca i64, align 8
%ret = alloca float, align 4
%byref.int.one = alloca i64, align 8
%byref.transpose.transa = alloca i8, align 1
%byref.constant.char.N = alloca i8, align 1
%byref.constant.fp.1.0 = alloca float, align 4
%2 = alloca i8, align 1
%3 = alloca i64, align 16
%4 = bitcast i64* %3 to i8*
%5 = alloca i64, align 16
%6 = bitcast i64* %5 to i8*
%7 = alloca i32, align 8
%8 = bitcast i32* %7 to i8*
%9 = alloca i64, align 16
%10 = bitcast i64* %9 to i8*
%11 = alloca i64, align 16
%12 = bitcast i64* %11 to i8*
%13 = alloca i32, align 8
%14 = bitcast i32* %13 to i8*
%15 = alloca i64, align 16
%16 = bitcast i64* %15 to i8*
%17 = call {}*** @julia.get_pgcstack() #15
%ptls_field124 = getelementptr inbounds {}**, {}*** %17, i64 2
%18 = bitcast {}*** %ptls_field124 to i64***
%ptls_load125126 = load i64**, i64*** %18, align 8, !tbaa !9, !alias.scope !163, !noalias !166
%19 = getelementptr inbounds i64*, i64** %ptls_load125126, i64 2
%safepoint = load i64*, i64** %19, align 8, !tbaa !13, !invariant.load !8, !alias.scope !168, !noalias !171
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint) #15, !dbg !173
fence syncscope("singlethread") seq_cst
%20 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 30), !dbg !174
%21 = bitcast {} addrspace(10)* %20 to i8 addrspace(13)* addrspace(10)*, !dbg !174
%22 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %21, align 8, !dbg !174
call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %22, i8 0, i64 120, i1 false), !dbg !174
%23 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #16, !dbg !174
%24 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 1), !dbg !176
%25 = bitcast {} addrspace(10)* %24 to i8 addrspace(13)* addrspace(10)*, !dbg !176
%26 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %25, align 8, !dbg !176
call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %26, i8 0, i64 4, i1 false), !dbg !176
%27 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #16, !dbg !176
%"'ipc15" = addrspacecast {} addrspace(10)* %24 to {} addrspace(11)*, !dbg !179
%28 = addrspacecast {} addrspace(10)* %27 to {} addrspace(11)*, !dbg !179
%29 = bitcast {} addrspace(10)* %27 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !179
%30 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %29 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !179
%"'ipc" = bitcast {} addrspace(10)* %24 to float addrspace(13)* addrspace(10)*, !dbg !179
%31 = bitcast {} addrspace(10)* %27 to float addrspace(13)* addrspace(10)*, !dbg !179
%"'ipc4" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc" to float addrspace(13)* addrspace(11)*, !dbg !179
%32 = addrspacecast float addrspace(13)* addrspace(10)* %31 to float addrspace(13)* addrspace(11)*, !dbg !179
%"arrayptr127'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc4", align 8, !dbg !179, !tbaa !28, !alias.scope !181, !noalias !184, !nonnull !8
%arrayptr127 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %32, align 8, !dbg !179, !tbaa !28, !alias.scope !186, !noalias !187, !nonnull !8
store float 0.000000e+00, float addrspace(13)* %arrayptr127, align 4, !dbg !179, !tbaa !41, !alias.scope !188, !noalias !191
%"'ipc11" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !193
%33 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !193
%34 = bitcast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(10)*, !dbg !193
%35 = addrspacecast {} addrspace(10)* addrspace(10)* %34 to {} addrspace(10)* addrspace(11)*, !dbg !193
%arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %35, i64 3, !dbg !193
%36 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !193
%arraysize = load i64, i64 addrspace(11)* %36, align 8, !dbg !193, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !195, !noalias !198
%arraysize_ptr2 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %35, i64 4, !dbg !193
%37 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr2 to i64 addrspace(11)*, !dbg !193
%arraysize3 = load i64, i64 addrspace(11)* %37, align 16, !dbg !193, !tbaa !13, !range !51, !invariant.load !8, !alias.scope !195, !noalias !198
%arraylen_ptr = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %30, i64 0, i32 1, !dbg !200
%arraylen = load i64, i64 addrspace(11)* %arraylen_ptr, align 8, !dbg !200, !tbaa !58, !range !51, !alias.scope !202, !noalias !187
%38 = icmp eq i64 %arraylen, %arraysize3, !dbg !203
br i1 %38, label %L17, label %top.L19_crit_edge, !dbg !201
top.L19_crit_edge: ; preds = %top
%.phi.trans.insert = bitcast {} addrspace(10)* %23 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*
%.phi.trans.insert143 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %.phi.trans.insert to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*
%arraylen_ptr10.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %.phi.trans.insert143, i64 0, i32 1
%arraylen11.pre = load i64, i64 addrspace(11)* %arraylen_ptr10.phi.trans.insert, align 8, !dbg !205, !tbaa !58, !range !51, !alias.scope !60, !noalias !36
unreachable
L17: ; preds = %top
%39 = bitcast {} addrspace(10)* %23 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !200
%40 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %39 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !200
%arraylen_ptr104 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %40, i64 0, i32 1, !dbg !200
%arraylen105 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !200, !tbaa !58, !range !51, !alias.scope !207, !noalias !210
%41 = icmp eq i64 %arraylen105, %arraysize, !dbg !203
br i1 %41, label %L29, label %L19, !dbg !201
L19: ; preds = %L17
%arraylen11 = phi i64 [ %arraylen105, %L17 ], !dbg !205
%current_task1123 = getelementptr inbounds {}**, {}*** %17, i64 -14
%current_task1 = bitcast {}*** %current_task1123 to {}**
%newstruct13 = call noalias nonnull dereferenceable(16) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803713277456 to {}*) to {} addrspace(10)*)) #17, !dbg !212
%42 = bitcast {} addrspace(10)* %newstruct13 to {} addrspace(10)* addrspace(10)*, !dbg !212
%43 = addrspacecast {} addrspace(10)* addrspace(10)* %42 to {} addrspace(10)* addrspace(11)*, !dbg !212
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %43, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
%44 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %43, i64 1, !dbg !212
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %44, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
%box = call noalias nonnull dereferenceable(56) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 56, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803908591440 to {}*) to {} addrspace(10)*)) #17, !dbg !212
%45 = bitcast {} addrspace(10)* %box to { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !212
%.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 0, !dbg !212
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032336 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
%.repack128.repack = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 1, i64 0, !dbg !212
store i64 %arraysize, i64 addrspace(10)* %.repack128.repack, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
%.repack128.repack138 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 1, i64 1, !dbg !212
store i64 %arraysize3, i64 addrspace(10)* %.repack128.repack138, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
%.repack130 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 2, !dbg !212
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032304 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack130, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
%.repack132 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 3, !dbg !212
store i64 %arraylen, i64 addrspace(10)* %.repack132, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
%.repack134 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 4, !dbg !212
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803848032256 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack134, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
%.repack136 = getelementptr inbounds { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, [2 x i64], {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %45, i64 0, i32 5, !dbg !212
store i64 %arraylen11, i64 addrspace(10)* %.repack136, align 8, !dbg !212, !tbaa !75, !alias.scope !44, !noalias !213
store atomic {} addrspace(10)* %box, {} addrspace(10)* addrspace(11)* %43 release, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* nofree noundef nonnull %newstruct13, {} addrspace(10)* nofree nonnull %box) #18, !dbg !212
%46 = bitcast {} addrspace(10)* %newstruct13 to i8 addrspace(10)*, !dbg !212
%47 = addrspacecast i8 addrspace(10)* %46 to i8 addrspace(11)*, !dbg !212
%48 = getelementptr inbounds i8, i8 addrspace(11)* %47, i64 8, !dbg !212
%49 = bitcast i8 addrspace(11)* %48 to {} addrspace(10)* addrspace(11)*, !dbg !212
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803939037192 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %49 release, align 8, !dbg !212, !tbaa !72, !alias.scope !44, !noalias !213
%box16 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803724611632 to {}*) to {} addrspace(10)*)) #17, !dbg !206
%50 = bitcast {} addrspace(10)* %box16 to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !206
%51 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %50, i64 0, i64 0, !dbg !206
store {} addrspace(10)* %newstruct13, {} addrspace(10)* addrspace(10)* %51, align 8, !dbg !206, !tbaa !75, !alias.scope !44, !noalias !213
%52 = addrspacecast {} addrspace(10)* %box16 to {} addrspace(12)*, !dbg !206
call void @ijl_throw({} addrspace(12)* %52) #19, !dbg !206
unreachable
L29: ; preds = %L17
%53 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc15"), !dbg !216
%54 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %28) #20, !dbg !216
%"'ipc14" = bitcast {}* %53 to i8**, !dbg !216
%55 = bitcast {}* %54 to i8**, !dbg !216
%"arrayptr20'ipl" = load i8*, i8** %"'ipc14", align 8, !dbg !216, !tbaa !28, !alias.scope !220, !noalias !184, !nonnull !8
%arrayptr20 = load i8*, i8** %55, align 8, !dbg !216, !tbaa !28, !alias.scope !202, !noalias !187, !nonnull !8
%"'ipc6" = ptrtoint i8* %"arrayptr20'ipl" to i64, !dbg !216
%56 = ptrtoint i8* %arrayptr20 to i64, !dbg !216
%"'ipc13" = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !221
%57 = addrspacecast {} addrspace(10)* %23 to {} addrspace(11)*, !dbg !221
%58 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc13"), !dbg !221
%59 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %57) #20, !dbg !221
%"'ipc12" = bitcast {}* %58 to i8**, !dbg !221
%60 = bitcast {}* %59 to i8**, !dbg !221
%"arrayptr22'ipl" = load i8*, i8** %"'ipc12", align 8, !dbg !221, !tbaa !28, !alias.scope !225, !noalias !226, !nonnull !8
%arrayptr22 = load i8*, i8** %60, align 8, !dbg !221, !tbaa !28, !alias.scope !207, !noalias !210, !nonnull !8
%"'ipc7" = ptrtoint i8* %"arrayptr22'ipl" to i64, !dbg !221
%61 = ptrtoint i8* %arrayptr22 to i64, !dbg !221
%62 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc11"), !dbg !227
%63 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %33) #20, !dbg !227
%"'ipc10" = bitcast {}* %62 to i8**, !dbg !227
%64 = bitcast {}* %63 to i8**, !dbg !227
%"arrayptr24'ipl" = load i8*, i8** %"'ipc10", align 8, !dbg !227, !tbaa !13, !alias.scope !230, !noalias !231, !nonnull !8
%arrayptr24 = load i8*, i8** %64, align 8, !dbg !227, !tbaa !13, !invariant.load !8, !alias.scope !195, !noalias !198, !nonnull !8
%"'ipc5" = ptrtoint i8* %"arrayptr24'ipl" to i64, !dbg !227
%65 = ptrtoint i8* %arrayptr24 to i64, !dbg !227
%.not = icmp eq i64 %arraysize, 0, !dbg !232
%66 = select i1 %.not, i64 1, i64 %arraysize, !dbg !236
%67 = call i64 @llvm.umax.i64(i64 %arraysize, i64 %66) #15, !dbg !236
%68 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !237
store i8 78, i8* %2, align 1, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
store i64 %arraysize, i64* %3, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
store i64 %arraysize3, i64* %5, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
%memcpy_refined_dst48 = bitcast i32* %7 to float*, !dbg !238
store float 1.000000e+00, float* %memcpy_refined_dst48, align 8, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
store i64 %67, i64* %9, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
store i64 1, i64* %11, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
%memcpy_refined_dst55 = bitcast i32* %13 to float*, !dbg !238
store float 0.000000e+00, float* %memcpy_refined_dst55, align 8, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
store i64 1, i64* %15, align 16, !dbg !238, !tbaa !72, !alias.scope !44, !noalias !213
%69 = bitcast i8* %4 to i64*, !dbg !237
%70 = load i64, i64* %69, align 8, !dbg !237
%71 = bitcast i8* %6 to i64*, !dbg !237
%72 = load i64, i64* %71, align 8, !dbg !237
%73 = mul i64 %70, %72, !dbg !237
%74 = mul nuw i64 %73, 4, !dbg !237
%75 = call noalias nonnull i8* @malloc(i64 %74), !dbg !237
%cache.A = bitcast i8* %75 to float*, !dbg !237
%76 = bitcast i8* %10 to i64*, !dbg !237
%77 = load i64, i64* %76, align 8, !dbg !237
%78 = inttoptr i64 %65 to float*, !dbg !237
call void @__enzyme_memcpy_float_mat_64(float* %cache.A, float* %78, i64 %70, i64 %72, i64 %77) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
%loaded.trans = load i8, i8* %2, align 1, !dbg !237
%79 = icmp eq i8 %loaded.trans, 78, !dbg !237
%80 = icmp eq i8 %loaded.trans, 110, !dbg !237
%81 = or i1 %80, %79, !dbg !237
%82 = select i1 %81, i8* %6, i8* %4, !dbg !237
%83 = bitcast i8* %82 to i64*, !dbg !237
%84 = load i64, i64* %83, align 8, !dbg !237
%85 = mul nuw i64 %84, 4, !dbg !237
%86 = call noalias nonnull i8* @malloc(i64 %85), !dbg !237
%cache.x = bitcast i8* %86 to float*, !dbg !237
store i64 1, i64* %byref., align 8, !dbg !237
%intcast. = bitcast i64* %byref. to i8*, !dbg !237
call void @scopy_64_(i8* %82, i64 %56, i8* %12, float* %cache.x, i8* %intcast.) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
call void @sgemv_64_(i8* noundef nonnull %2, i8* noundef nonnull %4, i8* noundef nonnull %6, i8* noundef nonnull %8, i64 %65, i8* noundef nonnull %10, i64 %56, i8* noundef nonnull %12, i8* noundef nonnull %14, i64 %61, i8* noundef nonnull %16, i64 noundef 1) #15 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
call void @llvm.julia.gc_preserve_end(token %68) #15, !dbg !237
%arraylen98 = load i64, i64 addrspace(11)* %arraylen_ptr104, align 8, !dbg !241, !tbaa !58, !range !51, !alias.scope !207, !noalias !210
%inbounds.not = icmp eq i64 %arraylen98, 0, !dbg !241
br i1 %inbounds.not, label %oob, label %idxend, !dbg !241
oob: ; preds = %L29
%errorbox = alloca i64, align 8, !dbg !241
store i64 1, i64* %errorbox, align 8, !dbg !241, !noalias !243
%87 = addrspacecast {} addrspace(10)* %23 to {} addrspace(12)*, !dbg !241
call void @ijl_bounds_error_ints({} addrspace(12)* %87, i64* noundef nonnull align 8 %errorbox, i64 noundef 1) #19, !dbg !241
unreachable
idxend: ; preds = %L29
%"'ipc16" = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !241
%"'ipc17" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc16" to float addrspace(13)* addrspace(11)*, !dbg !241
%"arrayptr100141'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc17", align 8, !dbg !241, !tbaa !28, !alias.scope !244, !noalias !226, !nonnull !8
br label %invertidxend, !dbg !242
inverttop: ; preds = %invertL17
store float 0.000000e+00, float addrspace(13)* %"arrayptr127'ipl", align 4, !dbg !179, !tbaa !41, !alias.scope !245, !noalias !246
fence syncscope("singlethread") seq_cst
fence syncscope("singlethread") seq_cst
ret void
invertL17: ; preds = %invertL29
br label %inverttop
invertL29: ; preds = %invertidxend
%88 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !237
%89 = ptrtoint float* %cache.A to i64, !dbg !237
%90 = ptrtoint float* %cache.x to i64, !dbg !237
store i64 1, i64* %byref.int.one, align 8, !dbg !237
%intcast.int.one = bitcast i64* %byref.int.one to i8*, !dbg !237
%ld.row.trans = load i8, i8* %2, align 1, !dbg !237
%91 = icmp eq i8 %ld.row.trans, 110, !dbg !237
%92 = icmp eq i8 %ld.row.trans, 78, !dbg !237
%93 = or i1 %92, %91, !dbg !237
%94 = select i1 %93, i64 %"'ipc7", i64 %90, !dbg !237
%95 = select i1 %93, i8* %16, i8* %intcast.int.one, !dbg !237
%96 = select i1 %93, i64 %90, i64 %"'ipc7", !dbg !237
%97 = select i1 %93, i8* %intcast.int.one, i8* %16, !dbg !237
call void @sger_64_(i8* %4, i8* %6, i8* %8, i64 %94, i8* %95, i64 %96, i8* %97, i64 %"'ipc5", i8* %10) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
%ld.transa = load i8, i8* %2, align 1, !dbg !237
%98 = icmp eq i8 %ld.transa, 110, !dbg !237
%99 = select i1 %98, i8 116, i8 0, !dbg !237
%100 = icmp eq i8 %ld.transa, 78, !dbg !237
%101 = select i1 %100, i8 84, i8 %99, !dbg !237
%102 = icmp eq i8 %ld.transa, 116, !dbg !237
%103 = select i1 %102, i8 110, i8 %101, !dbg !237
%104 = icmp eq i8 %ld.transa, 84, !dbg !237
%105 = select i1 %104, i8 78, i8 %103, !dbg !237
store i8 %105, i8* %byref.transpose.transa, align 1, !dbg !237
store i8 78, i8* %byref.constant.char.N, align 1, !dbg !237
%loaded.trans8 = load i8, i8* %byref.constant.char.N, align 1, !dbg !237
%106 = icmp eq i8 %loaded.trans8, 78, !dbg !237
%107 = icmp eq i8 %loaded.trans8, 110, !dbg !237
%108 = or i1 %107, %106, !dbg !237
%109 = select i1 %108, i8* %6, i8* %4, !dbg !237
store float 1.000000e+00, float* %byref.constant.fp.1.0, align 4, !dbg !237
%fpcast.constant.fp.1.0 = bitcast float* %byref.constant.fp.1.0 to i8*, !dbg !237
call void @sgemv_64_(i8* %byref.transpose.transa, i8* %4, i8* %6, i8* %8, i64 %89, i8* %109, i64 %"'ipc7", i8* %16, i8* %fpcast.constant.fp.1.0, i64 %"'ipc6", i8* %12, i64 1) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
%ld.row.trans9 = load i8, i8* %2, align 1, !dbg !237
%110 = icmp eq i8 %ld.row.trans9, 110, !dbg !237
%111 = icmp eq i8 %ld.row.trans9, 78, !dbg !237
%112 = or i1 %111, %110, !dbg !237
%113 = select i1 %112, i8* %4, i8* %6, !dbg !237
call void @sscal_64_(i8* %113, i8* %14, i64 %"'ipc7", i8* %16) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !237
%114 = bitcast float* %cache.A to i8*, !dbg !237
call void @free(i8* nonnull %114), !dbg !237
%115 = bitcast float* %cache.x to i8*, !dbg !237
call void @free(i8* nonnull %115), !dbg !237
call void @llvm.julia.gc_preserve_end(token %88), !dbg !237
br label %invertL17
invertidxend: ; preds = %idxend
store float %differeturn, float* %"arrayref'de", align 4
%116 = load float, float* %"arrayref'de", align 4, !dbg !241
store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !241
%117 = load float, float addrspace(13)* %"arrayptr100141'ipl", align 4, !dbg !241, !tbaa !41, !alias.scope !247, !noalias !250
%118 = fadd fast float %117, %116, !dbg !241
store float %118, float addrspace(13)* %"arrayptr100141'ipl", align 4, !dbg !241, !tbaa !41, !alias.scope !247, !noalias !250
br label %invertL29
}
** On entry to SGEMV parameter number 6 had an illegal value
((nothing,),)
@ZuseZ4 anything pop out?
using Enzyme; Enzyme.Compiler.bitcode_replacement!(false)
using LinearAlgebra
ps = zeros(Float32, 30, 1)
function adfunc(A)
    Y = Vector{Float32}(undef, 30)
    X = [1.0f0]
    trans = 'N'
    m, n = 30, 1
    pX, sX = pointer(X), 1
    pY, sY = pointer(Y), 1
    pA = pointer(A)
    lda = 30  # leading dimension of A, equal to the number of rows
    alpha = 1.0f0
    beta = 0.0f0
    # Direct call to sgemv through libblastrampoline: Y = alpha * A * X + beta * Y
    GC.@preserve A X Y ccall((:sgemv_64_, LinearAlgebra.BLAS.libblastrampoline), Cvoid,
        (Ref{UInt8}, Ref{LinearAlgebra.BLAS.BlasInt}, Ref{LinearAlgebra.BLAS.BlasInt}, Ref{Float32},
         Ptr{Float32}, Ref{LinearAlgebra.BLAS.BlasInt}, Ptr{Float32}, Ref{LinearAlgebra.BLAS.BlasInt},
         Ref{Float32}, Ptr{Float32}, Ref{LinearAlgebra.BLAS.BlasInt}, Clong),
        trans, 30, 1, alpha,
        pA, lda, pX, sX,
        beta, pY, sY, 1)
    return @inbounds Y[1]
end
adfunc(ps)
Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
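For reference, the result this reproducer should give can be worked out by hand. With `X = [1]`, `alpha = 1` and `beta = 0`, the function returns `Y[1] = A[1, 1]`, so the shadow of `A` should end up with a 1 at `[1, 1]` and zeros elsewhere. Below is a minimal sketch of that reverse rule written with the `LinearAlgebra.BLAS` wrappers instead of Enzyme. The names `dY`, `dA` and `dX` are illustrative and not taken from the thread.

```julia
using LinearAlgebra: BLAS

m, n = 30, 1
A = zeros(Float32, m, n)
X = Float32[1]
Y = zeros(Float32, m)

# Forward pass: Y = 1 * A * X + 0 * Y
BLAS.gemv!('N', 1.0f0, A, X, 0.0f0, Y)

# Seed for the returned value Y[1]
dY = zeros(Float32, m)
dY[1] = 1.0f0

# Reverse pass for the 'N' case:
#   dA += alpha * dY * X'   (rank-1 update)
#   dX += alpha * A' * dY   (transposed matrix-vector product)
dA = zeros(Float32, m, n)
dX = zeros(Float32, n)
BLAS.ger!(1.0f0, dY, X, dA)
BLAS.gemv!('T', 1.0f0, A, dY, 1.0f0, dX)

println("dA[1, 1] = ", dA[1, 1], ", nonzeros in dA = ", count(!iszero, dA))
```

The rank-1 update and the transposed matrix-vector product are the same two BLAS operations that show up as `sger_64_` and the transposed `sgemv_64_` call in the reverse pass Enzyme generates for this function.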
julia> Enzyme.autodiff(Enzyme.Reverse, adfunc, Enzyme.Duplicated(deepcopy(ps), deepcopy(ps)))
after simplification :
; Function Attrs: mustprogress willreturn
define float @preprocess_julia_adfunc_10183({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0) local_unnamed_addr #8 !dbg !78 {
top:
%1 = alloca i8, align 1
%2 = alloca i64, align 16
%3 = bitcast i64* %2 to i8*
%4 = alloca i64, align 16
%5 = bitcast i64* %4 to i8*
%6 = alloca i32, align 8
%7 = bitcast i32* %6 to i8*
%8 = alloca i64, align 16
%9 = bitcast i64* %8 to i8*
%10 = alloca i64, align 16
%11 = bitcast i64* %10 to i8*
%12 = alloca i32, align 8
%13 = bitcast i32* %12 to i8*
%14 = alloca i64, align 16
%15 = bitcast i64* %14 to i8*
%16 = call {}*** @julia.get_pgcstack() #9
%ptls_field71 = getelementptr inbounds {}**, {}*** %16, i64 2
%17 = bitcast {}*** %ptls_field71 to i64***
%ptls_load7273 = load i64**, i64*** %17, align 8, !tbaa !8
%18 = getelementptr inbounds i64*, i64** %ptls_load7273, i64 2
%safepoint = load i64*, i64** %18, align 8, !tbaa !12, !invariant.load !7
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint) #9, !dbg !79
fence syncscope("singlethread") seq_cst
%19 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #10, !dbg !80
%20 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #10, !dbg !82
%21 = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !85
%22 = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !85
%23 = addrspacecast float addrspace(13)* addrspace(10)* %22 to float addrspace(13)* addrspace(11)*, !dbg !85
%arrayptr74 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %23, align 8, !dbg !85, !tbaa !27, !alias.scope !87, !noalias !35, !nonnull !7
store float 1.000000e+00, float addrspace(13)* %arrayptr74, align 4, !dbg !85, !tbaa !40, !alias.scope !43, !noalias !90
%24 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %21) #11, !dbg !91
%25 = bitcast {}* %24 to i8**, !dbg !91
%arrayptr3 = load i8*, i8** %25, align 8, !dbg !91, !tbaa !27, !alias.scope !52, !noalias !35, !nonnull !7
%26 = addrspacecast {} addrspace(10)* %19 to {} addrspace(11)*, !dbg !94
%27 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %26) #11, !dbg !94
%28 = bitcast {}* %27 to i8**, !dbg !94
%arrayptr5 = load i8*, i8** %28, align 8, !dbg !94, !tbaa !27, !alias.scope !52, !noalias !35, !nonnull !7
%29 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !97
%30 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %29) #11, !dbg !97
%31 = bitcast {}* %30 to i8**, !dbg !97
%arrayptr7 = load i8*, i8** %31, align 8, !dbg !97, !tbaa !12, !invariant.load !7, !alias.scope !59, !noalias !60, !nonnull !7
%32 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nofree nonnull %0, {} addrspace(10)* nonnull %20, {} addrspace(10)* nonnull %19) #9, !dbg !100
store i8 78, i8* %1, align 1, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
store i64 30, i64* %2, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
store i64 1, i64* %4, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
%memcpy_refined_dst18 = bitcast i32* %6 to float*, !dbg !101
store float 1.000000e+00, float* %memcpy_refined_dst18, align 8, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
store i64 30, i64* %8, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
store i64 1, i64* %10, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
%memcpy_refined_dst27 = bitcast i32* %12 to float*, !dbg !101
store float 0.000000e+00, float* %memcpy_refined_dst27, align 8, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
store i64 1, i64* %14, align 16, !dbg !101, !tbaa !71, !alias.scope !43, !noalias !90
%33 = ptrtoint i8* %arrayptr7 to i64, !dbg !97
%34 = ptrtoint i8* %arrayptr5 to i64, !dbg !94
%35 = ptrtoint i8* %arrayptr3 to i64, !dbg !91
call void @sgemv_64_(i8* noundef nonnull %1, i8* noundef nonnull %3, i8* noundef nonnull %5, i8* noundef nonnull %7, i64 %33, i8* noundef nonnull %9, i64 %35, i8* noundef nonnull %11, i8* noundef nonnull %13, i64 %34, i8* noundef nonnull %15, i64 noundef 1) #9 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !100
call void @llvm.julia.gc_preserve_end(token %32) #9, !dbg !100
%36 = bitcast {} addrspace(10)* %19 to float addrspace(13)* addrspace(10)*, !dbg !104
%37 = addrspacecast float addrspace(13)* addrspace(10)* %36 to float addrspace(13)* addrspace(11)*, !dbg !104
%arrayptr6875 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %37, align 8, !dbg !104, !tbaa !27, !alias.scope !87, !noalias !35, !nonnull !7
%arrayref = load float, float addrspace(13)* %arrayptr6875, align 4, !dbg !104, !tbaa !40, !alias.scope !43, !noalias !77
ret float %arrayref, !dbg !105
}
; Function Attrs: mustprogress willreturn
define internal void @diffejulia_adfunc_10183({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* align 16 %"'", float %differeturn) local_unnamed_addr #8 !dbg !106 {
top:
%"arrayref'de" = alloca float, align 4
%1 = getelementptr float, float* %"arrayref'de", i64 0
store float 0.000000e+00, float* %1, align 4
%byref. = alloca i64, align 8
%ret = alloca float, align 4
%byref.int.one = alloca i64, align 8
%byref.transpose.transa = alloca i8, align 1
%byref.constant.char.N = alloca i8, align 1
%byref.constant.fp.1.0 = alloca float, align 4
%2 = alloca i8, align 1
%3 = alloca i64, align 16
%4 = bitcast i64* %3 to i8*
%5 = alloca i64, align 16
%6 = bitcast i64* %5 to i8*
%7 = alloca i32, align 8
%8 = bitcast i32* %7 to i8*
%9 = alloca i64, align 16
%10 = bitcast i64* %9 to i8*
%11 = alloca i64, align 16
%12 = bitcast i64* %11 to i8*
%13 = alloca i32, align 8
%14 = bitcast i32* %13 to i8*
%15 = alloca i64, align 16
%16 = bitcast i64* %15 to i8*
%17 = call {}*** @julia.get_pgcstack() #11
%ptls_field71 = getelementptr inbounds {}**, {}*** %17, i64 2
%18 = bitcast {}*** %ptls_field71 to i64***
%ptls_load7273 = load i64**, i64*** %18, align 8, !tbaa !8, !alias.scope !107, !noalias !110
%19 = getelementptr inbounds i64*, i64** %ptls_load7273, i64 2
%safepoint = load i64*, i64** %19, align 8, !tbaa !12, !invariant.load !7, !alias.scope !112, !noalias !115
fence syncscope("singlethread") seq_cst
call void @julia.safepoint(i64* %safepoint) #11, !dbg !117
fence syncscope("singlethread") seq_cst
%20 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 30), !dbg !118
%21 = bitcast {} addrspace(10)* %20 to i8 addrspace(13)* addrspace(10)*, !dbg !118
%22 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %21, align 8, !dbg !118
call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %22, i8 0, i64 120, i1 false), !dbg !118
%23 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 30) #12, !dbg !118
%24 = call {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 1), !dbg !120
%25 = bitcast {} addrspace(10)* %24 to i8 addrspace(13)* addrspace(10)*, !dbg !120
%26 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(10)* %25, align 8, !dbg !120
call void @llvm.memset.p13i8.i64(i8 addrspace(13)* align 4 %26, i8 0, i64 4, i1 false), !dbg !120
%27 = call noalias nonnull {} addrspace(10)* @ijl_alloc_array_1d({} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 139803920500880 to {}*) to {} addrspace(10)*), i64 noundef 1) #12, !dbg !120
%"'ipc16" = addrspacecast {} addrspace(10)* %24 to {} addrspace(11)*, !dbg !123
%28 = addrspacecast {} addrspace(10)* %27 to {} addrspace(11)*, !dbg !123
%"'ipc17" = bitcast {} addrspace(10)* %24 to float addrspace(13)* addrspace(10)*, !dbg !123
%29 = bitcast {} addrspace(10)* %27 to float addrspace(13)* addrspace(10)*, !dbg !123
%"'ipc18" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc17" to float addrspace(13)* addrspace(11)*, !dbg !123
%30 = addrspacecast float addrspace(13)* addrspace(10)* %29 to float addrspace(13)* addrspace(11)*, !dbg !123
%"arrayptr74'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc18", align 8, !dbg !123, !tbaa !27, !alias.scope !125, !noalias !128, !nonnull !7
%arrayptr74 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %30, align 8, !dbg !123, !tbaa !27, !alias.scope !130, !noalias !131, !nonnull !7
store float 1.000000e+00, float addrspace(13)* %arrayptr74, align 4, !dbg !123, !tbaa !40, !alias.scope !132, !noalias !135
%31 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc16"), !dbg !137
%32 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %28) #13, !dbg !137
%"'ipc15" = bitcast {}* %31 to i8**, !dbg !137
%33 = bitcast {}* %32 to i8**, !dbg !137
%"arrayptr3'ipl" = load i8*, i8** %"'ipc15", align 8, !dbg !137, !tbaa !27, !alias.scope !140, !noalias !128, !nonnull !7
%arrayptr3 = load i8*, i8** %33, align 8, !dbg !137, !tbaa !27, !alias.scope !141, !noalias !131, !nonnull !7
%"'ipc14" = addrspacecast {} addrspace(10)* %20 to {} addrspace(11)*, !dbg !142
%34 = addrspacecast {} addrspace(10)* %23 to {} addrspace(11)*, !dbg !142
%35 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc14"), !dbg !142
%36 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* %34) #13, !dbg !142
%"'ipc13" = bitcast {}* %35 to i8**, !dbg !142
%37 = bitcast {}* %36 to i8**, !dbg !142
%"arrayptr5'ipl" = load i8*, i8** %"'ipc13", align 8, !dbg !142, !tbaa !27, !alias.scope !145, !noalias !148, !nonnull !7
%arrayptr5 = load i8*, i8** %37, align 8, !dbg !142, !tbaa !27, !alias.scope !150, !noalias !151, !nonnull !7
%"'ipc12" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !152
%38 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !152
%39 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc12"), !dbg !152
%40 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %38) #13, !dbg !152
%"'ipc11" = bitcast {}* %39 to i8**, !dbg !152
%41 = bitcast {}* %40 to i8**, !dbg !152
%"arrayptr7'ipl" = load i8*, i8** %"'ipc11", align 8, !dbg !152, !tbaa !12, !alias.scope !155, !noalias !158, !nonnull !7
%arrayptr7 = load i8*, i8** %41, align 8, !dbg !152, !tbaa !12, !invariant.load !7, !alias.scope !160, !noalias !161, !nonnull !7
%42 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !162
store i8 78, i8* %2, align 1, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
store i64 30, i64* %3, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
store i64 1, i64* %5, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
%memcpy_refined_dst18 = bitcast i32* %7 to float*, !dbg !163
store float 1.000000e+00, float* %memcpy_refined_dst18, align 8, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
store i64 30, i64* %9, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
store i64 1, i64* %11, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
%memcpy_refined_dst27 = bitcast i32* %13 to float*, !dbg !163
store float 0.000000e+00, float* %memcpy_refined_dst27, align 8, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
store i64 1, i64* %15, align 16, !dbg !163, !tbaa !71, !alias.scope !43, !noalias !166
%"'ipc6" = ptrtoint i8* %"arrayptr7'ipl" to i64, !dbg !152
%43 = ptrtoint i8* %arrayptr7 to i64, !dbg !152
%"'ipc8" = ptrtoint i8* %"arrayptr5'ipl" to i64, !dbg !142
%44 = ptrtoint i8* %arrayptr5 to i64, !dbg !142
%"'ipc7" = ptrtoint i8* %"arrayptr3'ipl" to i64, !dbg !137
%45 = ptrtoint i8* %arrayptr3 to i64, !dbg !137
%46 = bitcast i8* %4 to i64*, !dbg !162
%47 = load i64, i64* %46, align 8, !dbg !162
%48 = bitcast i8* %6 to i64*, !dbg !162
%49 = load i64, i64* %48, align 8, !dbg !162
%50 = mul i64 %47, %49, !dbg !162
%51 = mul nuw i64 %50, 4, !dbg !162
%52 = call noalias nonnull i8* @malloc(i64 %51), !dbg !162
%cache.A = bitcast i8* %52 to float*, !dbg !162
%53 = bitcast i8* %10 to i64*, !dbg !162
%54 = load i64, i64* %53, align 8, !dbg !162
%55 = inttoptr i64 %43 to float*, !dbg !162
call void @__enzyme_memcpy_float_mat_64(float* %cache.A, float* %55, i64 %47, i64 %49, i64 %54) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
%loaded.trans = load i8, i8* %2, align 1, !dbg !162
%56 = icmp eq i8 %loaded.trans, 78, !dbg !162
%57 = icmp eq i8 %loaded.trans, 110, !dbg !162
%58 = or i1 %57, %56, !dbg !162
%59 = select i1 %58, i8* %6, i8* %4, !dbg !162
%60 = bitcast i8* %59 to i64*, !dbg !162
%61 = load i64, i64* %60, align 8, !dbg !162
%62 = mul nuw i64 %61, 4, !dbg !162
%63 = call noalias nonnull i8* @malloc(i64 %62), !dbg !162
%cache.x = bitcast i8* %63 to float*, !dbg !162
store i64 1, i64* %byref., align 8, !dbg !162
%intcast. = bitcast i64* %byref. to i8*, !dbg !162
call void @scopy_64_(i8* %59, i64 %45, i8* %12, float* %cache.x, i8* %intcast.) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
call void @sgemv_64_(i8* noundef nonnull %2, i8* noundef nonnull %4, i8* noundef nonnull %6, i8* noundef nonnull %8, i64 %43, i8* noundef nonnull %10, i64 %45, i8* noundef nonnull %12, i8* noundef nonnull %14, i64 %44, i8* noundef nonnull %16, i64 noundef 1) #11 [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
call void @llvm.julia.gc_preserve_end(token %42) #11, !dbg !162
%"'ipc" = bitcast {} addrspace(10)* %20 to float addrspace(13)* addrspace(10)*, !dbg !169
%"'ipc4" = addrspacecast float addrspace(13)* addrspace(10)* %"'ipc" to float addrspace(13)* addrspace(11)*, !dbg !169
%"arrayptr6875'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc4", align 8, !dbg !169, !tbaa !27, !alias.scope !171, !noalias !148, !nonnull !7
br label %inverttop, !dbg !170
inverttop: ; preds = %top
store float %differeturn, float* %"arrayref'de", align 4
%64 = load float, float* %"arrayref'de", align 4, !dbg !169
store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !169
%65 = load float, float addrspace(13)* %"arrayptr6875'ipl", align 4, !dbg !169, !tbaa !40, !alias.scope !172, !noalias !175
%66 = fadd fast float %65, %64, !dbg !169
store float %66, float addrspace(13)* %"arrayptr6875'ipl", align 4, !dbg !169, !tbaa !40, !alias.scope !172, !noalias !175
%67 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %27, {} addrspace(10)* %24, {} addrspace(10)* %23, {} addrspace(10)* %20), !dbg !162
%68 = ptrtoint float* %cache.A to i64, !dbg !162
%69 = ptrtoint float* %cache.x to i64, !dbg !162
store i64 1, i64* %byref.int.one, align 8, !dbg !162
%intcast.int.one = bitcast i64* %byref.int.one to i8*, !dbg !162
%ld.row.trans = load i8, i8* %2, align 1, !dbg !162
%70 = icmp eq i8 %ld.row.trans, 110, !dbg !162
%71 = icmp eq i8 %ld.row.trans, 78, !dbg !162
%72 = or i1 %71, %70, !dbg !162
%73 = select i1 %72, i64 %"'ipc8", i64 %69, !dbg !162
%74 = select i1 %72, i8* %16, i8* %intcast.int.one, !dbg !162
%75 = select i1 %72, i64 %69, i64 %"'ipc8", !dbg !162
%76 = select i1 %72, i8* %intcast.int.one, i8* %16, !dbg !162
call void @sger_64_(i8* %4, i8* %6, i8* %8, i64 %73, i8* %74, i64 %75, i8* %76, i64 %"'ipc6", i8* %10) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
%ld.transa = load i8, i8* %2, align 1, !dbg !162
%77 = icmp eq i8 %ld.transa, 110, !dbg !162
%78 = select i1 %77, i8 116, i8 0, !dbg !162
%79 = icmp eq i8 %ld.transa, 78, !dbg !162
%80 = select i1 %79, i8 84, i8 %78, !dbg !162
%81 = icmp eq i8 %ld.transa, 116, !dbg !162
%82 = select i1 %81, i8 110, i8 %80, !dbg !162
%83 = icmp eq i8 %ld.transa, 84, !dbg !162
%84 = select i1 %83, i8 78, i8 %82, !dbg !162
store i8 %84, i8* %byref.transpose.transa, align 1, !dbg !162
store i8 78, i8* %byref.constant.char.N, align 1, !dbg !162
%loaded.trans9 = load i8, i8* %byref.constant.char.N, align 1, !dbg !162
%85 = icmp eq i8 %loaded.trans9, 78, !dbg !162
%86 = icmp eq i8 %loaded.trans9, 110, !dbg !162
%87 = or i1 %86, %85, !dbg !162
%88 = select i1 %87, i8* %6, i8* %4, !dbg !162
store float 1.000000e+00, float* %byref.constant.fp.1.0, align 4, !dbg !162
%fpcast.constant.fp.1.0 = bitcast float* %byref.constant.fp.1.0 to i8*, !dbg !162
call void @sgemv_64_(i8* %byref.transpose.transa, i8* %4, i8* %6, i8* %8, i64 %68, i8* %88, i64 %"'ipc8", i8* %16, i8* %fpcast.constant.fp.1.0, i64 %"'ipc7", i8* %12, i64 1) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
%ld.row.trans10 = load i8, i8* %2, align 1, !dbg !162
%89 = icmp eq i8 %ld.row.trans10, 110, !dbg !162
%90 = icmp eq i8 %ld.row.trans10, 78, !dbg !162
%91 = or i1 %90, %89, !dbg !162
%92 = select i1 %91, i8* %4, i8* %6, !dbg !162
call void @sscal_64_(i8* %92, i8* %14, i64 %"'ipc8", i8* %16) [ "jl_roots"({} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null, {} addrspace(10)* null) ], !dbg !162
%93 = bitcast float* %cache.A to i8*, !dbg !162
call void @free(i8* nonnull %93), !dbg !162
%94 = bitcast float* %cache.x to i8*, !dbg !162
call void @free(i8* nonnull %94), !dbg !162
call void @llvm.julia.gc_preserve_end(token %67), !dbg !162
store float 0.000000e+00, float addrspace(13)* %"arrayptr74'ipl", align 4, !dbg !123, !tbaa !40, !alias.scope !177, !noalias !178
fence syncscope("singlethread") seq_cst
fence syncscope("singlethread") seq_cst
ret void
}
** On entry to SGEMV parameter number 6 had an illegal value
((nothing,),)
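Parameter 6 of `sgemv` is `lda`, which reference BLAS requires to be at least `max(1, m)`. Reading the dump above, the transposed `sgemv_64_` call in the reverse pass takes its `lda` argument from `%88`, which selects `%6`, the pointer that holds `n`. With `m = 30` and `n = 1` that would explain the message. This is only my reading of the IR, not a confirmed diagnosis. The sketch below mirrors the reference BLAS argument checks; `gemv_illegal_parameter` is a hypothetical helper, not part of any library.

```julia
# Hypothetical helper: mirrors the argument checks of the reference BLAS xGEMV
# routines and returns the number of the first illegal parameter (0 if none).
function gemv_illegal_parameter(trans::Char, m::Integer, n::Integer,
                                lda::Integer, incx::Integer, incy::Integer)
    uppercase(trans) in ('N', 'T', 'C') || return 1
    m < 0 && return 2
    n < 0 && return 3
    lda < max(1, m) && return 6
    incx == 0 && return 8
    incy == 0 && return 11
    return 0
end

# Forward call from the reproducer: trans = 'N', m = 30, n = 1, lda = 30.
forward_info = gemv_illegal_parameter('N', 30, 1, 30, 1, 1)

# Transposed call in the reverse pass, as read from the IR: lda comes from
# the pointer that holds n (= 1) while m is still 30.
reverse_info = gemv_illegal_parameter('T', 30, 1, 1, 1, 1)

println("forward: ", forward_info, ", reverse: ", reverse_info)
```

If that reading is right, the cached copy `cache.A` is packed with leading dimension `m`, so passing `m` rather than `n` as `lda` would satisfy the check.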
Should be fixed by https://github.com/EnzymeAD/Enzyme.jl/pull/1281
Please reopen if not.
It now produces the correct result with bitcode replacement on and off. However, I am a bit surprised that the allocations are exactly the same in both versions.
on:
julia> @btime Zygote.gradient(loss_adjoint,θ)
┌ Warning: Using fallback BLAS replacements for (["ssymv_64_"]), performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/U36Ed/src/utils.jl:59
1.089 s (5063232 allocations: 674.09 MiB)
(Float32[-48.12045, 96.89185, 5.4492106, -136.30328, -277.6249, -2.9152653, 159.34677, -252.21376, -168.57451, 95.22521 … 28.876875, 58.53126, -94.83481, 123.85488, 202.57362, 72.3266, -231.3183, -164.42274, -63.517776, -324.779],)
off:
julia> @btime Zygote.gradient(loss_adjoint,θ)
1.113 s (5063232 allocations: 674.09 MiB)
(Float32[-48.12045, 96.89185, 5.4492106, -136.30328, -277.6249, -2.9152653, 159.34677, -252.21376, -168.57451, 95.22521 … 28.876875, 58.53126, -94.83481, 123.85488, 202.57362, 72.3266, -231.3183, -164.42274, -63.517776, -324.779],)
If different BLAS code is used, would you not expect at least some difference in allocations?
As far as I know, Julia's tooling measures allocations at a higher level, so such low-level allocations won't be caught when using the rules. I assume the same holds for the fallback.
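A small sketch of that point, assuming only Base Julia: memory obtained through the C allocator (as the generated code does with `malloc` for `cache.A` and `cache.x`) is not counted by `@allocated`, while a Julia array is. The function names here are made up for illustration.

```julia
# Allocate and immediately free `nbytes` through the C allocator only.
function c_malloc_roundtrip(nbytes::Integer)
    p = Libc.malloc(nbytes)
    p == C_NULL && throw(OutOfMemoryError())
    Libc.free(p)
    return nothing
end

# Allocate a Julia array, which is tracked by the GC.
gc_array(n::Integer) = zeros(Float32, n)

# Warm up so that compilation is not part of the measurement.
c_malloc_roundtrip(1 << 20)
gc_array(1 << 18)

bytes_c = @allocated c_malloc_roundtrip(1 << 20)
bytes_gc = @allocated gc_array(1 << 18)

println("C malloc path:    ", bytes_c, " bytes reported")
println("Julia array path: ", bytes_gc, " bytes reported")
```

Both calls request about 1 MiB, but only the Julia array should show up in the reported bytes, which is consistent with `@btime` reporting identical allocation counts for the two configurations.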
As of the latest release, using bitcode replacement has no effect in reverse mode for dot/gemm/gemv/etc.; all of these will use tablegen rather than the fallback BLAS.
Prints warning:
** On entry to SGEMV parameter number 6 had an illegal value **
Adapted from https://docs.sciml.ai/SciMLSensitivity/stable/examples/optimal_control/optimal_control/